#nixos-systemd on 2020-12-26

2020-11-24 14:32 ChanServ changed the topic of #nixos-systemd to: NixOS <3 systemd | https://jitsi.nixcon.net/systemd | Next meeting 08.12.2020 14:00 UTC (every two weeks)

02:20 Emantor has quit [Quit: ZNC - http://znc.in]

02:21 Emantor has joined #nixos-systemd

04:56 lukegb has quit [Quit: ~~lukegb out~~]

04:57 lukegb has joined #nixos-systemd

05:43 <Mic92> hexa-: I had the same issue

05:43 <Mic92> on my server

05:44 <Mic92> I had to do echo 1 > /proc/sys/kernel/sysrq; echo b > /proc/sysrq-trigger

06:53 Mic92 has quit [Quit: WeeChat 3.0]

06:54 Mic92 has joined #nixos-systemd

09:10 <hexa-> Ya same

10:39 emily has quit [Ping timeout: 260 seconds]

10:52 emily has joined #nixos-systemd

12:43 pbb has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]

12:43 pbb has joined #nixos-systemd

12:46 pbb has quit [Excess Flood]

12:46 pbb has joined #nixos-systemd

13:49 <arianvp> YAY

13:49 <arianvp> I think I found a way to reliably fix the activation script

13:49 <arianvp> andi-:

13:49 <arianvp> sorry; aanderse

14:42 <aanderse> arianvp: hey what's up?

14:43 <arianvp> Do you know what the rationale was for activation script doing `systemctl stop` instead of `systemctl restart`?

14:44 <arianvp> ah it was for the scenario where "ExecStop" should be called on the old unit; and "ExecStart" on the new right? I wonder how often this is an issue in practise. At least it shouldn't be the _default_ behaviour as most services dont have an ExecStop

14:45 <arianvp> anyhow. The "fix" for not having racey startups is to not just start "letsencrypt.service" and "unbound.service"

14:45 <arianvp> but to stop them both; and then simply start "multi-user.target" (Or whatever they both have in wantedBy)

14:46 <arianvp> it then gets started up whilst pulling in all the dependencies... instead of having both start up in a different transaction

14:48 <arianvp> this even works with inter-dependencies between timers and services. Say a timer changed; then we `systemctl stop blah.timer` and `systemctl stop letsencrypt.service`. `blah.timer` has wantedBy = timers.target and `letsencrypt.service` has wantedBy = multi-user.target; their common ancestor is `basic.target` which pulls both in; so we can `systemctl start basic.target` to bring them both up

14:49 <arianvp> this works every time; and systemd does the right thing. We just need to answer the question "What is the common ancestor in the dependency tree"
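
The "common ancestor" question above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PULLS_IN` graph is a hand-written toy rather than something read from systemd, the function names are hypothetical, and a real implementation would have to build the graph from the actual unit files.

```python
# Toy dependency graph (an assumption for illustration): each key pulls in its values.
PULLS_IN = {
    "graphical.target": ["multi-user.target"],
    "multi-user.target": ["letsencrypt.service", "unbound.service"],
}


def reachable(graph, start):
    """Return every unit that `start` transitively pulls in."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        unit = stack.pop()
        if unit not in seen:
            seen.add(unit)
            stack.extend(graph.get(unit, ()))
    return seen


def lowest_common_ancestors(graph, units):
    """Return the units closest to `units` that transitively pull in all of them."""
    wanted = set(units)
    closure = {unit: reachable(graph, unit) for unit in graph}
    common = {unit for unit, below in closure.items() if wanted <= below}
    # Drop any ancestor that itself pulls in another common ancestor,
    # so only the lowest ones remain.
    return sorted(unit for unit in common if not (closure[unit] & common))


if __name__ == "__main__":
    stopped = ["letsencrypt.service", "unbound.service"]
    print(lowest_common_ancestors(PULLS_IN, stopped))
```

For the two stopped services in this toy graph the result is `multi-user.target` rather than `graphical.target`, i.e. the nearest unit whose start job would pull both back in within a single transaction.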

14:50 <arianvp> I think if we make the target units derivations as well (instead of fixed unit file values in the systemd package); and `wantedBy = ["multi-user.target"]` changes the hash of the multi-user.target derivation; then nix will automatically detect which targets to start

14:50 <arianvp> and then.... everything should be awesome?

14:50 <arianvp> I'm writing a little python program to experiment with this; but the idea sounds solid in my head. It would need some changes in the NixOS module for systemd so that we are sure that a target unit gets tainted when a new dependency is added to it

14:51 <arianvp> and by tainted I mean "the derivation generating blah.target is different between the two generations"
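
A minimal sketch of that "tainted" check, assuming each generation can be summarised as a map from unit name to store path. The maps, the placeholder hashes and the function names below are all hypothetical; the discussion does not say how the activation script would obtain this information.

```python
# Hypothetical unit -> store path maps for two generations.
# The hashes are made-up placeholders, not real store paths.
OLD_GENERATION = {
    "multi-user.target": "/nix/store/aaaa-unit-multi-user.target",
    "unbound.service": "/nix/store/bbbb-unit-unbound.service",
}
NEW_GENERATION = {
    "multi-user.target": "/nix/store/cccc-unit-multi-user.target",
    "unbound.service": "/nix/store/bbbb-unit-unbound.service",
    "letsencrypt.service": "/nix/store/dddd-unit-letsencrypt.service",
}


def tainted_units(old, new):
    """Units that are new, or whose store path differs between generations."""
    return sorted(name for name, path in new.items() if old.get(name) != path)


def targets_to_start(old, new):
    """Tainted units that are targets, i.e. candidates for `systemctl start`."""
    return [name for name in tainted_units(old, new) if name.endswith(".target")]


if __name__ == "__main__":
    print(tainted_units(OLD_GENERATION, NEW_GENERATION))
    print(targets_to_start(OLD_GENERATION, NEW_GENERATION))
```

Here adding `letsencrypt.service` with a `wantedBy` on `multi-user.target` is modelled as a changed store path for the target, so the target shows up as tainted while the unchanged `unbound.service` does not.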

14:51 <arianvp> </braindump>

15:06 * aanderse sent a long message: < https://matrix.org/_matrix/media/r0/download/matrix.org/WceiSaHznWAhffIjtfTbvqSj/message.txt >

15:33 <lukegb> arianvp: does that fix the race condition where we try to renew ACME certs before DNS is available

15:33 <lukegb> I just love "2020/12/26 04:58:52 Could not create client: get directory at 'https://acme-v02.api.letsencrypt.org/directory': Get "https://acme-v02.api.letsencrypt.org/directory": dial tcp: lookup acme-v02.api.letsencrypt.org: no such host"

16:11 <andi-> re dlopen: I'd like to introduce this instead of relying on the ELF header things lennard wants to add to the code base as that will handle it in one place (the call site) and not require us setting rpaths, linking stuff to /run/current-system/… etc.. Which also ensures nix-shell -p systemd --run 'journalctl --grep=…' and others will work on non-nixos systems:

16:11 <andi-> https://github.com/andir/nixpkgs/commit/247c235336a48a0fa383b6253655ed1fd7023d4a

16:11 <andi-> opinions?

16:36 <damjan> andi-: the name is lennart :)

16:36 <andi-> ok

16:38 <arianvp> lukegb: that's the idea

16:39 <arianvp> andi-: what is the problem with patchelf'ing rpaths?

16:40 <andi-> arianvp: why should we bother if we can be explicit?

16:40 <arianvp> sure.

16:40 <andi-> If we patchelf we do not keep track of the need for it

16:40 <andi-> We will likely just have an always growing list of libs

16:40 <arianvp> btw; currently some of the binaries statically link against libsystemd-shared instead of dynamically; but that can be disabled in the meson_options

16:40 <arianvp> if we disable that; the dlopen's only happen in libsystemd-shared.so

16:41 <andi-> for now

16:41 <arianvp> sure... =)

16:41 <andi-> until it breaks again with our subpar testing story

16:41 <arianvp> yeh we might as well just patch the binaries to be sure

16:41 <arianvp> if it's the same code

16:41 <andi-> Thus I prefer patching the code as the references will end up in the right binaries/outputs (if we have those again)

16:42 <andi-> No having to read through their build system and guessing what might happen.

16:42 <arianvp> Ok. convinced

16:42 <arianvp> btw; this sounds useful as a generic technique outside of systemd. for other apps that do dlopen too

16:42 <andi-> that isn't new

16:42 <andi-> others have done it in nixpkgs already

16:42 <arianvp> ah. but there isn't some generic utility for it ?

16:42 <andi-> nope

16:43 <arianvp> anyhow; your idea sounds good to me. I like it

16:43 <andi-> at first I was reimplementing findInputs and using the correct build time deps and was thinking of doing it patchelf style but that is too implicit in this case IMO

16:44 <arianvp> hmm but can't we get rid of the dlopen call altogether?

16:44 <arianvp> ah wait that's hard patching the binary

16:44 <andi-> great how the misc test still fails after months and is supposed to test the experimental nix path output..

16:44 <arianvp> I see

16:50 <andi-> arianvp: if the ELF hack that lennart is proposing ever gains traction and isn't just systemd specific we could probably add some mechanism to stdenv to deal with those.. As long as that is just some custom hack I do not feel very confident..

16:50 <arianvp> what elf-hack? you got a link?

16:51 <arianvp> ah found it by scrolling up

16:52 <andi-> things like this: https://github.com/systemd/systemd/issues/18078 should be a very early compile error and not just surface later on. Relying on systemd to keep a hack that they aren't using/testing working is not great..

16:52 <{^_^}> systemd/systemd#18078 (by flokli, 1 day ago, open): "optional" libidn2 in systemd-resolved breaks it entirely

16:53 <andi-> and yeah it being optional while not really optional is interesting..

16:54 <flokli> yeah

16:55 <arianvp> just sounds like that PR that introduced it shouldn't have been merged

16:55 <arianvp> (the libidn dlopen one)

16:55 <flokli> I hope some of this stuff is getting reverted, and people realize you should use meson flags to build more minimal versions

16:55 <andi-> but there can only be one /usr/bin/systemctl!

16:56 <flokli> on the magic ELF section: some fedora people expressed interest to make this a generic thing to discover "optional" dependencies

16:56 <arianvp> maybe we should fork; lol

16:56 <andi-> How is fedora packaging handling it?

16:56 <arianvp> andi-: it's broken for fedora I assume

16:56 <andi-> I guess they just didn't care?

16:56 <andi-> and they probably haven't updated yet

16:56 <arianvp> as they do the same trick as nixos to find runtime dependencies

16:56 <flokli> currently they synthesize some of the runtime deps from the NEEDED sections

16:56 <flokli> and because more stuff ends up in there than they want

16:56 <flokli> all this dlopen() business was introduced

16:56 <flokli> now they want parts of it back

16:56 <flokli> I don't know

16:57 <andi-> I am also bisecting some weird systemd interaction with libseccomp in case of the nixos babel test case. Apparently the seccomp stuff breaks on the minor bump but no other test failed..

16:57 <arianvp> huh no? RPM parses elf binaries as well afaik

16:57 <flokli> I asked for some reasoning on the dlopen() story in the issue tracker, maybe there's some writeup somewhere

16:59 <andi-> Given their history I doubt that they'll roll it back entirely. Please prove me wrong.

17:46 <andi-> looks like the dlopen patch is working \o/ https://hydra.h4ck.space/eval/1816

18:05 <Emantor> andi-: \o/

18:06 <hexa-> okay, something is really off with systemd on master

18:06 <hexa-> second machine where after a while of uptime I cannot run `systemctl status` anymore due to timeouts

18:06 <hexa-> ssh logins take forever

18:07 <hexa-> this time it's not a server but my desktop, which I've left alone over the holidays

18:07 <hexa-> ❯ systemctl status

18:07 <hexa-> Failed to read server status: Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)

18:07 <andi-> ugh

18:07 <andi-> I'll update my desktop in a few minutes and see about that

18:07 <hexa-> i also don't get a tty anymore

18:07 <hexa-> i can switch between them, but no login prompt appears on any of them

18:08 <hexa-> know your sysrq-foo :)

18:08 <hexa-> echo "1" > /proc/sys/kernel/sysrq

18:09 <hexa-> echo "b" > /proc/sysrq-trigger

18:09 <hexa-> may sync disks first :D

18:09 <andi-> disks, state, ... pff

18:09 <andi-> I updated my laptop yesterday but rebooted after the update

18:10 <hexa-> yeah, just don't keep state. it's such a hassle.

18:29 <andi-> activating the config failed because it can't reload some units: reloading the following units: dbus.service, dev-hugepages.mount, dev-mqueue.mount, firewall.service, proc-sys-fs-binfmt_misc.mount, sys-kernel-debug.mount, tmp.mount

18:29 <andi-> proc-sys-fs-binfmt_misc.mount is not active, cannot reload.

18:29 <andi-> probably unrelated

23:23 <lukegb> hexa-: REISUB! reboot even if system utterly broken, or whichever other mnemonic you like :P

23:23 <lukegb> busier backwards, etc.

23:27 <hexa-> yeah, I know it exists, couldn't be arsed to look it up
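
For reference, the REISUB mnemonic and the shorter "sync first, then `b`" variant can be spelled out as a small Python sketch. The key meanings are the standard Linux magic SysRq ones; the function names, the `delay` default and the idea of driving `/proc/sysrq-trigger` from Python are assumptions made for illustration. Nothing is written to `/proc` unless `run_plan` is called explicitly (as root).

```python
import time

# Standard magic SysRq keys used by the REISUB mnemonic.
SYSRQ_ACTIONS = {
    "r": "take the keyboard out of raw mode",
    "e": "send SIGTERM to all processes except init",
    "i": "send SIGKILL to all processes except init",
    "s": "sync all mounted filesystems",
    "u": "remount all filesystems read-only",
    "b": "reboot immediately, without syncing or unmounting",
}


def sysrq_plan(mnemonic="reisub"):
    """Expand a mnemonic into an ordered list of (key, description) steps."""
    return [(key, SYSRQ_ACTIONS[key]) for key in mnemonic.lower()]


def run_plan(plan, trigger_path="/proc/sysrq-trigger", delay=2.0):
    """Write each key to the trigger file in order (hypothetical helper)."""
    for key, _description in plan:
        with open(trigger_path, "w") as trigger:
            trigger.write(key)
        time.sleep(delay)  # give sync/remount a moment before the next step


if __name__ == "__main__":
    # Only prints the steps; does not touch /proc.
    for key, description in sysrq_plan("sub"):
        print(key, "->", description)
```

The `"sub"` plan is the scripted equivalent of syncing and remounting read-only before the `echo b` shown earlier in the log.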