ChanServ changed the topic of #nixos-systemd to: NixOS <3 systemd | https://jitsi.nixcon.net/systemd | Next meeting 08.12.2020 14:00 UTC (every two weeks)
Emantor has quit [Quit: ZNC - http://znc.in]
Emantor has joined #nixos-systemd
lukegb has quit [Quit: ~~lukegb out~~]
lukegb has joined #nixos-systemd
<Mic92> hexa-: I had the same issue
<Mic92> on my server
<Mic92> I had to do echo 1 > /proc/sys/kernel/sysrq; echo b > /proc/sysrq-trigger
Mic92 has quit [Quit: WeeChat 3.0]
Mic92 has joined #nixos-systemd
<hexa-> Ya same
emily has quit [Ping timeout: 260 seconds]
emily has joined #nixos-systemd
pbb has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]
pbb has joined #nixos-systemd
pbb has quit [Excess Flood]
pbb has joined #nixos-systemd
<arianvp> YAY
<arianvp> I think I found a way to reliably fix the activation script
<arianvp> andi-:
<arianvp> sorry; aanderse
<aanderse> arianvp: hey what's up?
<arianvp> Do you know what the rationale was for activation script doing `systemctl stop` instead of `systemctl restart`?
<arianvp> ah it was for the scenario where "ExecStop" should be called on the old unit, and "ExecStart" on the new one, right? I wonder how often this is an issue in practice. At least it shouldn't be the _default_ behaviour, as most services don't have an ExecStop
<arianvp> anyhow. The "fix" for not having racy startups is to not just start "letsencrypt.service" and "unbound.service"
<arianvp> but to stop them both; and then simply start "multi-user.target" (Or whatever they both have in wantedBy)
<arianvp> it then gets started up whilst pulling in all the dependencies... instead of having both start up in a different transaction
<arianvp> this even works with inter-dependencies between timers and services. Say a timer changed; then we `systemctl stop blah.timer` and `systemctl stop letsencrypt.service`. `blah.timer` has wantedBy = timers.target and `letsencrypt.service` has wantedBy = multi-user.target; their common ancestor is `basic.target`, which pulls both in, so we can `systemctl start basic.target` to bring them both up
<arianvp> this works every time; and systemd does the right thing. We just need to answer the question "What is the common ancestor in the dependency tree"
<arianvp> I think if we make the target units derivations as well (instead of fixed unit file values in the systemd package), and `wantedBy = ["multi-user.target"]` changes the hash of the multi-user.target derivation, then nix will automatically detect which targets to start
<arianvp> and then.... everything should be awesome?
<arianvp> I'm writing a little python program to experiment with this; but the idea sounds solid in my head. It would need some changes in the NixOS module for systemd so that we are sure that a target unit gets tainted when a new dependency is added to it
<arianvp> and with tainted I mean "the derivation generating blah.target is different between the two generations"
<arianvp> </braindump>
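
The "common ancestor" question from the braindump above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the prototype mentioned in the discussion: the `WANTED_BY` table is a hypothetical toy graph that mirrors the example in the chat rather than data read from a real system, and the helper names `ancestors_by_distance` and `common_ancestor` are made up for this sketch.

```python
from collections import deque

# Hypothetical toy graph mirroring the example in the chat:
# unit -> units that pull it in (its "wantedBy" entries).
# Not read from a real system.
WANTED_BY = {
    "blah.timer": ["timers.target"],
    "letsencrypt.service": ["multi-user.target"],
    "unbound.service": ["multi-user.target"],
    "timers.target": ["basic.target"],
    "multi-user.target": ["basic.target"],
    "basic.target": [],
}


def ancestors_by_distance(unit, wanted_by):
    """Breadth-first walk up the wantedBy edges; returns {unit_or_ancestor: distance}."""
    dist = {unit: 0}
    queue = deque([unit])
    while queue:
        current = queue.popleft()
        for parent in wanted_by.get(current, []):
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist


def common_ancestor(units, wanted_by):
    """Return the closest unit that transitively pulls in every unit in `units`, or None."""
    if not units:
        return None
    tables = [ancestors_by_distance(u, wanted_by) for u in units]
    shared = set(tables[0])
    for table in tables[1:]:
        shared &= set(table)
    # The stopped units themselves are not candidates for the unit to start.
    shared -= set(units)
    if not shared:
        return None
    # Closest ancestor: smallest worst-case distance, name as a deterministic tie-break.
    return min(shared, key=lambda a: (max(t[a] for t in tables), a))


print(common_ancestor(["letsencrypt.service", "unbound.service"], WANTED_BY))
print(common_ancestor(["blah.timer", "letsencrypt.service"], WANTED_BY))
```

Under these assumptions, the returned unit is the one the activation script would hand to `systemctl start` after stopping the changed units, so that everything comes up in a single transaction.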
<lukegb> arianvp: does that fix the race condition where we try to renew ACME certs before DNS is available?
<lukegb> I just love "2020/12/26 04:58:52 Could not create client: get directory at 'https://acme-v02.api.letsencrypt.org/directory': Get "https://acme-v02.api.letsencrypt.org/directory": dial tcp: lookup acme-v02.api.letsencrypt.org: no such host"
<andi-> re dlopen: I'd like to introduce this instead of relying on the ELF header things lennard wants to add to the code base as that will handle it in one place (the call site) and not require us setting rpaths, linking stuff to /run/current-system/… etc.. Which also ensures nix-shell -p systemd --run 'journalctl --grep=…' and others will work on non-nixos systems:
<andi-> opinions?
<damjan> andi-: the name is lennart :)
<andi-> ok
<arianvp> lukegb: that's the idea
<arianvp> andi-: what is the problem with patchelf'ing rpaths?
<andi-> arianvp: why should we bother if we can be explicit?
<arianvp> sure.
<andi-> If we patchelf we do not keep track of the need for it
<andi-> We will likely just have an always growing list of libs
<arianvp> btw; currently some of the binaries statically link against libsystemd-shared instead of dynamically; but that can be disabled in the meson_options
<arianvp> if we disable that; the dlopen's only happen in libsystemd-shared.so
<andi-> for now
<arianvp> sure... =)
<andi-> until it breaks again with our subpar testing story
<arianvp> yeh we might as well just patch the binaries to be sure
<arianvp> if it's the same code
<andi-> Thus I prefer patching the code as the references will end up in the right binaries/outputs (if we have those again)
<andi-> No need to read through their build system and guess what might happen.
<arianvp> Ok. convinced
<arianvp> btw; this sounds useful as a generic technique outside of systemd. for other apps that do dlopen too
<andi-> that isn't new
<andi-> others have done it in nixpkgs already
<arianvp> ah. but there isn't some generic utility for it ?
<andi-> nope
<arianvp> anyhow; your idea sounds good to me. I like it
<andi-> at first I was reimplementing findInputs and using the correct build time deps and was thinking of doing it patchelf style, but that is too implicit in this case IMO
<arianvp> hmm but can't we get rid of the dlopen call altogether?
<arianvp> ah wait that's hard patching the binary
<andi-> great how the misc test still fails after months and is supposed to test the experimental nix path output..
<arianvp> I see
<andi-> arianvp: if the ELF hack that lennart is proposing ever gains traction and isn't just systemd specific we could probably add some mechanism to stdenv to deal with those.. As long as that is just some custom hack I do not feel very confident..
<arianvp> what elf-hack? you got a link?
<arianvp> ah found it by scrolling up
<andi-> things like this: https://github.com/systemd/systemd/issues/18078 should be a very early compile error and not just surface later on. Relying on systemd to keep a hack that they aren't using/testing working is not great..
<{^_^}> systemd/systemd#18078 (by flokli, 1 day ago, open): "optional" libidn2 in systemd-resolved breaks it entirely
<andi-> and yeah it being optional while not really optional is interesting..
<flokli> yeah
<arianvp> just sounds like that PR that introduced it shouldn't have been merged
<arianvp> (the libidn dlopen one)
<flokli> I hope some of this stuff is getting reverted, and people realize you should use meson flags to build more minimal versions
<andi-> but there can only be one /usr/bin/systemctl!
<flokli> on the magic ELF section: some fedora people expressed interest to make this a generic thing to discover "optional" dependencies
<arianvp> maybe we should fork; lol
<andi-> How is fedora packaging handling it?
<arianvp> andi-: it's broken for fedora I assume
<andi-> I guess they just didn't care?
<andi-> and they probably haven't updated yet
<arianvp> as they do the same trick as nixos to find runtime dependencies
<flokli> currently they synthesize some of the runtime deps from the NEEDED sections
<flokli> and because more stuff ends up in there than they want
<flokli> all this dlopen() business was introduced
<flokli> now they want parts of it back
<flokli> I don't know
<andi-> I am also bisecting some weird systemd interaction with libseccomp in case of the nixos babel test case. Apparently the seccomp stuff breaks on the minor bump but no other test failed..
<arianvp> huh no? RPM parses elf binaries as well afaik
<flokli> I asked for some reasoning on the dlopen() story in the issue tracker, maybe there's some writeup somewhere
<andi-> Given their history I doubt that they'll roll it back entirely. Please prove me wrong.
<andi-> looks like the dlopen patch is working \o/ https://hydra.h4ck.space/eval/1816
<Emantor> andi-: \o/
<hexa-> okay, something is really off with systemd on master
<hexa-> second machine where after a while of uptime I cannot run `systemctl status` anymore due to timeouts
<hexa-> ssh logins take forever
<hexa-> this time it's not a server but my desktop, which I've left alone over the holidays
<hexa-> ❯ systemctl status
<hexa-> Failed to read server status: Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
<andi-> ugh
<andi-> I'll update my desktop in a few minutes and see about that
<hexa-> i also don't get a tty anymore
<hexa-> i can switch between them, but no login prompt appears on any of them
<hexa-> know your sysrq-foo :)
<hexa-> echo "1" > /proc/sys/kernel/sysrq
<hexa-> echo "b" > /proc/sysrq-trigger
<hexa-> maybe sync disks first :D
<andi-> disks, state, ... pff
<andi-> I updated my laptop yesterday but rebooted after the update
<hexa-> yeah, just don't keep state. it's such a hassle.
<andi-> activating the config failed because it can't reload some units: reloading the following units: dbus.service, dev-hugepages.mount, dev-mqueue.mount, firewall.service, proc-sys-fs-binfmt_misc.mount, sys-kernel-debug.mount, tmp.mount
<andi-> proc-sys-fs-binfmt_misc.mount is not active, cannot reload.
<andi-> probably unrelated
<lukegb> hexa-: REISUB! reboot even if system utterly broken, or whichever other mnemonic you like :P
<lukegb> busier backwards, etc.
<hexa-> yeah, I know it exists, couldn't be arsed to look it up