#nixos-systemd on 2021-01-04

2020-11-24 14:32 ChanServ changed the topic of #nixos-systemd to: NixOS <3 systemd | https://jitsi.nixcon.net/systemd | Next meeting 08.12.2020 14:00 UTC (every two weeks)

00:30 pbb has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]

01:34 pbb has joined #nixos-systemd

03:48 andi- has quit [Remote host closed the connection]

03:53 andi- has joined #nixos-systemd

13:15 <damjan> andi-: if systemctl is invoked, it needs to be invoked with --no-ask-password

13:16 <andi-> damjan: then we must be doing that wrong since basically forever

13:17 <damjan> afaik, systemctl always starts the ask password agent, in case a service *does* need a password

13:17 <andi-> how does that even work? Is there some service that requires passwords?

13:18 <damjan> which process exactly are you seeing?

13:18 <andi-> systemd-tty-ask-password-agent --watch

13:18 <damjan> right, the agent

13:19 <damjan> so, if a service does require a password, it'll send a request to all the agents

13:19 <andi-> Ok, so maybe my issue wasn't that the agent was running then.

13:19 <andi-> Or very likely it wasn't the issue.

13:20 <damjan> probably not

13:21 <andi-> We need a better story around restarting a list of services and honoring their dependencies. I fear it might just be another case of that.. Some service expecting something (as it is scheduled After=… or Requires=…) but systemctl restart -- <all the services> doesn't really honor dependencies.

13:22 <damjan> maybe use `isolate` to basic.target then back?

13:25 <andi-> We tried that but that would stop X11 sessions etc..

13:25 <andi-> like someone experimented with using a target for each system generation

13:25 <andi-> IIRC when you switch between them it tears down and restarts everything (even those that weren't changed)

13:35 <damjan> andi-: so you (not personally) want to restart all except the X11 session services? hmm. might be a bit hard to express that one. why not just require a reboot by the user?

13:35 <andi-> a reboot on all config changes?

13:36 <andi-> So change a whitespace in the nginx configuration -> reboot.. I would probably stop using NixOS or would never have started to use it.

13:38 <damjan> :)

13:39 <damjan> so does nix have the information about those service dependencies?

13:40 <damjan> maybe it can sprinkle drop-ins with PartOf= directives so that restarting a service restarts all that depend on it
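
[Editor's note] A minimal sketch of the drop-in idea damjan describes, assuming the dependency information is already available as a plain mapping. The function name `partof_dropins` and the unit names are hypothetical; this is not existing NixOS code. It only renders the text of a drop-in per dependent unit and does not write any files.

```python
def partof_dropins(requires):
    """Render drop-in text adding PartOf= for each unit that has dependencies.

    `requires` maps a dependent unit to the units it depends on (an assumed
    input format). With PartOf=, stopping or restarting a listed unit is
    propagated to the unit that carries the directive.
    """
    dropins = {}
    for unit, deps in sorted(requires.items()):
        if deps:
            dropins[unit] = "[Unit]\nPartOf=" + " ".join(sorted(deps)) + "\n"
    return dropins


# Hypothetical example: restart the web app whenever its database restarts.
example = partof_dropins({"webapp.service": ["postgresql.service"]})
print(example["webapp.service"])
```

Note that PartOf= only propagates stop and restart; it does not by itself order the units, so After= would still be needed for sequencing.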

13:41 <andi-> Well we already declare the service dependencies via the WantedBy, After, … fields

13:42 <andi-> so IMHO systemctl restart should be fixed to also honor dependencies between those services that you have given on the CLI and not just process them in order.

13:44 <andi-> We have a common issue where we request letsencrypt certificates but those units should only be started after the nss-lookup.target has been reached. If you have your dns resolver in the nss-lookup.target but restart both the resolver and the letsencrypt service the startup order is not correct as the nss-lookup.target is already reached and systemd happily starts the LE unit before the resolver is back up.

13:44 <andi-> We could special case this but that would mean more downstream code..
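
[Editor's note] The ordering andi- is asking for can be sketched as a topological sort over After= relations that also follows them through units not being restarted, such as an already-reached target. This is an illustration under assumptions, not how systemctl behaves: the `after` mapping, the function name `restart_order`, and the unit names are all hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def restart_order(after, to_restart):
    """Order `to_restart` so each unit follows every requested unit it is
    transitively ordered After=, even via units that are not restarted."""

    def reachable(start):
        # Collect everything `start` is ordered after, directly or indirectly.
        seen, stack = set(), list(after.get(start, ()))
        while stack:
            unit = stack.pop()
            if unit not in seen:
                seen.add(unit)
                stack.extend(after.get(unit, ()))
        return seen

    wanted = set(to_restart)
    graph = {u: (reachable(u) & wanted) - {u} for u in sorted(wanted)}
    # static_order() yields predecessors first; it raises CycleError on loops.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical units: the certificate service is ordered after the target,
# and the target is ordered after the resolver.
after = {
    "acme-example.service": ["nss-lookup.target"],
    "nss-lookup.target": ["unbound.service"],
}
print(restart_order(after, ["acme-example.service", "unbound.service"]))
```

The target itself is not in the restart set, yet the resolver still sorts before the certificate service because the ordering is followed through it.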

13:44 <damjan> alas After/Before are not dependencies. just ordering

13:45 <damjan> now WantedBy= is one type of dependency, but PartOf=/BindsTo= another, and they are separate because there are separate needs

13:45 <andi-> I know but in reality it makes for a good enough approximation most of the time.

13:46 <andi-> So, if I would declare everything that usually has After=network.target to (additionally?) BindsTo=network.target that would be more correct?

13:46 <andi-> If that is the case I assume most/all of the units out there in the wild (not just on NixOS) are wrong.

13:47 <damjan> most services won't need BindsTo=network.target

13:47 <damjan> because they would not fail without it

13:47 <andi-> if I tear down networking the webserver doesn't have the required interfaces anymore.

13:48 <andi-> so if I stop networking the webserver *must* go down, no?

13:48 <damjan> andi-: well, not necessarily

13:48 <damjan> in the end, it's up to the system admin to decide

13:49 <andi-> ok

13:50 <andi-> Let's go back to the situation with systemctl restart <all the things>. If I have a service that is PartOf=nss-lookup.target and another service that is After=nss-lookup.target the 2nd service is allowed to start before the first one is back up, right?

13:50 <andi-> Because the target has already been reached and a service in there was "just" restarted which doesn't matter.

13:51 <damjan> right

13:52 <damjan> if you need that level of control, you'd need to stop the target

13:52 <andi-> So, what I'd like to have is a way that doesn't stop the 2nd service when the first one restarts but requires the target (nss-lookup) to be fully reached (as in no failed/stopped services) before it goes.

13:52 <andi-> *up

13:53 <andi-> Right now it feels like we have to script that out in your profile activation script.

13:53 <andi-> Possibly reimplementing a lot of the systemd internal code.
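
[Editor's note] A rough sketch of the check an activation script would have to perform for the behaviour andi- describes: hold a unit back until every member of the targets it waits on is active again. The function name, the input mappings, and the unit names are hypothetical. The state strings are assumed to be systemd ActiveState values, which a script could read with `systemctl show --property=ActiveState --value <unit>`.

```python
def units_ready_to_start(pending, target_members, states):
    """Return the pending units whose gating targets have all members active.

    pending:        unit -> targets it must wait for (assumed input)
    target_members: target -> units that make up the target (assumed input)
    states:         unit -> ActiveState string such as "active" or "failed"
    """
    ready = []
    for unit, targets in pending.items():
        members = [m for t in targets for m in target_members.get(t, ())]
        # A missing state counts as not active, so the unit stays held back.
        if all(states.get(m) == "active" for m in members):
            ready.append(unit)
    return sorted(ready)


pending = {"acme-example.service": ["nss-lookup.target"]}
members = {"nss-lookup.target": ["unbound.service"]}
print(units_ready_to_start(pending, members, {"unbound.service": "activating"}))
print(units_ready_to_start(pending, members, {"unbound.service": "active"}))
```

As damjan points out in the log, an active state is still no guarantee that the resolver actually answers queries, so a check like this narrows the race rather than removing it.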

13:55 <damjan> seems like overengineering to me

13:55 <damjan> I mean, nss-lookup.target doesn't guarantee that the DNS server is still operational, so in that sense it's never a guarantee

13:55 <damjan> and services need to be prepared to work with that

13:56 <andi-> If there are no guarantees why do we even bother with stuff like Type=notify etc.? We could just tell everyone to retry until it works.

13:57 <andi-> It seems like we have the assurance of best effort & maybe working which doesn't make that much of a difference

13:59 <andi-> Also I am not saying systemd does it wrong right now. I am trying to find a solution that doesn't just work for us but is a sane way to deal with this without rewriting all the software.

14:00 <andi-> I actually tried to fix the letsencrypt issue upstream but apparently they would rather not do it that way as "DNS just works" or whatever

14:44 elvishjerricco has quit [Ping timeout: 260 seconds]

14:45 globin_ has quit [Quit: o/]

14:45 globin has joined #nixos-systemd

14:45 ckauhaus has joined #nixos-systemd

14:45 globin has joined #nixos-systemd

14:45 globin has quit [Changing host]

14:46 elvishjerricco has joined #nixos-systemd

14:46 Emantor has quit [Ping timeout: 260 seconds]

14:48 Emantor_ has joined #nixos-systemd

15:28 <flokli> andi-: the whole discussion here is basically again about "how to fix our activation script"

15:29 <flokli> something I'd like to have a discussion with aanderse too ;-)

15:29 <aanderse> oh boy

15:29 <aanderse> when is that happening? right?

15:29 <flokli> yeah, that's another good question ;-)

15:38 <andi-> just have it in here whenever

19:58 <arianvp> question

19:58 <arianvp> nvm

21:13 ckauhaus has quit [Quit: WeeChat 2.7.1]