<clever>
andi-: nixops will also do send-keys for you, after deploy, start, and reboot
<clever>
the main use for manual send-keys is when nixops isn't aware of a remote reboot and loss of key material
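A minimal sketch of that manual invocation (the deployment name `mynet` is a placeholder):

    # re-send key material after an out-of-band reboot that nixops did not observe
    nixops send-keys -d mynet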
<gchristensen>
andi-: I do
<andi->
gchristensen, clever: mostly interested in your experience with it.. I am looking at it in the context of the systemd bump and dropping a patch that causes a bit of pain for the generic systemd services. I built a libvirt test environment and whenever I want to unlock the disk it takes >30s because systemd_pam runs into a udev timeout. This happens to me on 19.03 and unstable..
<andi->
Is that also the case on your machines?
<andi->
s/udev/dbus/
<gchristensen>
ouch
<gchristensen>
I can't think of anything like that happening, which makes me think it doesn't happen
<andi->
that is probably a completely different issue
<clever>
May 27 03:18:45 nas systemd[1]: sys-subsystem-net-devices-enp3s0.device: Job sys-subsystem-net-devices-enp3s0.device/start failed with result 'timeout'.
<andi->
the note was removed but the defect is still there :/
<gchristensen>
oooOOooo
<andi->
It disappears if you do a `nixos-rebuild switch` but not if you boot the machine.
<gchristensen>
ack.
<andi->
Now I am at a point - after a few weeks of dealing with things like these - where I would go for breaking `nixops send-keys` and requiring the `_netdev` option to be set. It fixes issues for everyone (racy ones) and only requires a few additional lines for a few people.
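A rough sketch of those few additional lines, assuming an autoLuks volume named `data` mounted at `/data` (all names and devices are placeholders):

    {
      deployment.autoLuks.data.device = "/dev/xvdf";

      fileSystems."/data" = {
        device = "/dev/mapper/data";
        fsType = "ext4";
        # mark the mount as network-dependent so local-fs.target does not
        # block on key material that only arrives via nixops send-keys
        options = [ "_netdev" ];
      };
    }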
<gchristensen>
okay I'm not against that
<gchristensen>
what is the breakage users will see?
<andi->
that is the hard part... I had a bit of a heated debate about it with flokli an hour ago.. It breaks local-fs.target if _netdev isn't set for those mount points. I am not for that breakage but he convinced me that at some point we might have to do it.. There is no way we can traverse all the layers of indirections to automagically add the option :/
<gchristensen>
right
<andi->
We could probably add safeguards to nixops to warn if no FS with the option was found etc.. Still not perfect.
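A rough sketch of such a safeguard, written as a module-level warning rather than actual nixops code (`deployment.autoLuks` only exists when evaluating through nixops):

    { config, lib, ... }:
    {
      # warn when autoLuks is in use but no filesystem carries _netdev
      warnings = lib.optional
        (config.deployment.autoLuks or { } != { }
          && !lib.any (fs: lib.elem "_netdev" fs.options)
               (lib.attrValues config.fileSystems))
        "autoLuks is used, but no fileSystem is mounted with the _netdev option";
    }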
<gchristensen>
that would warn almost every user, right?
<andi->
probably
<gchristensen>
oof
<andi->
every user that uses autoluks that is
<gchristensen>
is there any way we can start warning users now?
<gchristensen>
warning in such a way that we know 100% they're using this feature
<samueldr>
and backport said runtime warning to at least current stable
<andi->
I'd combine the nixpkgs change with a change to nixops that fails (unless opted out?) when we see autoLuks being used without _netdev. Also a big fat warning in the release notes of 19.09... In general I'd like to find a better way to handle this. I'll spend most of the night/tomorrow reading through systemd docs one more time..
<gchristensen>
I think having an extremely well targeted alert would be a great start
<andi->
A way to mark any .mount unit as "_netdev" by only knowing the .device unit is something we need.
<andi->
Yes, definitely.
<gchristensen>
though: what happens if they miss it?
<gchristensen>
what is the risk
<andi->
Rollback, open an issue, pop up on IRC (screaming at us) and we point them to the alert we tried to get out?
<flokli>
Well, rolling back needs to be done manually. They might not be able to ssh to machines, as sshd is waiting for local-fs.target
<andi->
it does fail quickly (a few minutes)
<andi->
but then it is stuck in a rescue shell
<flokli>
So no ssh
<flokli>
I also thought about having some deployment option that needs to be set, and cancelling deployment otherwise. But UX is pretty bad
<gchristensen>
that would be very ugly on AWS where there is no console
<flokli>
The longer-run solution would be to move the autoluks part to crypttab and add some sort of transitive propagation of the _netdev attribute from the crypt device to mountpoints natively into systemd... But that's a very long stretch
<andi->
(year+)
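For reference, the crypttab route would look roughly like this on NixOS today (name and UUID are placeholders; the transitive _netdev propagation mentioned above does not exist yet):

    {
      environment.etc."crypttab".text = ''
        # name   device                          key    options
        data     UUID=<uuid-of-luks-partition>   none   luks,_netdev
      '';
    }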
<gchristensen>
I feel very uncomfortable with the possibility of losing data
<gchristensen>
hmm I guess it wouldn't be lost, but the recovery would be extremely annoying
<flokli>
Well, you manually need to boot an older generation, fix your nixops config, then redeploy
<gchristensen>
right, but that isn't really possible on AWS / other hosts without console access
<andi->
(once you figured what went wrong)
<flokli>
So maybe enforcing some new deployment option to be set, to kinda ensure the nixops user read the changelog and updated his fstab config?
<flokli>
It's ugly, but better than the possible breakage?
<andi->
how does nixpkgs figure out if nixops is being used? Checking for `deployment` configuration(s)?
<Shados>
gchristensen: There's literally always a recovery approach possible. If you've got no console, typically you do still have the ability to boot from an ISO, in which case you can just edit the grub menu from there to rollback.
<Shados>
But it would be a pain for someone, somewhere, no doubt.
<gchristensen>
on AWS I believe the root device would need to be snapshotted, mounted on another machine, edited, snapshotted, and booted from -- destroying the original
<flokli>
The thing is, detecting whether all mountpoints have that option set or not is (almost) impossible to 'inspect' from nix, as it can be arbitrarily nested
<andi->
you can check if any have it set.. any other assumptions are easily false (iscsi multipath, raid1, …)
<andi->
making an `autoLuks.foo.mountPoint = "/foo";` mandatory would work around the issue
<andi->
the user must then ensure not to screw it up or provide the wrong ones.. Not very nice but not the worst.
<gchristensen>
this sounds promising
<andi->
I am thinking about workloads where it is not a filesystem that is being used on top of that encrypted device... database logs (oracle did raw disks), ceph, … Would they be affected as well? Probably not.
<gchristensen>
I care about these less, as the diagnostics and recovery is much simpler
<gchristensen>
(I mean, I care about them plenty -- but am less concerned)
<andi->
At least not at first.. Maybe we figure something out later. Keeping systems bootable is the most important bit.
<gchristensen>
+1
<andi->
Guess I'll be working on another NixOps PR then.. introducing the _netdev option when the mountPoint is set and not explicitly set to some "I know what I am doing" value..
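Sketched out, the proposal might look something like this; `mountPoint` is the hypothetical new option, not something nixops supports today:

    {
      deployment.autoLuks.foo = {
        device = "/dev/xvdg";
        mountPoint = "/foo";   # hypothetical marker for the proposed PR
      };
      # nixops would then add the mount option itself, roughly equivalent to:
      # fileSystems."/foo".options = [ "_netdev" ];
    }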
<Shados>
andi-: flokli: What is it that prevents adding or checking _netdev on the mount points from nixops?
<andi->
Shados: For the trivial cases (filesystem directly on top of the luks device) we could probably do that. For the more interesting cases (lvm on luks, raid on luks, …) we cannot follow the indirections that might be resolvable during runtime.
<andi->
e.g. you might be creating a raid1 device from multiple encrypted volumes. Knowing which partition/disk/… uuid the device might end up with once combined with another device isn't really something we usually know.
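A sketch of the kind of layering meant here, where the mount only references the assembled array and nothing points back at the LUKS volumes (devices and UUIDs are placeholders):

    {
      # two encrypted volumes that are later assembled into a RAID1 array
      deployment.autoLuks.crypt0.device = "/dev/xvdf";
      deployment.autoLuks.crypt1.device = "/dev/xvdg";

      # the filesystem only knows the UUID of the md device; nixops cannot
      # follow that indirection, so _netdev has to be added by hand
      fileSystems."/data" = {
        device = "/dev/disk/by-uuid/<uuid-of-md-array>";
        fsType = "ext4";
        options = [ "_netdev" ];
      };
    }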
<Shados>
I see what you mean
<andi->
Generally speaking it wouldn't work for any nixos-generated config since those use uuids and not device paths.
Synthetica has quit [Quit: Connection closed for inactivity]
disasm| has quit [Quit: WeeChat 2.0]
drakonis has joined #nixos-dev
disasm has joined #nixos-dev
<Shados>
Is there a standard way of getting a Nix list of arbitrary strings (e.g. including ones containing spaces) into a bash array? I am guessing not, given the etc-builder doesn't do this, but confirmation would be nice
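One approach that seems to work, using `lib.escapeShellArgs` from nixpkgs (names are illustrative):

    let
      lib = (import <nixpkgs> { }).lib;
      myStrings = [ "plain" "has spaces" "tricky\"quote" ];
    in ''
      # each element is individually shell-quoted, so the array splits on
      # the intended boundaries even when entries contain spaces
      items=(${lib.escapeShellArgs myStrings})
      for i in "''${items[@]}"; do echo "$i"; done
    ''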
<Shados>
And on a related note: I can't find the list of characters that are illegal in store paths in the Nix manual; is it in there?
<srhb>
(Would be nice to stick in the manual I suppose)
<Shados>
srhb: Had to go a few pages down for the name portion of the store path, but thanks. For anyone interested: alphanumerics and `+-._?=` are valid.
<gchristensen>
I wonder if each wireguard peer should have its own systemd unit
<ivan>
what for?
<andi->
What is the use case you have for multiple peers on a single wireguard interface? The experience I have with them is that if something is off it seems like the node is reachable (interface is up) but data will just be lost. I usually do one interface per peer..
<gchristensen>
that is interesting, andi-, I have a `wg0` with several peers -- one for each machine in my "personal" network
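For comparison, the multi-peer variant looks roughly like this in NixOS (keys, addresses and paths are placeholders):

    {
      networking.wireguard.interfaces.wg0 = {
        ips = [ "10.100.0.1/24" ];
        privateKeyFile = "/var/lib/wireguard/wg0.key";
        peers = [
          { publicKey = "<machine-a-pubkey>"; allowedIPs = [ "10.100.0.2/32" ]; }
          { publicKey = "<machine-b-pubkey>"; allowedIPs = [ "10.100.0.3/32" ]; }
        ];
      };
    }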
<clever>
matthewbauer: i think you can put absolute paths into DT_NEEDED
<clever>
matthewbauer: and then it will just skip RUNPATH/RPATH entirely
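A quick way to poke at this with patchelf (binary and store path are placeholders):

    # list the current DT_NEEDED entries and the RUNPATH
    patchelf --print-needed ./myprog
    patchelf --print-rpath  ./myprog

    # replace a plain soname with an absolute store path; the loader should
    # then resolve that entry directly instead of searching RUNPATH for it
    patchelf --replace-needed libfoo.so.1 \
      /nix/store/<hash>-libfoo/lib/libfoo.so.1 ./myprog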
pie_ has joined #nixos-dev
<ekleog>
did someone already try to use python's sys.meta_path to do static linking for python? I'm thinking it might make things much better for our python story, if we can get rid of propagatedBuildInputs / python.withPackages
<matthewbauer>
clever: thanks!
<ekleog>
(downside being it works only with python3.4+, so we won't be able to fully switch to it until we get rid of python2… which might just be “never” :( )
<ekleog>
oh wait there's a python2 version too, even though the python3 doc states it came in with python3.4
<ekleog>
-> am I missing something obvious before I add that to my “to try someday” idea list?
<samueldr>
wouldn't it make sense to have two different infra, a "legacy" one for python2, and the python3 living one?
<samueldr>
I mean, if it has clear advantages
<ekleog>
once python2 no longer is our default python we will likely be able to, though ideally we wouldn't need that