<samueldr>
it's... disheartening how big and how embracing of github nixos is, but we don't seem to be cool enough to have a more direct link in some way
<samueldr>
(thinking mostly about the scope of the project)
<samueldr>
I can't push back the work I'm doing to later right now, but it sure puts weight into the planned rewrite of that github archiving/tooling thing
<andi->
These days GitHub offers "Exports" for "Migrations".. Maybe that gives us useful bundles of data? Just requested a dump of my data.
orivej has quit [Ping timeout: 272 seconds]
rajivr___ has joined #nixos-dev
bhipple has joined #nixos-dev
<jtojnar>
worldofpeace I still vaguely recall there were more issues with the wrappers
<jtojnar>
that one should be annoying but harmless
<bhipple>
jtojnar: if you put a package in propagatedBuildInputs and it has a setup hook, it should be run for all transitive uses (I think)
<bhipple>
what's weird is that if you put it in buildInputs and it happens to get propagated from a natural runtime dependency, it won't run its setup hook in the transitive dep :/
<bhipple>
I wouldn't necessarily rely on this behavior, because it feels like it's emergent rather than thoughtfully designed
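The behaviour bhipple describes can be illustrated with a hedged sketch (package names here are made up, not from the log):

```nix
# Hypothetical illustration of setup-hook propagation:
# a dependency with a setup hook placed in propagatedBuildInputs should
# also have its hook run by downstream consumers of my-lib.
stdenv.mkDerivation {
  pname = "my-lib";
  version = "0.1";
  src = ./.;
  # hookPkg's setup hook runs for this build *and* for packages that
  # list my-lib in their buildInputs:
  propagatedBuildInputs = [ hookPkg ];
  # had hookPkg been in buildInputs instead, its hook would only run for
  # this build, even if it ends up propagated as a runtime dependency.
}
```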
drakonis has joined #nixos-dev
orivej has joined #nixos-dev
orivej has quit [Ping timeout: 255 seconds]
orivej has joined #nixos-dev
bhipple has quit [Remote host closed the connection]
drakonis has quit [Ping timeout: 260 seconds]
orivej has quit [Ping timeout: 272 seconds]
cole-h has quit [Ping timeout: 268 seconds]
ixxie has joined #nixos-dev
<jtojnar>
worldofpeace are you able to run ostree or flatpak installed tests?
<jtojnar>
I passed a list of test names to ostree's testRunnerFlags and it got stuck
<jtojnar>
but if I only passed the first half or second half of the list it succeeded 😕️
LnL has quit [Ping timeout: 246 seconds]
LnL has joined #nixos-dev
LnL has joined #nixos-dev
LnL has quit [Changing host]
ixxie has quit [Ping timeout: 240 seconds]
tilpner has quit [Remote host closed the connection]
tilpner has joined #nixos-dev
orivej has joined #nixos-dev
<andi->
gchristensen: I am wondering if we should mark ghc as `big-parallel` to get better scheduling for that job. In the last few days I've seen it take >~5h multiple times.
colemickens_ has quit [Quit: Connection closed for inactivity]
__monty__ has joined #nixos-dev
ixxie has joined #nixos-dev
init_6 has joined #nixos-dev
v0|d has quit [Remote host closed the connection]
<gchristensen>
andi-: ghc can't handle more than 4-5 cores iirc
<NinjaTrappeur>
gchristensen: this issue is about building programs with a single GHC process. The GHC self build situation got way better on my system (16 cores) since the move to hadrian (shake-based build system). It might be worth trying out a // build again.
<gchristensen>
ahh okay, cool, let's do it :)
<gchristensen>
I was just going to say, too: I think the haskell build function in nixpkgs auto-limits to like 10 cores or something
<gchristensen>
(so let's do it)
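A minimal sketch of what tagging ghc `big-parallel` could look like (overlay form; the exact attribute path in nixpkgs may differ):

```nix
# Sketch: request a "big-parallel" builder for ghc via an overlay.
# requiredSystemFeatures is how Hydra matches jobs to build machines
# that advertise a matching system feature.
self: super: {
  ghc = super.ghc.overrideAttrs (old: {
    requiredSystemFeatures = (old.requiredSystemFeatures or [ ]) ++ [ "big-parallel" ];
  });
}
```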
<__monty__>
You're not using the parallel GC are you?
<__monty__>
That's infamous for terrible performance in most workloads.
<gchristensen>
I know almost nothing about GHC :)
<__monty__>
IIRC there's a proposal or patch to disable it by default, though not sure when that would be landing.
colemickens_ has joined #nixos-dev
<colemickens>
I'm trying to update python's importlib-metadata 1.3.0->1.5.0. There's a new check dep: pyfakefs. But if I add it, I get infinite recursion problems. Is there a pattern I should follow to get around this?
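One common pattern for breaking `checkInputs`-induced infinite recursion is to use a variant of the offending test dependency with its own tests disabled; a hedged sketch (attribute names assumed, not a confirmed fix for this case):

```nix
# Hypothetical: break a test-dependency cycle between importlib-metadata
# and pyfakefs by adding a pyfakefs built without its own test suite.
importlib-metadata = super.importlib-metadata.overridePythonAttrs (old: {
  checkInputs = (old.checkInputs or [ ]) ++ [
    (self.pyfakefs.overridePythonAttrs (_: { doCheck = false; }))
  ];
});
```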
ixxie has quit [Ping timeout: 258 seconds]
<NinjaTrappeur>
ssb-patchwork got a new release yesterday (yes, two releases in 2 days...), could somebody merge https://github.com/NixOS/nixpkgs/pull/80884 ? Two maintainers validated the PR.
<manveru>
no, this is a compilation error, shouldn't be any issue with the protocol
<tilpner>
Oh, didn't see your paste
<tilpner>
:(
clkamp_ has joined #nixos-dev
<jtojnar>
Yup, the patch looks good to me
<manveru>
then i guess the issue is that the patch won't work with vanilla nix?
bhipple has joined #nixos-dev
claudiii has joined #nixos-dev
<jtojnar>
yeah, we would need ifdefs
bhipple has quit [Ping timeout: 240 seconds]
v0|d has joined #nixos-dev
bhipple has joined #nixos-dev
phreedom has joined #nixos-dev
colemickens_ has quit [Quit: Connection closed for inactivity]
bennofs has quit [Quit: No Ping reply in 180 seconds.]
bennofs has joined #nixos-dev
cole-h has joined #nixos-dev
<samueldr>
andi-, gchristensen, tagging big-parallel, but still limiting cores could help with the new setup... thinking here about *other* resources being exhausted like I/O
<andi->
IIRC those builders have 300GB of RAM. Not sure if we exhaust that. Could be off tho
infinisil has quit [Quit: Configuring ZNC, sorry for the joins/quits!]
infinisil has joined #nixos-dev
<m1cr0m4n>
Hey emilazy are you about? Just wanted to ask about the maths on AccuracySec. If I understand this correctly, does lowering the value for more certs not mean that they will run closer to the same time, rather than more spread out throughout a day?
<emily>
m1cr0m4n: no: the larger AccuracySec is, the larger the allowed skew
<emily>
it's a really confusing name, but check the systemd.timer manpage
<emily>
AccuracySec = 1s means "the time must be accurate to within 1s of what is specified"
<emily>
AccuracySec = 24h means "the time can vary within a 24h period of what's determined"
<m1cr0m4n>
Right but, in this situation do you actually want to make it more accurate? I don't fully understand why you would
<emily>
so if you have 3 certs, it'll pick 3 random times of the day, and then coalesce them within 8h periods
<emily>
m1cr0m4n: let's say you have 100 certificates that all expire around the same time: hammering let's encrypt for all 100 of them at once is bad
<emily>
doing it 100 separate times throughout the day is better
<emily>
they'll still potentially coalesce statistically due to the random placement
<emily>
but basically we want to coalesce with other periodic timers for power management reasons, but not coalesce with other acme renewals for load management reasons
<emily>
(yuriks suggested this approach fwiw)
<m1cr0m4n>
yes I agree! I just don't get how that is happening here. The math is dividing 24h by the number of certs, so you're gonna have like 24h/100 = less than 15 mins AccuracySec between certs.
<emily>
so, the cert renewal checks are initially set to run "daily", which means at midnight. If we didn't do any AccuracySec or skewing, it would renew all 100 at midnight. adding the 24h skew means that all 100 will be renewed at random times throughout the day
<emily>
adding AccuracySec = 24h means that it'll notice that all of these renewal timers are within the same 24h period, and coalesce them
<emily>
thus defeating the skew (it's still skewed relative to other NixOS systems, but the certs themselves don't get skewed)
<emily>
dividing 24h by the number of certs means that we "bucket" them approximately according to the number of certs
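The scheme emily describes maps onto systemd timer settings roughly like this (hypothetical unit with example values for 100 certs, not the literal module output):

```ini
# Sketch of an acme renewal timer on a host with 100 certificates:
[Timer]
OnCalendar=daily                 ; nominally midnight
RandomizedDelaySec=24h           ; skew: pick a random time within the day
AccuracySec=864                  ; 24h / 100 certs: the coalescing bucket
Persistent=true
```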
<m1cr0m4n>
Ok yeah..I think I understand :P Sorry, very confusing time maths
<emily>
I totally agree
<emily>
mind if I post this log to the PR to help others?
<m1cr0m4n>
Fire away! :)
<gchristensen>
I wonder if coalescing them is going to cause problems for people with many certs ?
<m1cr0m4n>
Well I have 28 certs gchristensen which is kinda how this came about. I'll let you know if it goes sideways ;P
<gchristensen>
cool
<emily>
gchristensen: hence why I'm dividing the coalescing period by the number of certs :)
<gchristensen>
:O
<emily>
if you have 100 certs, the renewal requests will be coalesced within a 24h/100 = 14.4 minute period
<emily>
I'm not good enough at statistics to figure out what the average number of requests batched together would be. it's a uniform distribution over the 24h centred on midnight and then quantized with that coalescing period
<emily>
but I think it'll be good enough and certainly better than the status quo, which renews all 100 at midnight Monday exactly
<emily>
in fact, every single NixOS acme certificate currently renews on midnight Monday, local time :|
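The bucket arithmetic above can be double-checked with a quick sketch (the function name is made up):

```python
# Quick check of the skew/coalescing arithmetic described above.
def coalescing_period_seconds(num_certs: int) -> float:
    """AccuracySec bucket size: the 24h skew window divided by cert count."""
    return 24 * 60 * 60 / num_certs

print(coalescing_period_seconds(100))       # 864.0 s = 14.4 minutes
print(coalescing_period_seconds(3) / 3600)  # 8.0 hours, matching the 3-cert example
```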
bhipple has quit [Ping timeout: 258 seconds]
drakonis has joined #nixos-dev
bhipple has joined #nixos-dev
drakonis has quit [Quit: WeeChat 2.7]
ixxie has joined #nixos-dev
justanotheruser has quit [Ping timeout: 240 seconds]
drakonis has joined #nixos-dev
ixxie has quit [Ping timeout: 265 seconds]
ChanServ has quit [shutting down]
ixxie has joined #nixos-dev
ChanServ has joined #nixos-dev
justanotheruser has joined #nixos-dev
clkamp_ has quit [Remote host closed the connection]
ChanServ has quit [shutting down]
v0|d has quit [Remote host closed the connection]
ChanServ has joined #nixos-dev
ixxie has quit [Ping timeout: 272 seconds]
<ajs124>
emily: is that before or after the simp_le -> lego switch?
v0|d has joined #nixos-dev
v0|d has quit [Remote host closed the connection]
<emily>
after
<emily>
probably before too
<emily>
but I didn't feel like backporting stuff to pre-lego
<ajs124>
But both only renew if needed, right? So if you didn't register all 100 certs at the same time, they shouldn't all be renewed on the same week.
<ajs124>
Not that that's much better, but slightly less worse, I guess.
<ajs124>
Since you brought that up, I'm really interested in letsencrypt load patterns. I'm sure we're not the only ones that set it up that way.
<emily>
ajs124: yeah, there's some degree of natural distribution due to that, but issuing certificates in bulk is a valid pattern
<emily>
so we should behave nicely in the presence of that
v0|d has joined #nixos-dev
<emily>
(and fwiw AccuracySec was previously set to, like, 15m or something)
claudiii has quit [Quit: Connection closed for inactivity]
<emily>
for instance Plex issues certificates automatically for ~every user, I think
<emily>
I'm not sure if they have public data but I'd love to see graphs too
ixxie has joined #nixos-dev
<emily>
ajs124: apparently: "there's like massive spikes at midnight, 11 and 13, and there's smaller spikes on other hours or round minutes and there's also tinier spike on specific minutes"
<ajs124>
Midnight UTC? Yeah, that's what I would have guessed. Weird how those things work out.
<ajs124>
We only have 20-30 certs, but are running simp_le and certbot right now, which I'll probably both replace with lego. So all of this is of great interest to me right now ^^
<emily>
yeah UTC
<emily>
fun fact: some clients don't even ensure the certificate needs renewing before pinging the API >_<
<emily>
it's like how the DNS root servers are constantly overloaded with traffic, all of which is almost completely pointless
__monty__ has quit [Quit: leaving]
<samueldr>
oof, fun
<ajs124>
even I probably wouldn't be that lazy when implementing a client. and that's coming from someone that literally downloaded every single wordpress plugin because he couldn't be bothered to harass the plugin directory developers into changing an API endpoint.
colemickens_ has joined #nixos-dev
drakonis has quit [Quit: WeeChat 2.7]
<worldofpeace>
Can anyone reproduce #80871, and when reverting the mentioned commits, does it stop?
<{^_^}>
#71302 (by tokudan, 18 weeks ago, merged): rngd: Start early during boot and encrypted swap entropy fix
<samueldr>
not sure if relevant
zarel_ has quit [Ping timeout: 255 seconds]
<samueldr>
though it wouldn't surprise me if the issue at boot is hardware-dependent, while at shutdown not
zarel has joined #nixos-dev
<worldofpeace>
I updated my system to latest nixos unstable and I noticed this instantly. I've since been running a revert and it seems fine. My system isn't doing much special.
<tokudan>
that could be relevant as rngd has fewer dependencies with my PR and is thus stopped later on shutdown
<tokudan>
reverting is not an option for me, as that would lead to a reliable 90 second wait on boot
<samueldr>
yeah, brought it on since I'm pretty sure the solution is likely not a revert, considering
<samueldr>
oh, that was the PR that you linked to in the comments, worldofpeace, sorry
<tokudan>
worldofpeace, if you revert the defaultdependencies then encrypted swap will break due to a dependency loop
<worldofpeace>
hehe, it's good to know samueldr that we would have investigated the same way :D
<tokudan>
I haven't had a chance to test 20.03 yet, I'm running 19.09 with that exact 71302 and it works perfectly fine for me, so there seems to be some other mix between various components
<samueldr>
I just remembered that PR passing by
<worldofpeace>
tokudan: It seems like everyone having your issue doesn't have that setup?
<tokudan>
worldofpeace, how many people use encrypted swap? probably not many
<tokudan>
best solution is maybe to set a stop timeout of 1 second for rngd
<gchristensen>
my swap is on luks, I'm supposing that doesn't count
<tokudan>
rngd shouldn't have to save any data
<gchristensen>
I think rngd does save some data to bootstrap randomness next boot?
<tokudan>
"I tried making /dev/urandom block. The zero day kernel testing within a few hours told us that Ubuntu LTS at the time and OpenWrt would fail to boot. And when Python made a change to call getrandom(2) with the default blocking semantic instead of using /dev/urandom, some distributions using systemd stopped booting."
<tokudan>
so using urandom during boot for encryption is a bad idea
<samueldr>
yeah, and as I said, it's not relevant to rngd blocking
<samueldr>
so no point in getting immersed too deep :)
<tokudan>
samueldr, well, I need rngd during early boot to encrypt swap to avoid slow entropy gathering over more than a minute, blocking my boot
<samueldr>
yes
<tokudan>
so the major question is, why is rngd not dying fast?
<samueldr>
there's also the other part of, for one user, rngd starting slowly
<samueldr>
though it might be that it's two distinct problems bundled into that issue,
<samueldr>
that the slow boot is not rngd
<tokudan>
i don't see rngd at all in the systemd-analyze blame screenshot
<samueldr>
yeah, see my previous two lines, of me realizing that the issue is two problems
<samueldr>
or, likely two problems
<andi->
I just figured out why my yubikeys don't work when plugged in during boot.. rngd spits out something about using them as a randomness source?!? But yeah I also have the very slow shutdowns with the rngd warning
<Profpatsch>
andi-: same
<samueldr>
can we split the rngd issue from that slow boot?
<Profpatsch>
no yubikey required
<andi->
samueldr: at the same time my boots got faster (randomly encrypted swap), so it is likely related?
<Profpatsch>
I usually just hard-reset, it’s impossible to kill the rngd thing with Ctrl-Alt-Del
<samueldr>
andi-: I don't think so, it looks like the submitter of the issue conflated the long boot with rngd hanging at shutdown
<samueldr>
>> After login from sddm. I'm still waiting for gui to start. login to awesome window manager.
<worldofpeace>
hmm, I think I might have found a fix
<rnhmjoj>
stupid question: what exactly is an encrypted swap and why does that need entropy on boot? i have a laptop with luks+lvm (unlocked during stage 1) and the swap partition is encrypted but never had issues with booting due to low entropy
<samueldr>
I think here it's a distinct encryption per boot, with ephemeral keys
<andi->
rnhmjoj: I give swap a new encryption key on every boot. It doesn't support hibernation but also never leaks RAM to disk.
<samueldr>
in your setup, like mine, rnhmjoj, when I boot the swap file/partition is accessible next boot
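The per-boot ephemeral-key setup andi- describes corresponds to NixOS's `randomEncryption` swap option; a sketch (the device path is an example):

```nix
# NixOS configuration sketch: swap encrypted with a fresh random key
# each boot, so nothing swapped out survives a reboot.
swapDevices = [
  {
    device = "/dev/disk/by-partlabel/swap";
    randomEncryption.enable = true;
  }
];
```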
<worldofpeace>
(I really did just reboot my machine like crazy)
<rnhmjoj>
andi-: ok, i got it. thank you
<hexa->
really, yubikey as a rng source? my yubikey 4 with the faulty dual ec drbg generator?
<tokudan>
worldofpeace, that sounds like a good fix, I didn't think the shutdown would have to conflict with anything :)
<andi->
hexa-: it just logs about it, not sure about the consequences..
<worldofpeace>
tokudan: yeah, I just rebooted my system like 20 times, and with my configuration that's just latest nixos unstable it never happened
<hexa->
andi-: mixing in faulty random numbers isn't really … bad per se … but still.
<andi->
hexa-: usually they are xor'ed so not much harm done..
<tokudan>
hexa-, rngd uses everything available. even if some sources may be unreliable, it doesn't reduce the quality if additional sources are available
<hexa->
yep, I'm aware
<tokudan>
funny thing is, I've got a yubikey myself and I don't have any issues... so it's probably an update in 20.03 that allows yubikeys as rng source
<andi->
tokudan: That makes sense. As long as I've been on unstable with this machine it has been like that (unable to use it after reboot without replugging)
<tokudan>
I'll probably try out 20.03 in a couple of days but right now I need a running system without having to watch what generation I'm booting ;)
<tokudan>
worldofpeace, if that works fine, is there a reason to have rngd only start early when swap is encrypted? or should we clean that up and have rngd start early in any case and save some complexity?