worldofpeace_ changed the topic of #nixos-dev to: #nixos-dev NixOS Development (#nixos for questions) | NixOS 20.03 BETA Announced https://discourse.nixos.org/t/nixos-20-03-beta/5935 | https://hydra.nixos.org/jobset/nixos/trunk-combined https://channels.nix.gsc.io/graph.html | https://r13y.com | 19.09 RMs: disasm, sphalerite; 20.03: worldofpeace, disasm | https://logs.nix.samueldr.com/nixos-dev
drakonis has joined #nixos-dev
drakonis has quit [Quit: WeeChat 2.7]
orivej has joined #nixos-dev
<samueldr> sadly, my regular scrape doesn't include info about that PR :(
<samueldr> thinking I should stop doing it "whenever" and have it on a more regular schedule
<andi-> maybe more than just you should be doing it ;-)
<gchristensen> context: a user says they were banned from github -- their profile is gone (https://github.com/ahiaao) and their PRs are 404ing https://github.com/NixOS/nixpkgs/pull/73539 and PRs for their account are "Private" https://github.com/NixOS/nixpkgs/commit/145652462b9f89a29094f6b2b1e4c1faa2935c2e
<gchristensen> and their user API response has a uid, so I feel confident this is their handle https://api.github.com/users/ahiaao
<samueldr> I think that can be verified as being a login through github oauth in discourse, in some way
<samueldr> though... now they probably can't?
<gchristensen> I'll mail github
<gchristensen> (not that that usually goes anywhere.)
<worldofpeace> Search engines indexed prior PRs, uhhhn.
<worldofpeace> gchristensen: exactly
<samueldr> it's... disheartening how big and how embracing of github nixos is, but we don't seem to be cool enough to have a more direct link in some way
<samueldr> (thinking mostly about the scope of the project)
<worldofpeace> https://github.com/NixOS/nixpkgs/pull/79115#issuecomment-584195642 is the last mention of them I can see
<worldofpeace> samueldr: totally should, I don't consider us a tiny project with the traffic we get
<samueldr> and it's not the usual deletion, considering a usual deletion transfers all mentions to "ghost"
<gchristensen> maybe github is having a db outage
<samueldr> apparently nixpkgs was ranked (in stars) 3331st, almost a year ago
<worldofpeace> gchristensen: it could be that kind of incident
<gchristensen> I know tumblr would do some funny looking things when some of its databases were out. anyway, I sent a support request
<worldofpeace> gchristensen: Can you followup on the thread saying it was brought to your attention?
<worldofpeace> thanks everyone, gotta watch out for our contributors.
<gchristensen> I understand the very unfortunate reality about sanctions & GitHub, but even those users don't get deleted
<samueldr> I can't push back the work I'm doing to later right now, but it sure puts weight into the planned rewrite of that github archiving/tooling thing
<andi-> These days GitHub offers "Exports" for "Migrations".. Maybe that gives us useful bundles of data? Just requested a dump of my data.
orivej has quit [Ping timeout: 272 seconds]
rajivr___ has joined #nixos-dev
bhipple has joined #nixos-dev
<jtojnar> worldofpeace I still vaguely recall there were more issues with the wrappers
<jtojnar> but cannot place it
<worldofpeace> Jan Tojnar: oh crap, do you mean this https://github.com/NixOS/nixpkgs/issues/78803 ?
<{^_^}> #78803 (by worldofpeace, 3 weeks ago, open): wrapGAppsHook: make double wrapping workaround work again
<jtojnar> yeah, that was likely it
<worldofpeace> I believe, very simply put, wrapGAppsHook needed to be more wrapQtAppsHook style 🤣
<jtojnar> I feel like find_gio_modules should go to glib
<worldofpeace> Jan Tojnar: true, it has no reason to be in wrapGAppsHook
<jtojnar> except maybe that some programs do not explicitly depend on glib
<jtojnar> but then they should be fixed
<worldofpeace> yeah, that could be an issue. we do want to be able to backport the fix
<jtojnar> but glib will likely be propagated by something else, or the project would not build
<jtojnar> I just do not understand setup hooks well enough to be able to say if they would be run
<worldofpeace> Jan Tojnar: do you mean the ordering?
<jtojnar> I mean the execution mechanism
<jtojnar> a single package uses /nix-support/setup-hook
<jtojnar> but I did not actually investigate what runs it and how it behaves with propagation
<jtojnar> so I have no mental model of that
<jtojnar> that one should be annoying but harmless
<bhipple> jtojnar: if you put a package in propagatedBuildInputs and it has a setup hook, it should be run for all transitive uses (I think)
<bhipple> what's weird is that if you put it in buildInputs and it happens to get propagated from a natural runtime dependency, it won't run its setup hook in the transitive dep :/
<bhipple> I wouldn't necessarily rely on this behavior, because it feels like it's emergent rather than thoughtfully designed
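[editor's note: a minimal Nix sketch of the setup-hook mechanism discussed above, with made-up package names; the `setupHook` attribute and `$out/nix-support/setup-hook` path are the real Nixpkgs convention, the rest is illustrative.]

```nix
# A package ships a setup hook by installing it to
# $out/nix-support/setup-hook; stdenv sources that file for any
# derivation that lists this package among its inputs.
{ stdenv }:

rec {
  my-hook = stdenv.mkDerivation {
    pname = "my-hook";
    version = "0.1";
    # stdenv installs this script to $out/nix-support/setup-hook
    setupHook = ./setup-hook.sh;
    dontUnpack = true;
    installPhase = "mkdir -p $out";
  };

  consumer = stdenv.mkDerivation {
    pname = "consumer";
    version = "0.1";
    # Per bhipple's point: listing my-hook here means its hook also
    # runs for transitive dependents of `consumer`. If my-hook were
    # only in buildInputs and merely ended up propagated as a runtime
    # dependency, its hook would NOT run for those dependents.
    propagatedBuildInputs = [ my-hook ];
  };
}
```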
drakonis has joined #nixos-dev
orivej has joined #nixos-dev
orivej has quit [Ping timeout: 255 seconds]
orivej has joined #nixos-dev
bhipple has quit [Remote host closed the connection]
drakonis has quit [Ping timeout: 260 seconds]
orivej has quit [Ping timeout: 272 seconds]
cole-h has quit [Ping timeout: 268 seconds]
ixxie has joined #nixos-dev
<jtojnar> worldofpeace are you able to run ostree or flatpak installed tests?
<jtojnar> I passed a list of test names to ostree's testRunnerFlags and it got stuck
<jtojnar> but if I only passed the first half or second half of the list it succeeded 😕️
LnL has quit [Ping timeout: 246 seconds]
LnL has joined #nixos-dev
LnL has joined #nixos-dev
LnL has quit [Changing host]
ixxie has quit [Ping timeout: 240 seconds]
tilpner has quit [Remote host closed the connection]
tilpner has joined #nixos-dev
orivej has joined #nixos-dev
<andi-> gchristensen: I am wondering if we should mark ghc as `big-parallel` to get better scheduling for that job. In the last few days I've seen it take >~5h multiple times.
colemickens_ has quit [Quit: Connection closed for inactivity]
__monty__ has joined #nixos-dev
ixxie has joined #nixos-dev
init_6 has joined #nixos-dev
v0|d has quit [Remote host closed the connection]
<gchristensen> andi-: ghc can't handle more than 4-5 cores iirc
* andi- facepalms
<andi-> wow.
<andi-> looking at https://hydra.nixos.org/build/113299873#tabs-buildsteps it seems that GHC compiles happily for >9h and sometimes times out.. meh
<NinjaTrappeur> gchristensen: this issue is about building programs with a single GHC process. The GHC self-build situation got way better on my system (16 cores) since the move to hadrian (shake-based build system). It might be worth trying out a parallel build again.
<gchristensen> ahh okay, cool, let's do it :)
<gchristensen> I was just going to say, too: I think the haskell build function in nixpkgs auto-limits to like 10 cores or something
<gchristensen> (so let's do it)
<__monty__> You're not using the parallel GC are you?
<__monty__> That's infamous for terrible performance in most workloads.
<gchristensen> I know almost nothing about GHC :)
<__monty__> IIRC there's a proposal or patch to disable it by default, though not sure when that would be landing.
colemickens_ has joined #nixos-dev
<colemickens> I'm trying to update python's importlib-metadata 1.3.0->1.5.0. There's a new check dep: pyfakefs. But if I add it, I get infinite recursion problems. Is there a pattern I should follow to get around this?
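[editor's note: one common way around a check-dependency cycle like the one colemickens describes is to break the loop by disabling tests on one side. This is a hedged sketch, not the canonical fix; the exact cycle depends on what pyfakefs itself depends on, and the attribute names mirror ordinary `pythonPackages` overlay conventions.]

```nix
# Overlay sketch: drop the test run (and with it the pyfakefs check
# dependency) on importlib-metadata to break the recursion.
self: super: {
  python3 = super.python3.override {
    packageOverrides = pyself: pysuper: {
      importlib-metadata = pysuper.importlib-metadata.overridePythonAttrs (old: {
        # tests are what pull in pyfakefs; skipping them breaks the cycle
        doCheck = false;
      });
    };
  };
}
```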
ixxie has quit [Ping timeout: 258 seconds]
<NinjaTrappeur> ssb-patchwork got a new release yesterday (yes, two releases in 2 days...), could smbdy merge https://github.com/NixOS/nixpkgs/pull/80884 ? Two maintainers validated the PR.
<{^_^}> #80884 (by NinjaTrappeur, 2 hours ago, open): ssb-patchwork: 3.17.4 -> 3.17.5
m1cr0m4n has joined #nixos-dev
init_6 has quit [Quit: init 0]
phreedom has quit [Ping timeout: 240 seconds]
<manveru> seems like nixFlakes breaks `nixos-option` on 20.03?
<{^_^}> #80775 (by jtojnar, 1 day ago, open): tree-wide: Fix with nixUnstable
<jtojnar> hmm, this is different one
<manveru> i have `nix = super.nixFlakes` in my overlay, if that helps
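[editor's note: a minimal sketch of the overlay manveru describes, as it would appear in a NixOS configuration.]

```nix
# configuration.nix fragment: replace the system nix with nixFlakes
{ ... }: {
  nixpkgs.overlays = [
    (self: super: { nix = super.nixFlakes; })
  ];
}
```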
<tilpner> manveru: Did you rebuild-switch yet?
<manveru> no
<manveru> i can build the system fine with standard nix, but haven't upgraded yet
<tilpner> Guess: nixFlakes is much newer than the nix you were using before, and had nixos-option rewritten
<tilpner> And there may be communication with the daemon happening, with both sides using a (slightly?) different protocol
<manveru> no, this is a compilation error, shouldn't be any issue with the protocol
<tilpner> Oh, didn't see your paste
<tilpner> :(
clkamp_ has joined #nixos-dev
<jtojnar> Yup, the patch looks good to me
<manveru> then i guess the issue is that the patch won't work with vanilla nix?
bhipple has joined #nixos-dev
claudiii has joined #nixos-dev
<jtojnar> yeah, we would need ifdefs
bhipple has quit [Ping timeout: 240 seconds]
v0|d has joined #nixos-dev
bhipple has joined #nixos-dev
phreedom has joined #nixos-dev
colemickens_ has quit [Quit: Connection closed for inactivity]
bennofs has quit [Quit: No Ping reply in 180 seconds.]
bennofs has joined #nixos-dev
cole-h has joined #nixos-dev
<samueldr> andi-, gchristensen, tagging big-parallel, but still limiting cores could help with the new setup... thinking here about *other* resources being exhausted like I/O
<andi-> IIRC those builders have 300GB of RAM. Not sure if we exhaust that. Could be off tho
infinisil has quit [Quit: Configuring ZNC, sorry for the joins/quits!]
infinisil has joined #nixos-dev
<m1cr0m4n> Hey emilazy are you about? Just wanted to ask about the maths on AccuracySec. If I understand this correctly, does lowering the value for more certs not mean that they will run closer to the same time, rather than more spread out throughout a day?
<emily> m1cr0m4n: no: the larger AccuracySec is, the larger the allowed skew
<emily> it's a really confusing name, but check the systemd.timer manpage
<emily> AccuracySec = 1s means "the time must be accurate to within 1s of what is specified"
<emily> AccuracySec = 24h means "the time can vary within a 24h period of what's determined"
<m1cr0m4n> Right but, in this situation do you actually want to make it more accurate? I don't fully understand why you would
<emily> so if you have 3 certs, it'll pick 3 random times of the day, and then coalesce them within 8h periods
<emily> m1cr0m4n: let's say you have 100 certificates that all expire around the same time: hammering let's encrypt for all 100 of them at once is bad
<emily> doing it 100 separate times throughout the day is better
<emily> they'll still potentially coalesce statistically due to the random placement
<emily> but basically we want to coalesce with other periodic timers for power management reasons, but not coalesce with other acme renewals for load management reasons
<emily> (yuriks suggested this approach fwiw)
<m1cr0m4n> yes I agree! I just don't get how that is happening here. The math is dividing 24h by the number of certs, so you're gonna have like 24h/100 = less than 15 mins of AccuracySec between certs.
<emily> so, the cert renewal checks are initially set to run "daily", which means at midnight. If we didn't do any AccuracySec or skewing, it would renew all 100 at midnight. adding the 24h skew means that all 100 will be renewed at random times throughout the day
<emily> adding AccuracySec = 24h means that it'll notice that all of these renewal timers are within the same 24h period, and coalesce them
<emily> thus defeating the skew (it's still skewed relative to other NixOS systems, but the certs themselves don't get skewed)
<emily> dividing 24h by the number of certs means that we "bucket" them approximately according to the number of certs
<m1cr0m4n> Ok yeah..I think I understand :P Sorry, very confusing time maths
<emily> I totally agree
<emily> mind if I post this log to the PR to help others?
<m1cr0m4n> Fire away! :)
<gchristensen> I wonder if coalescing them is going to cause problems for people with many certs ?
<m1cr0m4n> Well I have 28 certs gchristensen which is kinda how this came about. I'll let you know if it goes sideways ;P
<gchristensen> cool
<emily> gchristensen: hence why I'm dividing the coalescing period by the number of certs :)
<gchristensen> :O
<emily> if you have 100 certs, the renewal requests will be coalesced within a 24h/100 = 14.4 minute period
<emily> I'm not good enough at statistics to figure out what the average number of requests batched together would be. it's a uniform distribution over the 24h centred on midnight and then quantized with that coalescing period
<emily> but I think it'll be good enough and certainly better than the status quo, which renews all 100 at midnight Monday exactly
<emily> in fact, every single NixOS acme certificate currently renews on midnight Monday, local time :|
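[editor's note: a hedged Nix sketch of the scheme emily describes. The timer name, the `security.acme.certs` path, and the use of `RandomizedDelaySec` for the skew are illustrative assumptions; the key idea from the log is `AccuracySec = 24h / numCerts`.]

```nix
{ config, lib, ... }:
let
  # number of configured ACME certificates (assumed attribute path)
  numCerts = lib.length (lib.attrNames config.security.acme.certs);
in {
  # per-cert renewal timer (name is hypothetical)
  systemd.timers."acme-example.org".timerConfig = {
    OnCalendar = "daily";              # nominally midnight
    RandomizedDelaySec = "24h";        # skew renewals across the day
    # smaller AccuracySec = smaller coalescing bucket; dividing the day
    # by the cert count stops systemd from merging all the skewed
    # renewals back into one burst (100 certs -> 864s = 14.4 min)
    AccuracySec = "${toString (86400 / lib.max 1 numCerts)}s";
  };
}
```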
bhipple has quit [Ping timeout: 258 seconds]
drakonis has joined #nixos-dev
bhipple has joined #nixos-dev
drakonis has quit [Quit: WeeChat 2.7]
ixxie has joined #nixos-dev
justanotheruser has quit [Ping timeout: 240 seconds]
drakonis has joined #nixos-dev
ixxie has quit [Ping timeout: 265 seconds]
ChanServ has quit [shutting down]
ixxie has joined #nixos-dev
ChanServ has joined #nixos-dev
justanotheruser has joined #nixos-dev
clkamp_ has quit [Remote host closed the connection]
ChanServ has quit [shutting down]
v0|d has quit [Remote host closed the connection]
ChanServ has joined #nixos-dev
ixxie has quit [Ping timeout: 272 seconds]
<ajs124> emily: is that before or after the simp_le -> lego switch?
v0|d has joined #nixos-dev
v0|d has quit [Remote host closed the connection]
<emily> after
<emily> probably before too
<emily> but I didn't feel like backporting stuff to pre-lego
<ajs124> But both only renew if needed, right? So if you didn't register all 100 certs at the same time, they shouldn't all be renewed on the same week.
<ajs124> Not that that's much better, but slightly less worse, I guess.
<ajs124> Since you brought that up, I'm really interested in letsencrypt load patterns. I'm sure we're not the only ones that set it up that way.
<emily> ajs124: yeah, there's some degree of natural distribution due to that, but issuing certificates in bulk is a valid pattern
<emily> so we should behave nicely in the presence of that
v0|d has joined #nixos-dev
<emily> (and fwiw AccuracySec was previously set to, like, 15m or something)
claudiii has quit [Quit: Connection closed for inactivity]
<emily> for instance Plex issues certificates automatically for ~every user, I think
<emily> I'm not sure if they have public data but I'd love to see graphs too
ixxie has joined #nixos-dev
<emily> ajs124: apparently: "there's like massive spikes at midnight, 11 and 13, and there's smaller spikes on other hours or round minutes and there's also tinier spike on specific minutes"
<ajs124> Midnight UTC? Yeah, that's what I would have guessed. Weird how those things work out.
<ajs124> We only have 20-30 certs, but are running simp_le and certbot right now, which I'll probably both replace with lego. So all of this is of great interest to me right now ^^
<emily> yeah UTC
<emily> fun fact: some clients don't even ensure the certificate needs renewing before pinging the API >_<
<emily> it's like how the DNS root servers are constantly overloaded with traffic, all of which is almost completely pointless
__monty__ has quit [Quit: leaving]
<samueldr> oof, fun
<ajs124> even I probably wouldn't be that lazy when implementing a client. and that's coming from someone that literally downloaded every single wordpress plugin because he couldn't be bothered to harass the plugin directory developers into changing an API endpoint.
colemickens_ has joined #nixos-dev
drakonis has quit [Quit: WeeChat 2.7]
<worldofpeace> Can anyone reproduce #80871, and when reverting the mentioned commits, does it stop?
<{^_^}> https://github.com/NixOS/nixpkgs/issues/80871 (by dayflip, 15 hours ago, open): Long startup and shutdown on 20.03 BETA
<worldofpeace> It appears that rngd on shutdown doesn't want to die, so people are waiting for DefaultTimeoutStopSec
<gchristensen> rng... sounsd very very hardware specific
drakonis has joined #nixos-dev
<{^_^}> #71302 (by tokudan, 18 weeks ago, merged): rngd: Start early during boot and encrypted swap entropy fix
<samueldr> not sure if relevant
zarel_ has quit [Ping timeout: 255 seconds]
<samueldr> though it wouldn't surprise me if the issue at boot is hardware-dependent, while at shutdown not
zarel has joined #nixos-dev
<worldofpeace> I updated my system to latest nixos unstable and I noticed this instantly. I've since been running a revert and it seems fine. My system isn't doing much special.
<tokudan> that could be relevant as rngd has less dependencies with my PR and is thus stopped later on shutdown
<tokudan> reverting is not an option for me, as that would lead to a reliable 90 second wait on boot
<worldofpeace> tokudan: I did mention https://github.com/NixOS/nixpkgs/issues/80871#issuecomment-590067519 as the only change I think that would happen on my system
<samueldr> yeah, brought it on since I'm pretty sure the solution is likely not a revert, considering
<samueldr> oh, that was the PR that you linked to in the comments, worldofpeace, sorry
<tokudan> worldofpeace, if you revert the defaultdependencies then encrypted swap will break due to a dependency loop
<worldofpeace> hehe, it's good to know samueldr that we would have investigated the same way :D
<tokudan> I haven't had a chance to test 20.03 yet, I'm running 19.09 with that exact 71302 and it works perfectly fine for me, so there seems to be some other mix between various components
<samueldr> I just remembered that PR passing by
<worldofpeace> tokudan: It seems like everyone having your issue doesn't have that setup?
<tokudan> worldofpeace, how many people use encrypted swap? probably not many
<tokudan> best solution is maybe to set a stop timeout of 1 second for rngd
<gchristensen> my swap is on luks, I'm supposing that doesn't count
<tokudan> rngd shouldn't have to save any data
<gchristensen> I think rngd does save some data to bootstrap randomness next boot?
<tokudan> gchristensen, my swapconfig: { device = "/dev/disk/by-id/xxx"; randomEncryption.enable = true; randomEncryption.source = "/dev/random"; }
<gchristensen> ah
<tokudan> gchristensen, if you're using urandom, you may get unreliable randomness especially during boot
<samueldr> "unreliable" means?
<samueldr> (not doubting, I just read that urandom should be cryptographically safe)
<tokudan> samueldr, low amount of entropy leading to just a couple million possible keys
<tokudan> samueldr, urandom is safe once there has been enough entropy generated. before that it's unreliable, depending on the kernel
<gchristensen> I've heard pretty conflicting information about that being true
<samueldr> yeah, same
drakonis has quit [Quit: WeeChat 2.7]
<tokudan> there are some kernel versions where urandom is unreliable and newer versions i believe may block like random
<samueldr> found the PR
<samueldr> looks like /dev/random will now never block, once the CRNG is initialized
<samueldr> (coming with 5.6)
<samueldr> anyway, that's not even the issue
<tokudan> samueldr, yes, once it's initialized
<tokudan> I'm fine with random blocking, once it's gathered enough entropy
<tokudan> ... before...
<tokudan> and here's a message from the kernel maintainer: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1411670.html
<tokudan> "I tried making /dev/urandom block. The zero day kernel testing within a few hours told us that Ubuntu LTS at the time and OpenWrt would fail to boot. And when Python made a change to call getrandom(2) with the default blocking semantic instead of using /dev/urandom, some distributions using systemd stopped booting."
<tokudan> so using urandom during boot for encryption is a bad idea
<samueldr> yeah, and as I said, it's not relevant to rngd blocking
<samueldr> so no point in getting immersed too deep :)
<tokudan> samueldr, well, I need rngd during early boot to encrypt swap to avoid slow entropy gathering over more than a minute, blocking my boot
<samueldr> yes
<tokudan> so the major question is, why is rngd not dying fast?
<samueldr> there's also the other part of, for one user, rngd starting slowly
<tokudan> starting slowly?
<samueldr> very slow boot
<samueldr> though it might be that it's two distinct problems bundled into that issue,
<samueldr> that the slow boot is not rngd
<tokudan> i don't see rngd at all in the systemd-analyze blame screenshot
<samueldr> yeah, see my previous two lines, of me realizing that the issue is two problems
<samueldr> or, likely two problems
<andi-> I just figured why my yubikeys don't work when plugged in during boot.. rngd spits out something about using them as a randomness source?!? But yeah I also have the very slow shutdowns with the rngd warning
<Profpatsch> andi-: same
<samueldr> can we split the rngd issue from that slow boot?
<Profpatsch> no yubikey required
<andi-> samueldr: at the same time my boots got faster (randomly encrypted swap), so it is likely related?
<Profpatsch> I usually just hard-reset, it's impossible to kill the rngd thing with Ctrl-Alt-Del
<samueldr> andi-: I don't think so, it looks like the submitter of the issue conflated the long boot with rngd hanging at shutdown
<samueldr> >> After login from sddm. I'm still waiting for gui to start. login to awesome window manager.
<worldofpeace> hmm, I think I might have found a fix
<rnhmjoj> stupid question: what exactly is an encrypted swap and why does that need entropy on boot? i have a laptop with luks+lvm (unlocked during stage 1) and the swap partition is encrypted but never had issues with booting due to low entropy
<samueldr> I think here it's a distinct encryption per boot, with ephemeral keys
<andi-> rnhmjoj: I give swap a new encryption key on every boot. It doesn't support hibernation but also does never leak RAM to disk.
<samueldr> in your setup, like mine, rnhmjoj, when I boot the swap file/partition is accessible next boot
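[editor's note: a sketch of the ephemeral-key swap setup andi- and tokudan describe, matching the inline config tokudan pasted earlier; the device path is elided in the original log and kept as-is.]

```nix
# The device gets a fresh random key on every boot: hibernation is
# impossible, but swapped-out RAM can never be read back after reboot.
{
  swapDevices = [{
    device = "/dev/disk/by-id/xxx";    # elided in the original log
    randomEncryption = {
      enable = true;
      # use blocking /dev/random so early-boot keys aren't derived
      # from a not-yet-seeded pool (hence the need for rngd early)
      source = "/dev/random";
    };
  }];
}
```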
<worldofpeace> Idk, I read the doc on DefaultDependencies and looked at the upstream rngd unit, and doing https://gist.github.com/worldofpeace/320d9f3d76affc158726a0fe034566ed shutdown happens cleanly for me
<worldofpeace> (I really did just reboot my machine like crazy)
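[editor's note: the gist itself isn't quoted in the log; this is a plausible sketch of the kind of fix described. With `DefaultDependencies=no` (as introduced by #71302), a unit loses its implicit `Conflicts=shutdown.target` / `Before=shutdown.target`, so stopping cleanly at shutdown has to be restored by hand.]

```nix
# Restore the shutdown ordering that DefaultDependencies=no removed
{
  systemd.services.rngd.unitConfig = {
    DefaultDependencies = false;        # kept for the early-boot ordering
    Conflicts = [ "shutdown.target" ];  # stop rngd when shutting down
    Before = [ "shutdown.target" ];     # ...and do so before the target
  };
}
```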
<rnhmjoj> andi-: ok, i got it. thank you
<hexa-> really, yubikey as a rng source? my yubikey 4 with the faulty dual ec drbg generator?
<tokudan> worldofpeace, that sounds like a good fix, I didn't think the shutdown would have to conflict with anything :)
<andi-> hexa-: it just logs about it, not sure about the consequences..
<worldofpeace> tokudan: yeah, I just rebooted my system like 20 times. and diff my configuration that's just latest nixos unstable it never happened
<hexa-> andi-: mixing in faulty random numbers isn't really … bad per se … but still.
<andi-> hexa-: usually they are xor'ed so not much harm done..
<tokudan> hexa-, rngd uses everything available. even if some sources may be unreliable, it doesn't reduce the quality if additional sources are available
<hexa-> yep, I'm aware
<tokudan> funny thing is, I've got a yubikey myself and I don't have any issues... so it's probably an update in 20.03 that allows yubikeys as rng source
<andi-> tokudan: That makes sense. As long as I've been on unstable with this machine it has been like that (unable to use it after reboot without replugging)
<{^_^}> #80920 (by worldofpeace, 10 seconds ago, open): nixos/rngd: fix clean shutdown
<tokudan> I'll probably try out 20.03 in a couple of days but right now I need a running system without having to watch what generation I'm booting ;)
<tokudan> worldofpeace, if that works fine, is there a reason to have rngd only start early when swap is encrypted? or should we clean that up and have rngd start early in any case and save some complexity?
<rnhmjoj> worldofpeace: did you see this? https://github.com/nhorman/rng-tools/issues/73
<{^_^}> nhorman/rng-tools#73 (by diego-sueiro, 14 weeks ago, closed): rngd hanging when terminating
<andi-> I believe the encrypted swap is in place before systemd is executed?! (Could be wrong)
<tokudan> andi-, no, systemd takes care of swap encryption
<tokudan> andi-, otherwise systemd wouldn't be able to start rngd before swap setup ;)
<andi-> mmhm
ixxie has quit [Ping timeout: 252 seconds]
<rnhmjoj> uhm, looks like the version of rng-tools with that fix is already in 20.03