samueldr changed the topic of #nixos-dev to: #nixos-dev NixOS Development (#nixos for questions) | NixOS 20.03 Feature Freeze Feb 10 https://discourse.nixos.org/t/nixos-20-03-feature-freeze/5655 | https://hydra.nixos.org/jobset/nixos/trunk-combined https://channels.nix.gsc.io/graph.html | https://r13y.com | 19.09 RMs: disasm, sphalerite; 20.03: worldofpeace, disasm | https://logs.nix.samueldr.com/nixos-dev
multun has quit [Ping timeout: 265 seconds]
multun has joined #nixos-dev
Synthetica has quit [Quit: Connection closed for inactivity]
bhipple has joined #nixos-dev
ajs124 has quit [Remote host closed the connection]
Scriptkiddi has quit [Remote host closed the connection]
das_j has quit [Remote host closed the connection]
ris has quit [Ping timeout: 240 seconds]
lovesegfault has joined #nixos-dev
phreedom has quit [Remote host closed the connection]
phreedom has joined #nixos-dev
lovesegfault has quit [Ping timeout: 246 seconds]
lovesegfault has joined #nixos-dev
bhipple has quit [Remote host closed the connection]
drakonis_ has quit [Read error: Connection reset by peer]
orivej has quit [Ping timeout: 268 seconds]
tilpner_ is now known as tilpner
claudiii has joined #nixos-dev
Scriptkiddi has joined #nixos-dev
ajs124 has joined #nixos-dev
das_j has joined #nixos-dev
__monty__ has joined #nixos-dev
<FRidh> I wonder if there are any objections to the pace at which we iterate staging nowadays. It can at times go pretty fast, e.g. now there's 2 new stdenv rebuilds in 2 days, which results in quite some bandwidth needed when following unstable
<Taneb> What's with the eval failure on the nixos:release-20.03 jobset?
<infinisil> FRidh: I once brought up the idea of having multiple levels of staging, which I quite like. So one level for e.g. stdenv rebuilds, only getting a couple commits, being merged into a second level that gets core library updates which are more common, etc.
<infinisil> I think that could save quite a bit of resources
claudiii has quit [Quit: Connection closed for inactivity]
<Taneb> In the haskell-updates jobset haskellPackages.generic-deriving failed for weird reasons that don't happen to me locally, could it be restarted? It causes a lot of knock-on failures
<Taneb> A few things seem to have been failing on machine a63b04eb for the same reason
m15k has joined #nixos-dev
pkolloch[m] has joined #nixos-dev
<pie_[bnc]> infinisil: something i shoot myself in the foot with a lot with modules is ones that dont complain if you pass an invalid path
<infinisil> pie_[bnc]: Elaborate?
<pie_[bnc]> just did it with services.tomcat.webapps for example
<pie_[bnc]> i did services.tomcat.webapps = [ ./share/whatever ] instead of ./result/share/whatever
<pie_[bnc]> well, im still rebuilding so hopefully it will work this time and not some other user error
<infinisil> Hm I see, this might be fixable
orivej has joined #nixos-dev
<pie_[bnc]> proceeded to shoot myself in the foot a second time by typoing it
<pie_[bnc]> infinisil: maybe sometimes you want to be able to pass paths that seem invalid at the time, this seems to be generating a shell script...but most of the time, probably not
<pie_[bnc]> i didnt think about this very hard
<pie_[bnc]> ok maybe im just doing this wrong, #nixos
<Profpatsch> FRidh: Huh, are we merging staging that often?
<Profpatsch> I thought the time between mass rebuilds has gone up since we introduced staging?
<Profpatsch> At least I’m switching to current master a lot and I never have mass rebuilds anymore.
<Profpatsch> Usually I don’t have to build anything.
<Profpatsch> (I really don’t care about re-fetching binaries when following master tbh)
<Profpatsch> FRidh: If you really want to contribute to the solution, help get the CAS into nix :)
<Profpatsch> Fast iteration is *good*, because it means people aren’t afraid to touch low-level parts of the system, which means we won’t accrue as much technical debt.
<Profpatsch> Slowing that down because some people have to fetch more (prebuilt!) binaries because they want to follow master/unstable would be extremely unhealthy.
Scriptkiddi has quit [Remote host closed the connection]
ajs124 has quit [Remote host closed the connection]
das_j has quit [Remote host closed the connection]
ajs124 has joined #nixos-dev
das_j has joined #nixos-dev
Scriptkiddi has joined #nixos-dev
<worldofpeace> Taneb: I asked gchristensen about that
ixxie has joined #nixos-dev
<FRidh> Profpatsch: typically it's once a week or so that staging-next is merged
<FRidh> those killed jobs are indeed annoying
<gchristensen> very :(
<gchristensen> I'm a bit out of my depth on that problem. I'll have to ping eelco
<FRidh> have there been any changes to hydra in the last week or so?
<FRidh> aside from the database change
<gchristensen> not sure
<gchristensen> is master failing to eval too?
<FRidh> no, failing eval is only with release-20.03
<gchristensen> INTERESTING.
<FRidh> but jobs are dying with killed 9 everywhere
<FRidh> new index for 20.03?
<Taneb> It's weird that 20.03 isn't working but 20.03-small, for example, is
<gchristensen> I wonder if I misconfigured it
<FRidh> supportedSystems is missing
<FRidh> when comparing with release-19.09
<Taneb> No it isn't?
<FRidh> oh, you are right
<Taneb> stableBranch is false, whereas for 19.09 it's true
<FRidh> is that not changed on release
<Taneb> Maybe
<Taneb> But aside from which branch it's pointing to, and like names and descriptions, that's the only difference I can see
<andi-> Yeah nixos:trunk-combined is also failing with `hydra-eval-jobs returned signal 9`
<Taneb> Recent change in release-combined.nix?
<andi-> i'd say signal 9 doesn't mean it is related to what it tries to eval?
<Taneb> Quite possibly
<gchristensen> it is probably something fussy about how the memory allocator is configured
<gchristensen> unless it really truly does need 45G to evaluate
<{^_^}> firing: BuildsStuckOverTwoDays: https://status.nixos.org/prometheus/alerts
<andi-> argh, rust-cbindgen doesn't reproduce anymore... This fixed output crap is really annoying -.-
<gchristensen> kill it!
<andi-> https://github.com/rust-lang/rust/issues/63476 it is not our fault this time.. (from the looks of it)
<{^_^}> rust-lang/rust#63476 (by yurivict, 26 weeks ago, open): nightly version fails: invalid version 3 on git_proxy_options; class=Invalid (3)
<gchristensen> :/
<andi-> or maybe it is due to the rust version
<gchristensen> it sounds like a major cause of the memory bloat problem is the number of NixOS tests
<gchristensen> each vm is a few hundred mb
<emily> could some of them use containers instead?
<gchristensen> the test framework doesn't have a mechanism for testing with containers
<emily> right, but I mean, in theory
<gchristensen> not sure, but the problem isn't the VM itself, but evaluating NixOS all those times
<emily> oh, ok
<yorick> did the channels stop updating? https://hydra.nixos.org/eval/1570061 finished but didn't make it into the nixos-unstable-small channel
<gchristensen> yorick: do you know https://status.nixos.org/prometheus/alerts ?
<yorick> gchristensen: okay, sorry
<gchristensen> no worries, that isn't a spurn -- just curious
<yorick> gchristensen: I usually see ChannelUpdateStuck in the channel, but not today
<yorick> (which makes sense, since it's been there for 2 days)
<gchristensen> if you click on that alert, you'll see 20.03 is erroring, not unstable-small
<yorick> gchristensen: hmm, so am I reading the hydra output wrong or is nixos-unstable-small also stuck?
<gchristensen> we are exploring together :)
<gchristensen> why, I'm not sure ...
<yorick> "This job is not a member of the latest evaluation of its jobset. This means it was removed or had an evaluation error."
<gchristensen> oh I see
<gchristensen> yes, an evaluation error was introduced, breaking the tested job for nixos-unstable-small
drakonis has joined #nixos-dev
<gchristensen> it is surprising to me that the tested job evaluates for release-combined but not release-small
<gchristensen> that is the first place to look -- bisect between the last known good to the currently broken, and find where the evaluation started failing
m15k has quit [Ping timeout: 260 seconds]
<gchristensen> yorick: where did you see that text, btw?
<clever> gchristensen: likely the build overview page
<clever> job overview*
<gchristensen> interesting
<gchristensen> the fact that it is deleted should be exported on https://hydra.nixos.org/job/nixos/unstable-small/tested/prometheus
NinjaTrappeur has quit [Quit: WeeChat 2.7]
NinjaTrappeur has joined #nixos-dev
<yorick> gchristensen: ooh
tokudan[m] has joined #nixos-dev
<tokudan[m]> so... 19.09-small seems to be stuck on the error: value is a string while an integer was expected, at /tmp/build-112561727/nixpkgs/source/nixos/release.nix:15:59. which is the nixpkgs.revCount part of this line: versionSuffix = (if stableBranch then "." else "beta") + "${toString (nixpkgs.revCount - 192668)}.${nixpkgs.shortRev}";
<tokudan[m]> which is a negative number
<gchristensen> yikes, did those values change recently?
<gchristensen> did something get backported which shouldn't have?
<tokudan[m]> the last time the revCount changed was ~5 months ago, according to the blame game
<tokudan[m]> which should probably be ok
<gchristensen> does it evaluate locally?
<tokudan[m]> that was my local reproduction of the error
<gchristensen> ah
<tokudan[m]> I'll have another look when I'm home
claudiii has joined #nixos-dev
orivej has quit [Ping timeout: 265 seconds]
ris has joined #nixos-dev
orivej has joined #nixos-dev
cole-h has joined #nixos-dev
<thoughtpolice> `revCount - 192668` lol
<thoughtpolice> Anyone ever hit an issue like this with a workaround, or is the only solution truly just to fix rust-dns-resolver: https://github.com/NixOS/nixpkgs/pull/79224#issuecomment-585183243
<thoughtpolice> Oops, wrong channel, really
<{^_^}> firing: BuildsStuckOverTwoDays: https://status.nixos.org/prometheus/alerts
tokudan has joined #nixos-dev
<tokudan> so... I'm able to build the nixos-19.09-small channel with a minor change in the buildscript: instead of revCount = \"$revCount\"; i used revCount = $revCount;
<tokudan> another strange error I noticed though is machine# [ 137.275557] systemd[1]: container@foo.service: start operation timed out. Terminating.
<tokudan> got no idea what the actual problem is
ixxie has quit [Ping timeout: 268 seconds]
ChanServ has quit [shutting down]
<yorick> why would revcount be lower than 192668?
<gchristensen> maybe there is a bug in hydra?
<niksnut> hm, there have been some changes to fetchGit etc., but I don't think that's used here
<gchristensen> yeah, this is hydra's fetch git impl
dongcarl has quit [Read error: Connection reset by peer]
<niksnut> btw, it's astonishing how much slower nixos eval has gotten, 'nix-instantiate nixos/release-combined.nix -A nixos.tests.misc.x86_64-linux --dry-run' took 1.7s in 18.09, but 5.5s on master
<gchristensen> that really is astonishing
<multun> :'(
<clever> compare the function call counts from the profiling json?
ChanServ has joined #nixos-dev
<{^_^}> #79943 (by edolstra, 51 seconds ago, open): NixOS evaluation speed regression
<niksnut> maybe we can have a moratorium on adding modules to module-list.nix
<niksnut> so all new modules have to be enabled via 'imports = [ ... ]'
<gchristensen> we'll need to be careful to not kill our baby
<clever> niksnut: isnt there infinite recursion if imports depends on config?
<niksnut> yes, but how does it depend on config?
<clever> if you're using an if statement, to exclude things
<clever> not sure if a giant tree of static imports would be any better than the flat module-list.nix
<niksnut> you shouldn't use an if statement, you just import the modules you need
<niksnut> looks like the worst slowdown was between 19.03 and 19.09 (2.0s -> 5.4s)
<niksnut> no sorry, 19.03 just fails to evaluate after 2.0s ;-)
<gchristensen> let me guess: strongswan
<niksnut> it was 18.09 -> 19.03
<niksnut> there also appears to be some IFD going on
<niksnut> error: cannot import '/nix/store/gccjbjnhw5fr2z4cmbkhjlz5y7xkjcrp-nixpkgs', since path '/nix/store/gccjbjnhw5fr2z4cmbkhjlz5y7xkjcrp-nixpkgs' is not valid, at /home/eelco/Dev/nixpkgs/nixos/release.nix:23:14
<niksnut> copying all of nixpkgs to the store probably isn't helping either
<gchristensen> IFD? hrm. is ofborg not doing its job? :P
<niksnut> no, this is caused by nixpkgs ? { outPath = cleanSource ./..; revCount = 130979; shortRev = "gfedcba"; } in release.nix
<niksnut> which doesn't happen on hydra / ofborg
<gchristensen> ahh
<LnL> niksnut: is there an easy way to invoke the hydra evaluator (or something similar) from the cli?
<thoughtpolice> Regardless of mechanism (Flakes, etc) requiring module imports would be a very good idea from a readability/usability perspective, IMO
<gchristensen> hydra-evaluate-jobs I think, LnL
<gchristensen> erm hmm maybe not that, there is something you can just point at an expression I think?
<niksnut> hydra-eval-jobs
<niksnut> for example: hydra-eval-jobs '<nixpkgs/nixos/release-small.nix>' -I nixpkgs=/home/eelco/Dev/nixpkgs
justanotheruser has quit [Ping timeout: 260 seconds]
<LnL> does that also write stuff to the db?
<gchristensen> I think it just does stdout and the hydra evaluator parses and inserts
<clever> LnL: yeah, it will eval every attr, and write all .drv files to /nix/store/
<clever> but not insert anything into the hydra db
<LnL> ah perfect, thanks!
<clever> hydra-eval-jobsets (the perl script) parses the json, and fills the postgresql
<LnL> right
<clever> ,profile
<{^_^}> clever: Did you mean profiling?
<{^_^}> Use NIX_COUNT_CALLS=1 and/or NIX_SHOW_STATS=1 to profile Nix evaluation
<clever> hydra-eval-jobs will obey these, but the restarting mechanism will overwrite the profile
<clever> you need to patch nix to append rather than overwrite
<clever> - fs.open(outPath, std::fstream::out);
<clever> + fs.open(outPath, std::fstream::out | std::fstream::ate);
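[editor's note] A minimal sketch of the open-mode behaviour behind the patch above, assuming standard C++ file streams. Under the standard's rules, `out | ate` without `in` still truncates the file on open, so it would not keep an earlier profile; `out | app` is the mode that appends. The helper names (`writeWith`, `slurp`, `twoRuns`) and the file paths in the usage note are hypothetical and are not part of Nix or Hydra.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Write `text` to `path` using the given open mode (hypothetical helper).
void writeWith(const std::string& path, std::ios_base::openmode mode,
               const std::string& text) {
    std::fstream fs;
    fs.open(path, mode);
    assert(fs.is_open());
    fs << text;
}

// Read the whole file back as a string.
std::string slurp(const std::string& path) {
    std::ifstream in(path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Simulate two evaluator runs writing stats to the same file with `mode`
// and return what is left in the file afterwards.
std::string twoRuns(const std::string& path, std::ios_base::openmode mode) {
    // Start from an empty file so repeated calls are deterministic.
    writeWith(path, std::fstream::out | std::fstream::trunc, "");
    writeWith(path, mode, "run1\n");
    writeWith(path, mode, "run2\n");
    return slurp(path);
}
```

Calling `twoRuns("stats_ate.txt", std::fstream::out | std::fstream::ate)` leaves only the second run's output in the file, while `twoRuns("stats_app.txt", std::fstream::out | std::fstream::app)` keeps both runs.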
<niksnut> we got rid of the restart mechanism
<clever> oh?, how do you deal with heap usage then?
<niksnut> heap usage was bad with and without that
<samueldr> ah, that's not part of master
<samueldr> is the nixos hydra running against master or the flake branch?
<gchristensen> flake
<samueldr> I guess that's something to know for someone that looks into the current evaluation issue
<gchristensen> +1
<samueldr> LnL: ^
<gchristensen> samueldr, LnL: note the footer of hydra says Hydra 0.1.20200211.53896ff (using nix-2.4pre20200207_d2032ed). which is correct https://github.com/nixos/hydra/commit/53896ff
<LnL> ah right
justanotheruser has joined #nixos-dev
<LnL> hmm: restarting hydra-eval-jobs after job '...' because heap size is at ... bytes
<samueldr> AFAIUI this won't happen with the flake branch
<samueldr> since this code was removed
<niksnut> right
<LnL> ah, was going to say how come we see problems if this happens
<samueldr> this might not be solving the problems, depending what they are, as we were hitting against the GC's maximums anyway
<puck> worldofpeace: hey so. https://github.com/NixOS/nixpkgs/pull/77850 broke notifications, since electron dlopens libnotify, and doesn't link to it directly
<{^_^}> #77850 (by worldofpeace, 3 weeks ago, merged): signal-desktop: use autoPatchelfHook, wrap properly
<puck> i believe, at least
<worldofpeace> super familiar with that kind of issue (I've probably fixed it 10+ times), thanks for the mention
<worldofpeace> I think you can add it to `runtimeDependencies` and the autoPatchelfHook will pick it up
<puck> yeah, i'm not entirely sure if that's the actual source of the issue, but that seems the most likely culprit..
lovesegfault has quit [Quit: WeeChat 2.7]
<puck> oh huh, it does
<puck> it works. should i open a pr or are you on it?
<worldofpeace> puck: you can open a PR 👍️ it would probably be a bad idea for me to even try to 🤣
<worldofpeace> just ping for merge
<{^_^}> #79949 (by puckipedia, 59 seconds ago, open): signal-desktop: fix notifications
<worldofpeace> puck: thanks, backport is needed also for release-20.03
<{^_^}> #79950 (by puckipedia, 17 seconds ago, open): [20.03] signal-desktop: fix notifications
<worldofpeace> puck: 😁 https://github.com/NixOS/nixpkgs/blob/master/.github/CONTRIBUTING.md#backporting-changes you need the `-x` in the command so it shows what commit it came from in the commit body
<puck> oh, oops
<puck> fixed!
<worldofpeace> puck: cool, all done. thanks again 👋
<LnL> clever: I'll just use an expr, doesn't seem to propagate the stats :/
__monty__ has quit [Quit: leaving]
primeos has quit [Ping timeout: 252 seconds]
primeos has joined #nixos-dev
<LnL> euh, why is this building stuff?
<{^_^}> firing: BuildsStuckOverTwoDays: https://status.nixos.org/prometheus/alerts
dongcarl has joined #nixos-dev