justanotheruser has quit [Read error: Connection reset by peer]
drakonis has quit [Quit: WeeChat 2.7]
drakonis has joined #nixos-dev
drakonis has quit [Remote host closed the connection]
drakonis has joined #nixos-dev
drakonis has quit [Quit: WeeChat 2.7]
zarel_ has joined #nixos-dev
zarel has quit [Ping timeout: 268 seconds]
bhipple has quit [Remote host closed the connection]
ixxie has joined #nixos-dev
colemickens_ has joined #nixos-dev
cole-h has quit [Ping timeout: 265 seconds]
orivej has quit [Ping timeout: 258 seconds]
ixxie has quit [Ping timeout: 272 seconds]
__monty__ has joined #nixos-dev
colemickens_ has quit [Quit: Connection closed for inactivity]
<rnhmjoj>
i've found an unusual issue in nixosTests.gitea. the test defines multiple configurations with `nesting.clone` and switches to them using the `switch-to-configuration` perl script.
<rnhmjoj>
the test succeeds on x86_64 but on aarch64 or i686 the system is trying to run the perl code as if it were a shell script and obviously fails
<rnhmjoj>
i thought it had something to do with the shebang but the scripts are virtually identical
<rnhmjoj>
* nixosTests.caddy, sorry
orivej has joined #nixos-dev
orivej has quit [Ping timeout: 255 seconds]
orivej has joined #nixos-dev
Jackneill has quit [Read error: Connection reset by peer]
Jackneill has joined #nixos-dev
orivej has quit [Ping timeout: 265 seconds]
<rnhmjoj>
uhm, it seems nesting is broken. nixosTests.nesting is not passing either
misuzu has quit [Remote host closed the connection]
<samueldr>
what kind of wizardry is that?
<gchristensen>
28 core machines with --max-jobs 1 --cores 28
misuzu has joined #nixos-dev
<samueldr>
nice
<gchristensen>
and chromium is at [26954/37586] CXX obj/extensions/browser/browser_sources/quota_service.o only 40 minutes in
<samueldr>
so it looks like it wasn't an aarch64-only problem, that builds might be facing resource exhaustion?
<gchristensen>
big-parallel was just not given any special allocation really
orivej has quit [Ping timeout: 255 seconds]
<samueldr>
right
<samueldr>
should I hack together the eval reports scraper to get build times for things to help tag all e.g. 95th percentile+ of build times big-parallel?
<ryantm>
Well, the latest nixpkgs-update deploy I did is messed up and a bunch of PRs got made without actually updating the src attribute. I'm working through closing them now. This is an interesting test to see if some people are looking at the diffs before approving.
<samueldr>
(or at least identify)
<gchristensen>
ryantm: interesting :)
<samueldr>
right, totally intended to see if everyone was following :)
justanotheruser has joined #nixos-dev
<gchristensen>
samueldr: I could get you the database if you wanted :P
<samueldr>
oh, at that point you can probably make a query to output the data you need
<gchristensen>
I'm not actually sure the db records big-parallel-eyness
ixxie has joined #nixos-dev
<samueldr>
you don't necessarily have to, you just need to get the long builds, for now, and if you pick an eval from before your recent fix, I figure you get a more representative set
<gchristensen>
oh right
<samueldr>
you might also prefer going for a "cut-off", e.g. all builds longer than 60 minutes
<samueldr>
(spitballing ideas)
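The two selection rules floated here (a percentile threshold and a fixed cut-off) can be sketched as follows. This is only an illustration: the job names and durations are made up, and the helper names are hypothetical, not part of Hydra or any scraper mentioned in the channel.

```python
from statistics import quantiles


def over_cutoff(durations, cutoff_seconds=3600):
    """Return the jobs whose build time exceeds a fixed cut-off in seconds."""
    return sorted(name for name, secs in durations.items() if secs > cutoff_seconds)


def percentile_threshold(durations, pct=95):
    """Return the build time in seconds at the given percentile."""
    # n=100 yields 99 cut points; index pct - 1 is the pct-th percentile.
    cuts = quantiles(durations.values(), n=100, method="inclusive")
    return cuts[pct - 1]


def over_percentile(durations, pct=95):
    """Return the jobs whose build time is above the given percentile."""
    threshold = percentile_threshold(durations, pct)
    return sorted(name for name, secs in durations.items() if secs > threshold)


# Hypothetical build times in seconds, for illustration only.
sample = {
    "hello": 12,
    "openssl": 240,
    "gcc": 5400,
    "llvm": 7200,
    "firefox": 9000,
    "chromium": 36000,
}

candidates_by_cutoff = over_cutoff(sample)
candidates_by_percentile = over_percentile(sample)
```

A fixed cut-off is easier to explain, while a percentile adapts as the overall distribution of build times shifts. Which one suits big-parallel tagging better is left open in the discussion.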
<gchristensen>
a cool idea
<samueldr>
though 60 minutes could be short, depending
<samueldr>
I don't have a good idea of how long jobs usually build for
<samueldr>
okay, so that's an average of ~2h10 for a step if I understand this right
<samueldr>
but we also have steps going for 4, even 10 hours
<gchristensen>
those are seconds
<samueldr>
I think one of those funky stacked block plots would help get a better feeling
<samueldr>
oops
<gchristensen>
one sec here ... trying a different thing
<samueldr>
still, we do have steps going for 4, even 10 hours, so that means longer steps make up a much smaller proportion, which is probably good
<gchristensen>
samueldr: you want to know how many take more than 1h?
<ryantm>
jtojnar and marsam are the winners of noticing, and the couple people who didn't notice will remain unnamed but have been made aware.
<samueldr>
more to the point, what is a good point at which to start marking them big-parallel
<gchristensen>
samueldr: I agree
<samueldr>
there I said 1h, but that's more of an example
<gchristensen>
samueldr: https://status.nixos.org/prometheus/graph?g0.range_input=1d&g0.expr=sum(hydra_machine_build_duration_bucket%7Ble%3D%223600%22%7D)&g0.tab=0 this shows the number of builds recorded so far taking at most 3600 seconds (`le` is an upper bound). but the histogram is not very good, it would be good to have another histogram for the total build time -- something we don't have
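Prometheus histogram buckets are cumulative: a bucket labelled `le="3600"` counts observations of at most 3600 seconds, and the `+Inf` bucket counts all of them. The number of builds longer than an hour is therefore the difference between the two. A minimal sketch of that arithmetic, with made-up bucket counts and a hypothetical helper name:

```python
INF = float("inf")


def builds_over(buckets, threshold_seconds):
    """Count observations above a bucket boundary.

    `buckets` maps each `le` upper bound (seconds) to its cumulative count,
    with float("inf") standing in for the "+Inf" bucket.
    """
    total = buckets[INF]
    at_most = buckets[threshold_seconds]
    return total - at_most


# Hypothetical cumulative counts, for illustration only.
sample_buckets = {600: 800, 3600: 950, INF: 1000}

over_one_hour = builds_over(sample_buckets, 3600)
```

This only works for thresholds that are actual bucket boundaries; anything in between would have to be estimated.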
<andi->
ryantm: can you maybe apply that test to 10% of all PRs you open? :)
<andi->
and provide an @ryan-tm fix the invalid hash cmd :D
<andi->
gchristensen: those build times look good. Do we have utilization statistics for those dedicated runners? (I can probably get that if I know the machine name?)
bhipple has joined #nixos-dev
<gchristensen>
andi-: https://hydra.nixos.org/queue-runner-status you can find machine names with features. seems there are some machines with mandatoryFeatures that probably shouldn't.
<gchristensen>
andi-: ah, the ones with mandatoryFeatures are "disabled" meaning they're not in the config anymore.
<gchristensen>
with these changes, I wonder how fast our minimum mass-rebuild-to-release time is.
<andi->
there is one way to figure that out ;) I just wish we could select jobs to go to a "garbage" bucket that is cleaned/recycled every now and then
<gchristensen>
I'm not sure I follow
<andi->
which part?
<gchristensen>
what does it mean to clean / recycle?
<andi->
delete
<samueldr>
this really makes me think we need some kind of "value" that is mechanically refreshable, that is a relative amount of time, compared to a specific job
<gchristensen>
ah
<samueldr>
like LFS has
<samueldr>
a nixpkgs meta, not a nix-side thing
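The LFS reference is presumably to its Standard Build Unit, where every package's build time is expressed relative to the time taken by one reference package. A rough sketch of that idea, assuming per-job durations are already available; the job names, durations, and function name are hypothetical, and how such a value would be stored in nixpkgs meta is not specified in the discussion.

```python
def relative_build_units(durations, reference_job):
    """Express each job's build time as a multiple of a reference job's time."""
    reference = durations[reference_job]
    if reference <= 0:
        raise ValueError("reference job must have a positive build time")
    return {name: round(secs / reference, 2) for name, secs in durations.items()}


# Hypothetical build times in seconds, for illustration only.
sample_durations = {"binutils": 120, "gcc": 1320, "chromium": 36000}

units = relative_build_units(sample_durations, "binutils")
```

Because the values are ratios, they can be recomputed mechanically whenever the reference job is rebuilt on different hardware, which is the "refreshable" property asked for above.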
cole-h has joined #nixos-dev
bhipple has quit [Ping timeout: 258 seconds]
FRidh2 has quit [Quit: Konversation terminated!]
bhipple has joined #nixos-dev
bhipple has quit [Remote host closed the connection]
CRTified has joined #nixos-dev
colemickens_ has joined #nixos-dev
ixxie has quit [Ping timeout: 240 seconds]
<worldofpeace>
gchristensen: ahh, disasm noticed this actually. But I think the first time we did it, it was actually right, and when we recreated 20.03 it was incorrect.
<gchristensen>
ah :)
<worldofpeace>
gchristensen: I still have a lot of notes from that day to improve the docs, most notably, to make it an ORDERED list