gchristensen changed the topic of #nixos-borg to: https://www.patreon.com/ofborg https://monitoring.nix.ci/dashboard/db/ofborg?refresh=10s&orgId=1&from=now-1h&to=now "I get to skip reviewing the PHP code and just wait until it is rewritten in something sane, like POSIX shell." || https://logs.nix.samueldr.com/nixos-borg
orivej has joined #nixos-borg
ryantm has joined #nixos-borg
orivej has quit [Ping timeout: 240 seconds]
FRidh has joined #nixos-borg
<FRidh> is ofborg down?
<LnL> yeah, there seems to be a problem with the evaluators
<LnL> but gchristensen probably isn't awake yet
<FRidh> @GrahamcOfBorg wakeup gchristensen :)
<gchristensen> I'm up!
<FRidh> Good morning!
<gchristensen> oh damn!
<LnL> \o/
<LnL> good morning :D
* gchristensen runs away, goes back to bed
<LnL> if your coffee is ready, take a look at the evaluation queue
<gchristensen> ok let me get the coffee started. another 10min of broken won't end the world
<gchristensen> huh
<gchristensen> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
<gchristensen> 5122 ofborg 20 0 14.242g 0.012t 8836 R 100.0 80.0 1183:13 nix-env
<gchristensen> something broke nix-env on master around 13hrs ago I think
<gchristensen> LnL: can you start looking into that?
<gchristensen> I'm making sure it isn't ofborg's fault first
<LnL> heh, "the infrastructure is broken"
<gchristensen> uuuuhhh
<LnL> can't test for an hour or so
<gchristensen> Jul 17 15:20:23 eval-0-gleber.ewr1.nix.ci ofborg-evaluator-start[31933]: - passExtensions = recurseIntoAttrs pass.extensions;
<gchristensen> Jul 17 15:20:23 eval-0-gleber.ewr1.nix.ci ofborg-evaluator-start[31933]: + passExtensions = recurseOnHydra pass.extensions;
<gchristensen> I wonder if they're all stuck on this
<gchristensen> no
<LnL> infinite recursing or something?
<gchristensen> hmmmm
<gchristensen> 2/3 are stuck on PRs from matthewbauer :P
<gchristensen> a coincidence I'm sure
<gchristensen> wtf
<FRidh> IFD?
<gchristensen> nope
<gchristensen> not a coincidence
<gchristensen> those two PRs from matthewbauer broke 2/3 evals
<gchristensen> the third evaluator is still working
<LnL> so it's that slow to catch up on its own?
<gchristensen> yeah
<gchristensen> I mean, evals are expensive
<gchristensen> hah
<gchristensen> I don't run 3 just for fun
<LnL> I know but I didn't expect it to get behind that much
<gchristensen> cool, I can't even Ctrl-C the nix-env process from https://github.com/NixOS/nixpkgs/pull/43670
<{^_^}> #43670 (by matthewbauer, open): Add recurseOnHydra function
<LnL> kill -9?
<gchristensen> yep
<gchristensen> maybe I should fail the build if an evaluation takes longer than 9 hours
<LnL> yeah, we should probably implement a kill eval after ...
<gchristensen> 8 seconds? :)
<LnL> or something more high level like, kill the service if something in the metrics doesn't change for a certain amount of time
<gchristensen> that would also be good
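Both ideas above (a hard time limit on an evaluation, and killing a worker whose metrics stop moving) can be sketched with the Rust standard library alone. This is a minimal illustration, not ofborg's actual code: `run_with_deadline`, `Outcome`, and `StallDetector` are hypothetical names, the 50 ms polling interval is arbitrary, and the "metric" is assumed to be a plain counter sampled by the caller.

```rust
use std::io;
use std::process::{Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Result of running a child process under a time limit.
#[derive(Debug)]
pub enum Outcome {
    Finished(ExitStatus),
    TimedOut,
}

/// Spawn `cmd` and kill it if it is still running after `limit`.
/// (Hypothetical helper; polls instead of using async or futures.)
pub fn run_with_deadline(cmd: &mut Command, limit: Duration) -> io::Result<Outcome> {
    let mut child = cmd.spawn()?;
    let deadline = Instant::now() + limit;
    loop {
        // try_wait returns Ok(Some(status)) once the child has exited.
        if let Some(status) = child.try_wait()? {
            return Ok(Outcome::Finished(status));
        }
        if Instant::now() >= deadline {
            // On Unix this sends SIGKILL, i.e. the `kill -9` used above.
            // The error is ignored in case the child exited in the meantime.
            let _ = child.kill();
            child.wait()?; // reap the process so it does not linger as a zombie
            return Ok(Outcome::TimedOut);
        }
        thread::sleep(Duration::from_millis(50));
    }
}

/// Reports a stall when an observed counter has not changed for `max_idle`.
pub struct StallDetector {
    last_value: Option<u64>,
    last_change: Instant,
    max_idle: Duration,
}

impl StallDetector {
    pub fn new(max_idle: Duration, now: Instant) -> Self {
        StallDetector { last_value: None, last_change: now, max_idle }
    }

    /// Record a sample taken at `now`; returns true if the value has been
    /// unchanged for at least `max_idle`.
    pub fn observe(&mut self, value: u64, now: Instant) -> bool {
        if self.last_value != Some(value) {
            self.last_value = Some(value);
            self.last_change = now;
        }
        now.duration_since(self.last_change) >= self.max_idle
    }
}
```

A caller might wrap the evaluation command in `run_with_deadline` with a limit of an hour or so, and separately feed a "jobs completed" counter into `StallDetector::observe` on every metrics scrape, restarting the service when it returns true.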
<gchristensen> Packet is really good
* gchristensen is a fanboi
<gchristensen> LnL: I couldn't remember where I wrote sync sleep sync, but just found it:
<gchristensen> + sync
<gchristensen> + sleep 1
<gchristensen> + sync
<gchristensen> + sleep 1
<gchristensen> + sync
orivej has joined #nixos-borg
<gchristensen> shit
<gchristensen> master doesn't evaluate
<gchristensen> GC Warning: Failed to expand heap by 8388608 bytes
<gchristensen> GC Warning: Out of Memory! Heap size: 8072 MiB. Returning NULL!
<LnL> heap size or memory?
<gchristensen> hrm?
<LnL> is it the gc heap size limit or actual memory limit on the node
<gchristensen> heap size, system has 16G
orivej has quit [Ping timeout: 260 seconds]
orivej has joined #nixos-borg
<LnL> did you set that to the same as hydra does?
<gchristensen> I don't know
<gchristensen> I set the initial heap size to 4g
<gchristensen> hydra uses evaluator_initial_heap_size = 10G
<gchristensen> :o
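For comparing the 4g and 10G settings, a small sketch of turning such a size string into bytes and handing it to the evaluator process. It rests on assumptions not confirmed in this log: that the evaluator shells out to `nix-env`, and that the heap is sized through the Boehm GC environment variable `GC_INITIAL_HEAP_SIZE`. `parse_size` and `eval_command` are hypothetical helpers.

```rust
use std::process::Command;

/// Parse sizes such as "4g", "10G", "512m", or a plain byte count.
/// Returns None on malformed input or overflow.
pub fn parse_size(s: &str) -> Option<u64> {
    let s = s.trim();
    let (digits, multiplier) = match s.chars().last()? {
        'k' | 'K' => (&s[..s.len() - 1], 1u64 << 10),
        'm' | 'M' => (&s[..s.len() - 1], 1u64 << 20),
        'g' | 'G' => (&s[..s.len() - 1], 1u64 << 30),
        _ => (s, 1),
    };
    digits.parse::<u64>().ok()?.checked_mul(multiplier)
}

/// Build (but do not run) a `nix-env` command with the initial heap size set.
/// Assumption: the evaluator reads GC_INITIAL_HEAP_SIZE from its environment.
pub fn eval_command(initial_heap: &str) -> Option<Command> {
    let bytes = parse_size(initial_heap)?;
    let mut cmd = Command::new("nix-env");
    cmd.env("GC_INITIAL_HEAP_SIZE", bytes.to_string());
    Some(cmd)
}
```

Note that this only changes the starting heap size. Whether it would have helped with the "Failed to expand heap" warning above is not something the log settles.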
<LnL> maybe it hasn't hit hydra yet, but I don't see anything noticeable in the metrics
<LnL> since you couldn't find it last time :)
orivej has quit [Ping timeout: 240 seconds]
<gchristensen> one of the evaluators reliably fails to start with "thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Protocol("the handshake was interrupted: A read attempt return..." how annoying
<LnL> :/
<gchristensen> oh, lol
<gchristensen> it's in Sydney
<gchristensen> maybe the connection startup code in the amqp lib handles slow connections at start poorly
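If the handshake really is failing because of latency at startup, one workaround is to retry the connection with a growing delay instead of calling `unwrap()` on the first error. A minimal sketch using only the standard library: `retry_with_backoff` is a hypothetical helper, and the closure stands in for whatever call opens the AMQP connection, since the amqp library's API is not shown in this log.

```rust
use std::thread;
use std::time::Duration;

/// Call `op` until it succeeds or `max_attempts` is reached, doubling the
/// delay between attempts. `op` receives the 1-based attempt number.
/// At least one attempt is always made.
pub fn retry_with_backoff<T, E, F>(
    max_attempts: u32,
    first_delay: Duration,
    mut op: F,
) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let mut delay = first_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: hand the last error back to the caller.
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => {
                thread::sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```

The evaluator's startup could then wrap its connect call in something like `retry_with_backoff(5, Duration::from_secs(1), |_| connect())`, where `connect` is a placeholder for the real connection function, and only give up after the final attempt fails.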
<LnL> killing the eval doesn't look super hard
<LnL> would be much nicer with futures tho :p
<gchristensen> do it
<gchristensen> let's start moving to futures
<andi-> use the old futures crate unless they finally stabilized the new version. IIRC 0.2.x was just an intermediate WIP version with broken stuff.
<gchristensen> ok then maybe don't do it haha
<andi-> The work there is moving towards support of the await/async language features
<andi-> That's supposed to land soon? (this month?)
<LnL> I don't think 0.2 is a thing, is it?
<gchristensen> check it out, LnL, if it looks good, let's do it
<gchristensen> I like Rust. I'm not going to quit it even though I don't like futures :)
<gchristensen> (which means I'd better get used to living with it)
FRidh has quit [Quit: Konversation terminated!]
<LnL> I got distracted again... :p
<gchristensen> I wonder if I can make nixops use a pure-eval mode :evil:
{`-`} has joined #nixos-borg
orivej has joined #nixos-borg
orivej has quit [Ping timeout: 268 seconds]