gchristensen changed the topic of #nixos-borg to: https://www.patreon.com/ofborg https://monitoring.nix.ci/dashboard/db/ofborg?refresh=10s&orgId=1&from=now-1h&to=now "I get to skip reviewing the PHP code and just wait until it is rewritten in something sane, like POSIX shell." || https://logs.nix.samueldr.com/nixos-borg
<cole-h> >> Internal error writing commit status: Error(Codec(Error("expected value",
<cole-h> GitHub pls
orivej has quit [Ping timeout: 256 seconds]
orivej_ has joined #nixos-borg
<cole-h> gchristensen: It looks like a few things died ~hr ago. eval filter, comment filter, comment poster, and log collector all have backtraces
<gchristensen> thanks cole-h
<cole-h> :)
hmpffff_ has joined #nixos-borg
hmpffff has quit [Ping timeout: 260 seconds]
LnL has quit [Ping timeout: 260 seconds]
LnL has joined #nixos-borg
<cole-h> gchristensen: comment filter seems dead again, even though it has good logs after a backtrace... Just posted something and nothing showed up in logs
<LnL> morning
<cole-h> I guess I never realized you were EU, LnL
<cole-h> I'm just about to go to sleep :P
<LnL> yeah
<cole-h> Well, good night. See you in a handful of hours :)
cole-h has quit [Quit: Goodbye]
orivej_ has quit [Ping timeout: 260 seconds]
hmpffff has joined #nixos-borg
hmpffff_ has quit [Ping timeout: 240 seconds]
hmpffff_ has joined #nixos-borg
hmpffff has quit [Ping timeout: 260 seconds]
orivej has joined #nixos-borg
cole-h has joined #nixos-borg
<cole-h> I'm actually an idiot lol
<cole-h> "comment filter seems dead again" -> because I posted `@ofborg build` and not `@ofborg build list of attrs`
<cole-h> Derp
<LnL> FYI either my connection was _extremely_ unstable yesterday or something fishy is going on
<cole-h> re: all the stalled darwin alerts?
<LnL> yeah check the graph
<cole-h> I don't need to, I'll just go through my notification history hehe
<LnL> gchristensen: you happen to know the semantics of amqp heartbeats?
<LnL> one thing I noticed that looks slightly suspicious is that a build which publishes a lot of logs seems to receive inconsistent heartbeats
<LnL> so I'm wondering if it's possible for the channel to get congested so the heartbeat can't get through in time
<cole-h> Look at the log message collector logs
<cole-h> "WARN:amqp::session: Error dispatching packet to channel 1: Full! Blocking until there is space."
<cole-h> Maybe related?
<gchristensen> "Any traffic (e.g. protocol operations, published messages, acknowledgements) counts for a valid heartbeat. Clients may choose to send heartbeat frames regardless of whether there was any other traffic on the connection but some only do it when necessary."
<LnL> ok so it's not a mandatory separate thing
<gchristensen> note the log collector does not do many protocol operations
<cole-h> OK, so probably unrelated x)
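The heartbeat semantics quoted above (any traffic on the connection counts as a valid heartbeat) can be sketched roughly as follows. This is a minimal illustration only: `HeartbeatMonitor`, `on_frame`, and the injectable `clock` are hypothetical names, not part of RabbitMQ or of the amqp crate that ofborg uses.

```python
import time


class HeartbeatMonitor:
    """Tracks liveness of a peer where any inbound frame counts as a heartbeat."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock  # injectable so the logic can be exercised without sleeping
        self.last_seen = clock()

    def on_frame(self, frame_type):
        # A publish, an ack, or an explicit heartbeat frame all reset the timer;
        # the frame type is accepted only to make that point explicit.
        self.last_seen = self.clock()

    def is_alive(self):
        # The peer is considered alive while the last traffic is recent enough.
        return (self.clock() - self.last_seen) < self.timeout_seconds
```

Under this model a client that publishes logs steadily never needs a dedicated heartbeat frame, which matches the point that heartbeats are not a mandatory separate thing.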
<LnL> but it sure looked like that staging died while sending logs each time
<LnL> I did also get at least one ConnectionAborted
<LnL> this might be nice to get a bit more rabbitmq specific data https://gist.github.com/LnL7/7666af126198abad8959f74f4274dadb
<cole-h> Oh, using the node exporter we already have for systemd stuff, huh? Cool.
<cole-h> Out of curiosity, do you have an example query that you might use?
<LnL> rabbitmq_connection_received_bytes
<LnL> contains a bunch of good stuff
<LnL> and since our clients / queues are relatively static there shouldn't be a dimensionality problem for any of these
<LnL> also other interesting stuff like number of ack vs nack builds
<cole-h> Definitely SGTM.
<cole-h> Especially if you get around to making a dashboard for that... ;^)
<LnL> needs a user, etc. to talk to rabbitmq so it's a bit more difficult to set up
<gchristensen> hrm probably should gc or something on these machines
<cole-h> Or optimise-store, if that isn't automated already?
<cole-h> Looks like there's ~free space, but not free inodes
<gchristensen> it isn't
<cole-h> I mean obviously gc'ing would be good too
<LnL> I started a gc on eval-2
<cole-h> Yeah, I see that thing skyrocket in inodes + disk free
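The "free space but no free inodes" situation discussed above can be checked programmatically. A minimal sketch, assuming a POSIX host; `free_fraction`, `disk_report`, and the default `/nix/store` path are illustrative names and not part of ofborg's tooling.

```python
import os


def free_fraction(free, total):
    """Return free/total as a float, or None when the filesystem reports no total."""
    if total == 0:
        return None
    return free / total


def disk_report(path="/nix/store"):
    """Report free-space and free-inode fractions for the filesystem holding `path`."""
    st = os.statvfs(path)
    return {
        # Blocks available to unprivileged users versus total blocks.
        "bytes_free": free_fraction(st.f_bavail, st.f_blocks),
        # Free inodes versus total inodes; some filesystems report 0 total inodes.
        "inodes_free": free_fraction(st.f_ffree, st.f_files),
    }
```

A host can show a healthy `bytes_free` while `inodes_free` is at or near zero, which still produces "no space left on device" errors.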
<LnL> gchristensen: is it intentional I can only access one of the hosts?
<cole-h> Yeah, so you can only bring down at most 1 machine >:)
<LnL> lol
<LnL> I'm sure I can do better than that
<gchristensen> no :)
<LnL> hmm, this staging build has been running for 2h
<LnL> has build-timeout become a trusted option or something?
<gchristensen> I don't think so
<LnL> yeah no
<LnL> maybe nix changed to count per build rather than absolute?
<LnL> also do you want to take a look at the tracing stuff or is that good to go
<gchristensen> lgtm :)
<cole-h> Merging, then
<{^_^}> [ofborg] @cole-h merged pull request #480 → tracing logging → https://git.io/Jf3Iz
<{^_^}> [ofborg] @cole-h pushed 11 commits to released: https://git.io/JfGlg
<LnL> yay
<cole-h> Next is infra#16, but I've heard that LnL can merge that himself ;^)
<gchristensen> w00t!
<LnL> ah right
<LnL> very nice :D
<LnL> https://monitoring.nix.ci/explore?orgId=1&left=%5B%22now-15m%22,%22now%22,%22Loki%22,%7B%22expr%22:%22%7Bpr%3D%5C%2286848%5C%22%7D%22%7D,%7B%22mode%22:%22Logs%22%7D,%7B%22ui%22:%5Btrue,true,true,%22none%22%5D%7D%5D
<cole-h> Hot
<cole-h> Next thing I would like to see is if it's possible to set up coloring based on the `level` field
<cole-h> Right now, warning, info, etc all show up as gray
<LnL> oh hmm that should be working
<cole-h> Look at the "description is too long" messages -- level WARN, but grey
<cole-h> (Like how I used gray in one sentence and then grey in the next? :D)
<LnL> hmm don't find anything with that
<LnL> {unit=~"ofborg.*service"} |~ "description too long"
<LnL> oh you mean the nix-instantiate output?
MichaelRaskin has quit [Ping timeout: 256 seconds]
<cole-h> Yeah, those lines
<cole-h> (sorry I should have written it literally)
<cole-h> "description is over 140 char"
<cole-h> Also gchristensen (or LnL if he's given you access to the other machines): if you could optimise or gc the eval nodes, that would be great :) I'm getting red x's because "no space left on device" lol
<cole-h> nixpkgs#86488
<{^_^}> https://github.com/NixOS/nixpkgs/pull/86488 (by cole-h, 3 days ago, open): nixos/doas: init
<cole-h> (Probably spot-eval-1 first -- that has 0 inodes lol)
<LnL> yeah weird none of the log levels are recognised
<LnL> ^ they are blue/green/red over here
<cole-h> Interesting.
<cole-h> ty for the gc/optimise
<LnL> only the output of commands doesn't have a log level, but that's not structured so to be expected
<cole-h> Yeah
<LnL> -/nix/store/jbw12lhrwv55nymsjpc5jl485ijmv6df-grafana-loki-1.1.0-bin/bin/promtail
<LnL> +/nix/store/238rz17hj87grmfg6pzjm4alymk11fz5-grafana-loki-1.3.0-bin/bin/promtail
<LnL> I bet that fixes itself when we update the loki service
<cole-h> OK, cool
<LnL> wonder why I have a different version tho
<LnL> I didn't upgrade yet
MichaelRaskin has joined #nixos-borg
<LnL> also re garbage collect, I noticed some indirect result roots getting freed which seems kind of unexpected
<gchristensen> packet-spot-eval-1...........> 37150 store paths deleted, 222653.01 MiB freed
<gchristensen> packet-spot-eval-3...........> 43941 store paths deleted, 227193.83 MiB freed
<gchristensen> packet-spot-eval-1...........> deleting '/nix/store/pn18yzimrbqjw2bsb2wsl7ja5i3ag1xy-loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong.drv'
<LnL> euhm...
<infinisil> Hehe
<{^_^}> nix#3542 (by mkenigs, 1 week ago, merged): Set GCROOT to store path to prevent garbage collection
<infinisil> I think that's my doing with https://github.com/NixOS/nixpkgs/pull/83241
<{^_^}> nixpkgs#83241 (by Infinisil, 6 weeks ago, merged): lib/strings: Add `sanitizeDerivationName` function
<LnL> has somebody been bad?
<infinisil> The loo..oong is by me I mean
<infinisil> not a problem though, it's just long :)
<LnL> hmm no that's not the one I was thinking of
<infinisil> (thought so)
<{^_^}> nix#3541 (by alyssais, 1 week ago, merged): Fix long paths permanently breaking GC
<cole-h> Well, considering spot-eval-1 is back up to ~90%, I say we're OK for now :)
<cole-h> omg
<cole-h> LnL++
<{^_^}> LnL's karma got increased to 49
<cole-h> LnL++
<{^_^}> LnL's karma got increased to 50
<LnL> hm?
<cole-h> https://i.imgur.com/ApAZVB9.png Useless backtraces are now grouped with the message that generated them
<cole-h> Now I don't have to wonder if the error is benign or not -- it's literally right there
<LnL> nice, didn't even realise that :)
<LnL> you can also ask loki for context around a log line btw
<cole-h> Yeah, but it usually doesn't go far enough when there's like 3 backtraces rolled into one
<cole-h> It only lets me do ~20 lines both directions