gchristensen changed the topic of #nixos-borg to: https://www.patreon.com/ofborg https://monitoring.nix.ci/dashboard/db/ofborg?refresh=10s&orgId=1&from=now-1h&to=now "I get to skip reviewing the PHP code and just wait until it is rewritten in something sane, like POSIX shell." || https://logs.nix.samueldr.com/nixos-borg
<cole-h> RIP darwin builder?
<cole-h> Darwin builder has returned.
<cole-h> However, I think ofborg-evaluator died
<gchristensen> it is probably not dead, but suffering under the mass rebuild in that PR
<cole-h> Really? There're no status icons on any of the new PRs and no logs since 1600 pacific
<gchristensen> oh hrm
<gchristensen> :/
<gchristensen> I'll log in and look
<gchristensen> no queued evals means there is a problem before the ofborg-evaluator worker itself
<cole-h> Hm. Like what?
<gchristensen> cole-h: in this case, ofborg-evaluation-filter.service has a crash
<cole-h> Ah, I had looked at it, but since I saw "Hello, world!" and the following lines of logging, I figured it had recovered
<gchristensen> interesting
<gchristensen> it'd be good to monitor the size of the queue that it reads from
<cole-h> Oh yeah, look at all those juicy logs
<cole-h> Do you mean a dashboard in grafana, or is this something that should be done in code?
<gchristensen> probably both needed
<gchristensen> not sure exactly
<cole-h> Being unfamiliar with AMQP, I wonder how one would retrieve queue size
<gchristensen> how does the ofborg evals graph work? I'd start there :)
<cole-h> It hooks into a mysterious metric called "ofborg_queue_evaluator_in_progress"
<cole-h> (and _waiting)
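[editor's note: since the question of how to retrieve a queue's size over AMQP came up, here is a minimal sketch using Python's pika library. The broker host, credentials, and queue name are placeholders for illustration, not ofborg's actual configuration.]

```python
# Minimal sketch (not ofborg's code): asking the broker for a queue's depth
# over AMQP with pika. Host, credentials, and queue name are assumptions.
import pika

params = pika.ConnectionParameters(
    host="localhost",  # assumed broker host
    credentials=pika.PlainCredentials("guest", "guest"),  # assumed credentials
)
connection = pika.BlockingConnection(params)
channel = connection.channel()

# A passive declare does not create or modify the queue; it only reports the
# queue's current state, including the count of ready (undelivered) messages.
result = channel.queue_declare(queue="evaluation-jobs", passive=True)  # hypothetical queue name
print("messages ready:", result.method.message_count)
print("consumers:", result.method.consumer_count)

connection.close()
```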
<cole-h> To ofborg/infra I go
<gchristensen> indeed!
<cole-h> Oh no it's PHP
<gchristensen> a little bit of PHP is good for the soul
<gchristensen> in a sort of homeopathic kind of way
<cole-h> lol
<cole-h> I'm thinking it would be interesting to look at `$queue['messages']`, but it also seems like it would fail in exactly the same way because that is used to determine the in_progress stuff
<gchristensen> lol this is overly fancy php too, the worst
<cole-h> Where is the json that $queues and $connections read generated?
<gchristensen> getting that for you
<gchristensen> queues.json http://ix.io/2jqD
<gchristensen> http://ix.io/2jqE connections
<cole-h> Thanks. Time to do some reading.
<samueldr> gchristensen: you mean non-existent?
<gchristensen> haha
<samueldr> this is a peeve of mine
<gchristensen> I mean in a slightly poisonous way, but in such a low quantity that it doesn't hurt you
<samueldr> describing homeopathic as having anything in any context is dangerous! it's poison for your mind!
<samueldr> if there are detectable traces of it in it, it's not homeopathy anymore
<gchristensen> oh :)
<samueldr> (and it happens, and it's dangerous!)
<cole-h> `send_pend` from connections looks interesting... "Send queue size." according to rabbitmqctl(8). There's also the various stuff in `message_stats` from queues
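[editor's note: as a rough illustration of where fields like `messages`, `messages_unacknowledged`, and `send_pend` come from, here is a sketch of querying the RabbitMQ management HTTP API directly with Python's requests library. The URL and credentials are assumptions, and this is not how the ofborg PHP exporter is structured.]

```python
# Minimal sketch (not the ofborg exporter): pulling queue and connection stats
# from the RabbitMQ management HTTP API. URL and credentials are placeholders.
import requests

BASE = "http://localhost:15672/api"   # default management port; assumed host
AUTH = ("guest", "guest")             # assumed credentials

for q in requests.get(f"{BASE}/queues", auth=AUTH).json():
    print(
        q["name"],
        q.get("messages", 0),                 # ready + unacknowledged
        q.get("messages_ready", 0),           # waiting to be delivered
        q.get("messages_unacknowledged", 0),  # delivered but not yet acked
    )

for c in requests.get(f"{BASE}/connections", auth=AUTH).json():
    # send_pend is the broker's outgoing send-queue size for this connection
    print(c["name"], c.get("send_pend", 0))
```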
<gchristensen> https://pulse.mozilla.org/api/ anything in this API I can get you the results :)
orivej has quit [Ping timeout: 256 seconds]
<cole-h> The main thing I'm interested in (for comparison's sake) is what connections.json and queues.json look like when the eval filter is down like it was before
<cole-h> So I can determine what things are just a red herring or not
<gchristensen> samueldr: you're completely correct
<gchristensen> ah
<samueldr> I generally a ;)
<samueldr> uh
<gchristensen> haha
<samueldr> I generally am* ;)
<samueldr> at least that's what I've led myself to believe
<cole-h> Hm, I wonder if queues' `messages_unacknowledged` would have filled up in the case that the eval filter crashed...
<gchristensen> in that case I think they'd just be waiting
<gchristensen> unack'd if the worker received it but hasn't replied
<gchristensen> I think
<gchristensen> g'night :)
<cole-h> o/
cole-h has quit [Ping timeout: 264 seconds]
orivej has joined #nixos-borg
<LnL> welp, I ran carnix but didn't actually test the output since it moved
orivej has quit [Ping timeout: 260 seconds]
<LnL> gchristensen: what's the release = false; for?
<gchristensen> no idea ;)
<gchristensen> :x
<LnL> "Build release=false", I see that yes :D
<LnL> actually maybe that's why I could see nice backtraces in gdb
<gchristensen> ohh
<gchristensen> yeah I think it turns off the optimisations
<LnL> yeah, I was wondering why but it should be useful for debugging connection problems, etc.
<{^_^}> [ofborg] @LnL7 opened pull request #471 → carnix workspace → https://git.io/JftkD
<LnL> ^ alternatively I could also revert the workspaces for now
<LnL> not sure how important the includes are
<gchristensen> if it builds I assume it is fine :P
<LnL> it should, but 96639e9 is a bit of a hack to fix the generation
<{^_^}> [ofborg] @grahamc merged pull request #471 → carnix workspace → https://git.io/JftkD
<{^_^}> [ofborg] @grahamc pushed 5 commits to released: https://git.io/JftIJ
<gchristensen> want to deploy? :)
<LnL> I wish I could push this second patch to carnix tho, the nest doesn't seem to be very happy
<gchristensen> :/
orivej has joined #nixos-borg
<LnL> hmm, the build queue dropped to zero with the deploy
<gchristensen> still weird?
<LnL> stuff is building so probably a coincidence
<LnL> it was also darwin, which nothing on that side should be consuming
<gchristensen> yeah
<LnL> idea: publish a master eval and hello build or something after a deploy
<LnL> that way something kicks off immediately instead of waiting for somebody to make a PR
<gchristensen> that sounds like a great idea :)
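[editor's note: a sketch of what that post-deploy canary publish might look like, again in Python with pika. The queue name and message fields are purely hypothetical; ofborg's real job format lives in its own codebase and is not reproduced here.]

```python
# Rough sketch of the "publish a canary build after a deploy" idea.
# The routing key and message shape are hypothetical, not ofborg's format.
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))  # assumed broker
channel = connection.channel()

canary = {
    "repo": "NixOS/nixpkgs",  # illustrative fields only
    "ref": "master",
    "attrs": ["hello"],
}

channel.basic_publish(
    exchange="",
    routing_key="canary-builds",  # hypothetical queue name
    body=json.dumps(canary).encode(),
    properties=pika.BasicProperties(content_type="application/json"),
)
connection.close()
```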
andi- has quit [Ping timeout: 246 seconds]
cole-h has joined #nixos-borg
<cole-h> Anybody have thoughts on this comment? https://github.com/NixOS/nixpkgs/pull/85951#issuecomment-619358840 I can't say for sure, but I'm leaning towards "no," because wouldn't the coreutils bump have seen the same issue? Or is binutils larger than coreutils?
<LnL> that's the build thing I linked yesterday
<cole-h> Oh right
<LnL> but I don't think it's "failing"; rather, only builders handle timeout messages, since the evaluators are not really supposed to be building stuff
<LnL> publishing a build request for the libs would be better but that might be a bit tricky to do, not sure
<LnL> cole-h: actually that returns jobs to be scheduled so shouldn't be all that hard
<cole-h> GitHub are you kidding me
<cole-h> "thread 'main' panicked at 'Failed to add labels ["ofborg-internal-error"] to issue #86014: Error(Http(Io(Os { code: 110, kind: TimedOut, message: "Connection timed out" })),"
<{^_^}> https://github.com/NixOS/nixpkgs/pull/86014 (by marsam, 10 minutes ago, open): awsweeper: 0.6.0 -> 0.7.0
<cole-h> I guess GitHub really doesn't like marsam lol. #86016 got caught as well
<{^_^}> https://github.com/NixOS/nixpkgs/pull/86016 (by marsam, 12 minutes ago, open): cloud-nuke: 0.1.7 -> 0.1.18
<infinisil> gchristensen: For https://github.com/NixOS/nixpkgs/pull/85951, I think I have a pretty good solution: I have a nixpkgs patch to allow overriding the pkgs used for the lib tests, so they can be run with `nix-build lib/tests/release.nix --arg pkgs 'import <nixpkgs> {}'`, meaning it doesn't depend on the commit's nixpkgs version anymore for stdenv and such
<{^_^}> #85951 (by lovesegfault, 1 day ago, open): binutils: 2.31.1 -> 2.34
<infinisil> The only requirement is that ofborg needs a nixpkgs in NIX_PATH. How does that sound?
<gchristensen> it doesn't have one, on purpose -- to prevent nixpkgs from importing nixpkgs
<gchristensen> that is tricky :/
<gchristensen> probably should add the idea of "mandatory builds"
<infinisil> Hmm
<infinisil> I mean it actually doesn't matter whether it's <nixpkgs> or something else
<infinisil> Could even be `--arg pkgs 'import (fetchTarball "channel:nixpkgs-unstable") {}'`
<infinisil> Or `--arg pkgs 'import <nixpkgs-impure-borg> {}'`
<infinisil> Or `--arg pkgs 'import /path/to/nixpkgs {}'`
<infinisil> I think for the lib tests something like this would work great, because these don't test pkgs itself, they only need it to support the testing
<infinisil> I'll look into supporting something like that
<cole-h> GitHub seriously, please stop. Getting more internal errors due to timeouts... gchristensen do you know if timeout limits are configured in ofborg, or someplace else (i.e. is it something that can be dealt with)?
<tilpner> cole-h: Can you determine what the current timeout is? (90s?)
<cole-h> The only configurable timeouts I spy are the one for builds and the one for the rabbitmq connection, while I believe this timeout is borg <-> GitHub, not borg <-> queue(s)
<{^_^}> [ofborg] @Infinisil opened pull request #472 → Pass build nixpkgs to mass-rebuilder binary → https://git.io/Jft8t
<infinisil> Done the thing :) ^
<tilpner> cole-h: No, I meant "can you calculate the timeout that's being used from the logs, or otherwise observe it?"
<tilpner> Some libraries do allow for setting a timeout
<tilpner> But it might as well be a server-side timeout, in which case there's nothing you can do
<tilpner> If it's 90s, there's a chance it's client-side
<cole-h> Some of these aren't timeouts, which leads me to believe it's actually GH again... borg merges the commit, and then 18 seconds later, it errors at "Failed to get issue: end of stream before headers finished"
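[editor's note: for reference on tilpner's client-side vs server-side point, here is a generic Python illustration of a client-side timeout, not ofborg's actual Rust code. If the configured limit (e.g. 90s) is hit, the error is raised locally by the client; a server-side timeout instead shows up as an error response or a dropped connection, as in the "end of stream before headers finished" case above.]

```python
# Generic illustration of a client-side HTTP timeout (not ofborg's code).
import requests

try:
    resp = requests.get(
        "https://api.github.com/repos/NixOS/nixpkgs",  # example endpoint only
        timeout=90,  # seconds; raises locally if the server is too slow
    )
    print(resp.status_code)
except requests.exceptions.Timeout:
    print("client-side timeout hit")
```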