gchristensen changed the topic of #nixos-borg to: https://www.patreon.com/ofborg https://monitoring.nix.ci/dashboard/db/ofborg?refresh=10s&orgId=1&from=now-1h&to=now "I get to skip reviewing the PHP code and just wait until it is rewritten in something sane, like POSIX shell." || https://logs.nix.samueldr.com/nixos-borg
<cole-h> RIP darwin builder?
<cole-h> Darwin builder has returned.
<cole-h> However, I think ofborg-evaluator died
<gchristensen> it is probably not dead, but suffering under the mass rebuild in that PR
<cole-h> Really? There're no status icons on any of the new PRs and no logs since 1600 pacific
<gchristensen> oh hrm
<gchristensen> :/
<gchristensen> I'll log in and look
<gchristensen> no queued evals means there is a problem before the ofborg-evaluator worker itself
<cole-h> Hm. Like what?
<gchristensen> cole-h: in this case, ofborg-evaluation-filter.service has a crash
<cole-h> Ah, I had looked at it, but since I saw "Hello, world!" and the following lines of logging, I figured it had recovered
<gchristensen> interesting
<gchristensen> it'd be good to monitor the size of the queue that it reads from
<cole-h> Oh yeah, look at all those juicy logs
<cole-h> Do you mean a dashboard in grafana, or is this something that should be done in code?
<gchristensen> probably both needed
<gchristensen> not sure exactly
<cole-h> Being unfamiliar with AMQP, I wonder how one would retrieve queue size
<gchristensen> how does the ofborg evals graph work? I'd start there :)
<cole-h> It hooks into a mysterious metric called "ofborg_queue_evaluator_in_progress"
<cole-h> (and _waiting)
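[editor's note: since the question of how to retrieve a queue's size over AMQP came up, here is a minimal sketch using Python's pika library. The broker host, credentials, and queue name are placeholders for illustration, not ofborg's actual configuration.]

```python
# Minimal sketch (not ofborg's code): asking the broker for a queue's depth
# over AMQP with pika. Host, credentials, and queue name are assumptions.
import pika

params = pika.ConnectionParameters(
    host="localhost",  # assumed broker host
    credentials=pika.PlainCredentials("guest", "guest"),  # assumed credentials
)
connection = pika.BlockingConnection(params)
channel = connection.channel()

# A passive declare does not create or modify the queue; it only reports the
# queue's current state, including the count of ready (undelivered) messages.
result = channel.queue_declare(queue="evaluation-jobs", passive=True)  # hypothetical queue name
print("messages ready:", result.method.message_count)
print("consumers:", result.method.consumer_count)

connection.close()
```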
<cole-h> To ofborg/infra I go
<gchristensen> indeed!
<cole-h> Oh no it's PHP
<gchristensen> a little bit of PHP is good for the soul
<gchristensen> in a sort of homeopathic kind of way
<cole-h> lol
<cole-h> I'm thinking it would be interesting to look at `$queue['messages']`, but it also seems like it would fail in exactly the same way because that is used to determine the in_progress stuff
<gchristensen> lol this is overly fancy php too, the worst
<cole-h> Where is the json that $queues and $connections read generated?
<gchristensen> getting that for you
<gchristensen> queues.json http://ix.io/2jqD
<gchristensen> http://ix.io/2jqE connections
<cole-h> Thanks. Time to do some reading.
<samueldr> gchristensen: you mean non-existent?
<gchristensen> haha
<samueldr> this is a peeve of mine
<gchristensen> I mean in a slightly poisonous way, but in such a low quantity that it doesn't hurt you
<samueldr> describing homeopathic as having anything in any context is dangerous! it's poison for your mind!
<samueldr> if there are detectable traces of it in it, it's not homeopathy anymore
<gchristensen> oh :)
<samueldr> (and it happens, and it's dangerous!)
<cole-h> `send_pend` from connections looks interesting... "Send queue size." according to rabbitmqctl(8). There's also the various stuff in `message_stats` from queues
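[editor's note: as a rough illustration of where fields like `messages`, `messages_unacknowledged`, and `send_pend` come from, here is a sketch of querying the RabbitMQ management HTTP API directly with Python's requests library. The URL and credentials are assumptions, and this is not how the ofborg PHP exporter is structured.]

```python
# Minimal sketch (not the ofborg exporter): pulling queue and connection stats
# from the RabbitMQ management HTTP API. URL and credentials are placeholders.
import requests

BASE = "http://localhost:15672/api"   # default management port; assumed host
AUTH = ("guest", "guest")             # assumed credentials

for q in requests.get(f"{BASE}/queues", auth=AUTH).json():
    print(
        q["name"],
        q.get("messages", 0),                 # ready + unacknowledged
        q.get("messages_ready", 0),           # waiting to be delivered
        q.get("messages_unacknowledged", 0),  # delivered but not yet acked
    )

for c in requests.get(f"{BASE}/connections", auth=AUTH).json():
    # send_pend is the broker's outgoing send-queue size for this connection
    print(c["name"], c.get("send_pend", 0))
```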
<gchristensen> https://pulse.mozilla.org/api/ anything in this API I can get you the results :)
orivej has quit [Ping timeout: 256 seconds]
<cole-h> The main thing I'm interested in (for comparison's sake) is what connections.json and queues.json look like when the eval filter is down like it was before
<cole-h> So I can determine what things are just a red herring or not
<gchristensen> samueldr: you're completely correct
<gchristensen> ah
<samueldr> I generally a ;)
<samueldr> uh
<gchristensen> haha
<samueldr> I generally am* ;)
<samueldr> at least that's what I've led myself to believe
<cole-h> Hm, I wonder if queues' `messages_unacknowledged` would have filled up in the case that the eval filter crashed...
<gchristensen> in that case I think they'd just be waiting
<gchristensen> unack'd if the worker received it but hasn't replied
<gchristensen> I think
<gchristensen> g'night :)
<cole-h> o/
cole-h has quit [Ping timeout: 264 seconds]
orivej has joined #nixos-borg
<LnL> welp, I ran carnix but didn't actually test the output since it moved
orivej has quit [Ping timeout: 260 seconds]
<LnL> gchristensen: what's the release = false; for?
<gchristensen> no idea ;)
<gchristensen> :x
<LnL> "Build release=false", I see that yes :D
<LnL> actually maybe that's why I could see nice backtraces in gdb
<gchristensen> ohh
<gchristensen> yeah I think it turns off the optimisations
<LnL> yeah, I was wondering why but it should be useful for debugging connection problems, etc.
<{^_^}> [ofborg] @LnL7 opened pull request #471 → carnix workspace → https://git.io/JftkD
<LnL> ^ alternatively I could also revert the workspaces for now
<LnL> not sure how important the includes are
<gchristensen> if it builds I assume it is fine :P
<LnL> it should, but 96639e9 is a bit of a hack to fix the generation
<{^_^}> [ofborg] @grahamc merged pull request #471 → carnix workspace → https://git.io/JftkD
<{^_^}> [ofborg] @grahamc pushed 5 commits to released: https://git.io/JftIJ
<gchristensen> want to deploy? :)
<LnL> I wish I could push this second patch to carnix tho, the nest doesn't seem to be very happy
<gchristensen> :/
orivej has joined #nixos-borg
<LnL> hmm, the build queue dropped to zero with the deploy
<gchristensen> still weird?
<LnL> stuff is building so probably a coincidence
<LnL> it was also darwin, which nothing on that side should be consuming
<gchristensen> yeah
<LnL> idea: publish a master eval and hello build or something after a deploy
<LnL> that way something kicks off immediately instead of waiting for somebody to make a PR
<gchristensen> that sounds like a great idea :)
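[editor's note: a sketch of what that post-deploy canary publish might look like, again in Python with pika. The queue name and message fields are purely hypothetical; ofborg's real job format lives in its own codebase and is not reproduced here.]

```python
# Rough sketch of the "publish a canary build after a deploy" idea.
# The routing key and message shape are hypothetical, not ofborg's format.
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))  # assumed broker
channel = connection.channel()

canary = {
    "repo": "NixOS/nixpkgs",  # illustrative fields only
    "ref": "master",
    "attrs": ["hello"],
}

channel.basic_publish(
    exchange="",
    routing_key="canary-builds",  # hypothetical queue name
    body=json.dumps(canary).encode(),
    properties=pika.BasicProperties(content_type="application/json"),
)
connection.close()
```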
andi- has quit [Ping timeout: 246 seconds]
cole-h has joined #nixos-borg
<cole-h> Anybody have thoughts on this comment? https://github.com/NixOS/nixpkgs/pull/85951#issuecomment-619358840 I can't say for sure, but I'm leaning towards "no," because wouldn't the coreutils bump have seen the same issue? Or is binutils larger than coreutils?
<LnL> that's the build thing I linked yesterday
<cole-h> Oh right
<LnL> but I don't think it's "failing"; rather, only builders handle timeout messages, since the evaluators are not really supposed to be building stuff
<LnL> publishing a build request for the libs would be better but that might be a bit tricky to do, not sure
<LnL> cole-h: actually that returns jobs to be scheduled so shouldn't be all that hard
<cole-h> GitHub are you kidding me
<cole-h> "thread 'main' panicked at 'Failed to add labels ["ofborg-internal-error"] to issue #86014: Error(Http(Io(Os { code: 110, kind: TimedOut, message: "Connection timed out" })),"
<{^_^}> https://github.com/NixOS/nixpkgs/pull/86014 (by marsam, 10 minutes ago, open): awsweeper: 0.6.0 -> 0.7.0
<cole-h> I guess GitHub really doesn't like marsam lol. #86016 got caught as well
<{^_^}> https://github.com/NixOS/nixpkgs/pull/86016 (by marsam, 12 minutes ago, open): cloud-nuke: 0.1.7 -> 0.1.18
<infinisil> gchristensen: For https://github.com/NixOS/nixpkgs/pull/85951, I think I have a pretty good solution: I have a nixpkgs patch to allow overriding the pkgs used for the lib tests, so they can be run with `nix-build lib/tests/release.nix --arg pkgs 'import <nixpkgs> {}'`, meaning it doesn't depend on the commit's nixpkgs version anymore for stdenv and such
<{^_^}> #85951 (by lovesegfault, 1 day ago, open): binutils: 2.31.1 -> 2.34
<infinisil> The only requirement is that ofborg needs a nixpkgs in NIX_PATH. How does that sound?
<gchristensen> it doesn't have one, on purpose -- to prevent nixpkgs from importing nixpkgs
<gchristensen> that is tricky :/
<gchristensen> probably should add the idea of "mandatory builds"
<infinisil> Hmm
<infinisil> I mean it actually doesn't matter whether it's <nixpkgs> or something else
<infinisil> Could even be `--arg pkgs 'import (fetchTarball "channel:nixpkgs-unstable") {}'`
<infinisil> Or `--arg pkgs 'import <nixpkgs-impure-borg> {}'`
<infinisil> Or `--arg pkgs 'import /path/to/nixpkgs {}'`
<infinisil> I think for the lib tests something like this would work great, because these don't test pkgs itself, they only need it to support the testing
<infinisil> I'll look into supporting something like that
<cole-h> GitHub seriously, please stop. Getting more internal errors due to timeouts... gchristensen do you know if timeout limits are configured in ofborg, or someplace else (i.e. is it something that can be dealt with)?
<tilpner> cole-h: Can you determine what the current timeout is? (90s?)
<cole-h> The only configurable timeouts I spy are the one for builds and the one for the rabbitmq connection, while I believe this timeout is borg <-> GitHub, not borg <-> queue(s)
<{^_^}> [ofborg] @Infinisil opened pull request #472 → Pass build nixpkgs to mass-rebuilder binary → https://git.io/Jft8t
<infinisil> Done the thing :) ^
<tilpner> cole-h: No, I meant "can you calculate the timeout that's being used from the logs, or otherwise observe it?"
<tilpner> Some libraries do allow for setting a timeout
<tilpner> But it might as well be a server-side timeout, in which case there's nothing you can do
<tilpner> If it's 90s, there's a chance it's client-side
<cole-h> Some of these aren't timeouts, which leads me to believe it's actually GH again... borg merges the commit, and then 18 seconds later, it errors at "Failed to get issue: end of stream before headers finished"
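[editor's note: for reference on tilpner's client-side vs server-side point, here is a generic Python illustration of a client-side timeout, not ofborg's actual Rust code. If the configured limit (e.g. 90s) is hit, the error is raised locally by the client; a server-side timeout instead shows up as an error response or a dropped connection, as in the "end of stream before headers finished" case above.]

```python
# Generic illustration of a client-side HTTP timeout (not ofborg's code).
import requests

try:
    resp = requests.get(
        "https://api.github.com/repos/NixOS/nixpkgs",  # example endpoint only
        timeout=90,  # seconds; raises locally if the server is too slow
    )
    print(resp.status_code)
except requests.exceptions.Timeout:
    print("client-side timeout hit")
```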