gchristensen changed the topic of #nixos-borg to: https://www.patreon.com/ofborg https://monitoring.nix.ci/dashboard/db/ofborg?refresh=10s&orgId=1&from=now-1h&to=now "I get to skip reviewing the PHP code and just wait until it is rewritten in something sane, like POSIX shell." || https://logs.nix.samueldr.com/nixos-borg
cole-h has quit [Quit: Goodbye]
cole-h has joined #nixos-borg
cole-h has quit [Quit: Goodbye]
orivej has quit [Ping timeout: 256 seconds]
<{^_^}> [ofborg] @LnL7 opened pull request #479 → remove static lifetimes from easylapin → https://git.io/JfOdx
orivej has joined #nixos-borg
cole-h has joined #nixos-borg
<{^_^}> [ofborg] @LnL7 opened pull request #480 → tracing logging → https://git.io/Jf3Iz
<gchristensen> LnL: when you mentioned the thing about deploying yesterday, I stopped deploying
<gchristensen> I'm happy to merge these PRs, but I'd like to deploy them around the same time -- let me know what I should do :)
<LnL> yeah, the debugging stuff I mentioned doesn't have to block that
cust0dian has joined #nixos-borg
<cole-h> I like how the switch to tracing appears to be mostly changing the imports from log -> tracing
<LnL> but I think you'll have to at least reboot the aarch server soon
<LnL> I don't have enough permissions to look at the stuck builders or restart them
<LnL> cole-h: yeah, it's really implemented as a superset in a lot of ways
<cust0dian> borg seems to have hung on my PR: https://github.com/NixOS/nixpkgs/pull/86334 — is this a minor issue and I should just push something else there to trigger another run or do I need to raise an issue?
<{^_^}> nixpkgs#86334 (by cust0dian, 2 days ago, open): tmuxinator: 1.1.4 -> 2.0.0
<LnL> one thing that's neat that I already noticed is that it also handles logging of libraries (like lapin) by default
<cole-h> cust0dian: Usually it's fine to just `@ofborg eval` and restart the eval. I think I know the issue but haven't been able to find time to actually look into fixing it.
<cust0dian> gotcha, thanks!
<cole-h> Unless you see a big purple label that says `ofborg-internal-error` -- then, come find me.
<LnL> why is clippy yelling at me? :/
<cole-h> LnL++ Thanks for adding a picture -- I was gonna ask for one :P
<{^_^}> LnL's karma got increased to 46
<cole-h> LnL: I think clippy wants you to split `head_waiter`'s closure to a separate function and `thread::spawn()` that
<cole-h> idk about the others
<cole-h> I smile every time I see "lmao I got a job?" show up in the logs
<gchristensen> lol
<cole-h> LnL: jk, splitting that closure did nothing
<LnL> urgh yeah
<cole-h> LnL: It appears to be related to the `info!` macro. Commenting out lines 134-5 gets rid of that one error
<LnL> btw I don't get these with latest
<cole-h> Latest what, clippy?
<LnL> oh, sounds like the heuristic might be counting the expanded macro then
<cole-h> Yep, just found that too
<LnL> yeah
<LnL> I generally work in a ~nixpkgs-unstable shell
<cole-h> LnL: What are your clippy, cargo, and rustc versions when this doesn't happen?
<cole-h> 0.0.212, 1.43.0, and 1.43.0, respectively?
<LnL> 1.42
<LnL> also posted some output of the json formatter
<cole-h> Now the question is how it looks in Loki
<LnL> yeah I don't know how that integrates
<cole-h> It might be useful to remove the date "segment", since Loki does that automagically
<LnL> btw, I really don't have much of an opinion on the library
<LnL> if either of you think slog or whatever is the better option I can also try that out
<cole-h> <3
<LnL> cole-h: there's a without_time so that looks straightforward if we don't want it
<gchristensen> we can easily reboot, btw, several people in -aarch64 can reboot
<gchristensen> btw please feel free to merge and deploy without me :)
<LnL> alright, no reason not to then
<cole-h> I don't really have an opinion one way or the other re: slog vs tracing.
<cole-h> Tracing looks good for now, so I say stick with it, assuming gchristensen doesn't have another opinion.
<gchristensen> makeitso.bmp
<LnL> the span is a bit magical since it threads the context through the entire call stack and not just the local scope
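The span behaviour LnL describes can be illustrated with a minimal sketch. This is not the Rust tracing API; it is a Python stand-in built on `contextvars`, and the names `span`, `info`, `handle_job`, and `run_build` are hypothetical. It only shows the idea that fields attached in an outer scope reach log calls made deeper in the call stack without being passed explicitly.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical stand-in for a tracing span: a tuple of (name, fields) entries
# kept in a context variable, so it follows the call stack implicitly.
_spans = contextvars.ContextVar("spans", default=())


@contextmanager
def span(name, **fields):
    """Enter a span; every log call made inside it sees these fields."""
    token = _spans.set(_spans.get() + ((name, fields),))
    try:
        yield
    finally:
        _spans.reset(token)  # leaving the span removes its fields again


def info(message):
    """Build a log record that carries the fields of all entered spans."""
    context = {}
    for _name, fields in _spans.get():
        context.update(fields)
    return {"message": message, **context}


def run_build(attr):
    # No PR number is passed in, yet the record includes the caller's field.
    return info(f"building {attr}")


def handle_job(pr):
    with span("job", pr=pr):
        return run_build("hello")


record = handle_job(86334)   # logged inside the span
outside = info("idle")       # logged after the span was left
```

Here `record` carries the `pr` field even though `run_build` never received it, while `outside` does not.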
<cole-h> btw, what's the reason we use openssl 1.0.2u? I gathered it's because of one of our libraries, I think, but not much more than that
<LnL> hmm another travis failure
<gchristensen> newer versions dropped something the old amqp library used
<LnL> oh!
<cole-h> Then, maybe we can drop openssl 1.0.2u now that we use lapin? :o
<LnL> so that might also resolve with switching?
<gchristensen> yea
<LnL> ok let's deploy the qos change now then and see if my builder gets stuck again
<gchristensen> I'm here if you need me :)
<cole-h> LnL: "We will watch your career with great interest..."
<cole-h> Wow, getting fancy -- naming your deploy
<LnL> the default names are kind of useless since it's the infra repo
<cole-h> Yeah :P
<LnL> gchristensen: btw one thing I've noticed is that nginx seems to restart every time
<gchristensen> ...huh
<LnL> that couldn't be related to the drops we're seeing right?
<gchristensen> I wouldn't think so
<LnL> it's a different port so...
<LnL> oh, mine restarted this time
<LnL> builder 56462 ofborg 5u IPv4 0x821242dabf05bf6b 0t0 TCP>core-0.ewr1.nix.ci:5671 (ESTABLISHED)
<LnL> so that's a good sign, although none of the aarch builders disappeared either
<cole-h> What's that InvalidChannelState error?
<LnL> something happened to the rabbitmq channel, the queue getting emptied out is probably related
<LnL> my guess is either rabbitmq is restarting _somehow_ or one of the services forcibly recreates queues somehow
<LnL> oh, also see this a few times before the successful restart
<LnL> Error: IOError(Os { code: 22, kind: InvalidInput, message: "Invalid argument" })
<cole-h> Any context?
<LnL> yeah, this is when connecting to a garbage host https://gist.github.com/LnL7/a3fa6ffd1b1f766a2dd41158f9afffab
<cole-h> Oh. Probably the aarch64 one(s)?
<LnL> that's what is causing them to stall yes
<LnL> even more interesting!
<LnL> Error: IOError(Custom { kind: Other, error: Ssl(Error { code: ErrorCode(5), cause: Some(Io(Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })) }, X509VerifyResult { code: 0, error: "ok" }) })
<cole-h> Huh
<LnL> nixops won't know about that
<cole-h> Is that why the queue gets dropped???
<cole-h> Well, that's certainly one way to do it...
<LnL> if this actually happens then yes
<LnL> at least until https://github.com/NixOS/ofborg/pull/478 that is
<{^_^}> #478 (by LnL7, 1 day ago, merged): make messages persistent
<cole-h> Right -- so it should no longer happen, but it was/might have been because of that
<LnL> having it just restart otherwise shouldn't be a big deal
<LnL> assuming heartbeats, etc. get properly handled by the clients, which I'm hoping lapin will mostly do
<cole-h> Fingers crossed.
<cole-h> LnL: Thanks for all you do for ofborg (and the rest of the Nix ecosystem) :)
<cole-h> Some day I'll have my own borg builder to test these kinds of things on, but for now I'm just doing relatively trivial changes.
<LnL> well it's been a while since I did anything for it :)
<LnL> also hugging people in pyjamas isn't really socially accepted at the moment so I have more free time
<cole-h> Hahaha
<cole-h> # of version bumps I've done: 0; # of version bumps LnL has done: 1
<cole-h> Doing only slightly more than me ;)
<LnL> heh
<gchristensen> LnL: wtf that is terrible!
<gchristensen> lol!
<LnL> :D
<LnL> mind checking if the uptime of rabbitmq is <3h
<LnL> oh hold on, I think I can do that now
<cole-h> 2020-05-01 20:47:20.875 [info] <0.9734.9> RabbitMQ is asked to stop...
<LnL> yeah... pretty confident that's it then :)
<gchristensen> Active: active (running) since Fri 2020-05-01 20:47:41 UTC; 1h 12min ago
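The two timestamps above can be lined up with a little arithmetic. This is a sketch that assumes the RabbitMQ log line is in UTC like the systemd output (the log line itself does not say), and that "1h 12min ago" is rounded to the minute.

```python
from datetime import datetime, timedelta

# Values copied from the two log lines above.
stop_requested = datetime(2020, 5, 1, 20, 47, 20, 875000)  # "RabbitMQ is asked to stop..."
active_since = datetime(2020, 5, 1, 20, 47, 41)            # systemd "Active: ... since"
reported_uptime = timedelta(hours=1, minutes=12)           # "1h 12min ago"

# Time between the stop request and the service being active again.
restart_gap = active_since - stop_requested

# Approximate moment the status was checked (systemd rounds the uptime).
checked_at = active_since + reported_uptime

# The question LnL asked: is the uptime under three hours?
uptime_under_3h = reported_uptime < timedelta(hours=3)
```

Under those assumptions the service came back roughly 20 seconds after the stop request, which is consistent with a restart at 20:47 rather than a long outage.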
<cole-h> lol
<cole-h> "Gee, I wonder why rabbitmq is dropping its queue"
<gchristensen> so, those messages should definitely be persistent
<gchristensen> except not log messages
<LnL> yeah, regardless of this, not losing stuff when a restart or reboot is needed is much better
<LnL> still don't get why logs didn't disappear tho
NinjaTrappeur has quit [*.net *.split]
qyliss has quit [*.net *.split]
{^_^} has quit [*.net *.split]
<gchristensen> yeah
qyliss has joined #nixos-borg
<LnL> how does that part actually work?
<gchristensen> persisting to disk?
<LnL> oh! it writes files
<gchristensen> oh are you looking via the log viewer?
<LnL> no wonder it's persistent
<LnL> I thought the log viewer talked to amqp directly and history just rotated because it's a bounded queue
<gchristensen> ah
<gchristensen> log viewer does talk to amqp directly, too :)
<gchristensen> it first connects to rabbitmq, then tries to fetch history off disk
<qyliss> a/go
<qyliss> aaaa I keep doing that
<LnL> ok that makes sense then
<LnL> so log messages were not persistent either, anything published but not persisted yet would disappear
<LnL> that's just a really small window
<gchristensen> yeah
<gchristensen> and they should not be persisted, either
<gchristensen> log messages can easily be 1k+/s
<LnL> ah, could you double check that then
<gchristensen> all in-memory
<LnL> hmm so the logs are not durable which means the client might be sending delivery_mode 2 but that's being ignored?
<gchristensen> hmm
<gchristensen> not sure
<gchristensen> the queue created by the receivers is not durable
<gchristensen> so the messages can't be persistent
<gchristensen> because a disconnect is instant death
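The rule being discussed here can be modelled in a few lines. This is a toy simulation, not a RabbitMQ client: the `Broker` and `Queue` classes and the queue names are hypothetical, and it only models a broker restart (not the consumer-disconnect case gchristensen mentions). It encodes the general AMQP behaviour that a message survives a restart only if it was published with `delivery_mode` 2 and sits in a durable queue.

```python
from dataclasses import dataclass, field

PERSISTENT = 2  # AMQP delivery_mode for persistent messages
TRANSIENT = 1   # AMQP delivery_mode for transient messages


@dataclass
class Queue:
    durable: bool
    messages: list = field(default_factory=list)  # (body, delivery_mode) pairs


class Broker:
    """Toy broker that only tracks what survives a restart."""

    def __init__(self):
        self.queues = {}

    def declare(self, name, durable):
        self.queues.setdefault(name, Queue(durable))

    def publish(self, queue, body, delivery_mode):
        self.queues[queue].messages.append((body, delivery_mode))

    def restart(self):
        survivors = {}
        for name, queue in self.queues.items():
            if not queue.durable:
                continue  # a non-durable queue is gone, with everything in it
            # In a durable queue, only persistent messages are kept.
            queue.messages = [m for m in queue.messages if m[1] == PERSISTENT]
            survivors[name] = queue
        self.queues = survivors


broker = Broker()
broker.declare("build-jobs", durable=True)   # hypothetical queue names
broker.declare("logs", durable=False)
broker.publish("build-jobs", "job-1", PERSISTENT)
broker.publish("build-jobs", "job-2", TRANSIENT)
broker.publish("logs", "line-1", PERSISTENT)  # delivery_mode 2 does not help here
broker.restart()
```

After the restart only `job-1` remains: the `logs` queue is gone despite its message being marked persistent, which matches the point that persistence cannot outlive a non-durable queue.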
<gchristensen> I've been on hold 30 minutes to buy a pizza
<LnL> the exchange is durable tho
<gchristensen> yeah but the exchange doesn't hold anything
<LnL> whoa
<gchristensen> the exchange is just a map
<LnL> so that doesn't really do anything?
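The "exchange is just a map" point can be sketched as a routing table that never stores messages. This is a toy direct exchange, not RabbitMQ's implementation; the `Exchange` class, the routing key, and the use of plain lists as queues are all assumptions for illustration.

```python
class Exchange:
    """Toy direct exchange: a routing table and nothing else."""

    def __init__(self):
        self.bindings = {}  # routing key -> list of bound queues

    def bind(self, routing_key, queue):
        self.bindings.setdefault(routing_key, []).append(queue)

    def publish(self, routing_key, body):
        targets = self.bindings.get(routing_key, [])
        for queue in targets:
            queue.append(body)  # the message lives in the queue, not here
        return len(targets)     # 0 means nothing was bound and it was dropped


logs = Exchange()

# No queue is bound yet, so the exchange has nowhere to put the message.
delivered_before = logs.publish("build.1", "line 1")

viewer = []  # a plain list standing in for a receiver's queue
logs.bind("build.1", viewer)
delivered_after = logs.publish("build.1", "line 2")
```

In this model, making the exchange durable would only preserve the routing table's existence across a restart; whether any message survives still depends on the queue it was routed to.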
<infinisil> {^_^} is gone :(
<LnL> since 20:47 by any chance?
<infinisil> Not sure what timezone you're in, but {^_^} left this channel about 41 minutes ago
<LnL> that's utc, but sounds like a different thing then
<gchristensen> there was a netsplit infinisil
<gchristensen> oh
<gchristensen> also
<gchristensen> it needed restarting :)
{^_^} has joined #nixos-borg
<infinisil> ahh