<qyliss>
Do these logs load for anybody? https://logs.nix.ci/?key=nixos/nixpkgs.85731&attempt_id=6f5c3608-2c60-41c6-8a7e-dfac49339ceb
<gchristensen>
qyliss: sorry, looks like the log collector died. I restarted it and things should be collected now
<qyliss>
Does that mean I should build again?
<gchristensen>
unfortunately, yeah, I'm sorry
<qyliss>
np
<qyliss>
<3 gchristensen
<{^_^}>
gchristensen's karma got increased to 274
<gchristensen>
there is an ugly problem where some of the workers can half crash, and the crashed thread doesn't take down the whole process. I bet there is a way to fix that ...
<LnL>
I did some basic testing with lapin and that seems to reconnect/die properly at first glance
<gchristensen>
nice
<gchristensen>
lets RiiL:)
<LnL>
but I bet adding a panic in the right place would also fix it
<gchristensen>
I think we need to add a panic handler
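(A minimal sketch of the panic-handler idea, not the actual ofborg change. By default a panic in a spawned Rust thread only ends that thread, which matches the "half crash" described above. The helper names `install_panic_hook`, `die_on_any_panic`, and `demo` are hypothetical; only `std::panic::take_hook`, `std::panic::set_hook`, and `std::process::exit` are real standard-library calls.)

```rust
use std::panic;
use std::process;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Chain `on_panic` after the currently installed panic hook.
/// The hook runs on whichever thread panicked, before it unwinds.
/// (Hypothetical helper, for illustration only.)
fn install_panic_hook<F>(on_panic: F)
where
    F: Fn() + Send + Sync + 'static,
{
    let previous = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        previous(info); // keep the usual panic message on stderr
        on_panic();
    }));
}

/// What a long-running worker might install: a panic on any thread
/// ends the whole process so a supervisor (e.g. systemd) can restart it.
#[allow(dead_code)]
fn die_on_any_panic() {
    install_panic_hook(|| process::exit(1));
}

/// Demonstration with a flag instead of `process::exit`.
/// Returns (the spawned thread panicked, the hook ran).
fn demo() -> (bool, bool) {
    let fired = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&fired);
    install_panic_hook(move || flag.store(true, Ordering::SeqCst));

    // The panic is contained in the spawned thread; this thread keeps running.
    let joined = thread::spawn(|| panic!("simulated worker crash")).join();
    (joined.is_err(), fired.load(Ordering::SeqCst))
}

fn main() {
    let (thread_panicked, hook_ran) = demo();
    println!("thread panicked: {}, hook ran: {}", thread_panicked, hook_ran);
}
```

(In a real worker one would call something like `die_on_any_panic()` once at startup; the flag-based variant exists only so the behaviour can be observed without killing the process.)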
<LnL>
there's a sender for the session and each channel which means that dropping the one in the heartbeat doesn't close the receiver
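(A small sketch of the behaviour described here, assuming the senders are clones on a `std::sync::mpsc` channel; the variable names are made up for illustration. The receiver only reports a disconnect once every sender clone has been dropped, so dropping the heartbeat's clone alone is not enough.)

```rust
use std::sync::mpsc::{channel, TryRecvError};

/// Returns what the receiver sees (1) after only the heartbeat's sender
/// is dropped and (2) after every sender is dropped.
fn receiver_states() -> (TryRecvError, TryRecvError) {
    let (session_sender, receiver) = channel::<()>();
    let channel_sender = session_sender.clone();
    let heartbeat_sender = session_sender.clone();

    // Only the heartbeat's clone goes away: the channel is still open.
    drop(heartbeat_sender);
    let after_heartbeat_drop = receiver.try_recv().unwrap_err();

    // Once the remaining clones are gone, the receiver is disconnected.
    drop(session_sender);
    drop(channel_sender);
    let after_all_dropped = receiver.try_recv().unwrap_err();

    (after_heartbeat_drop, after_all_dropped)
}

fn main() {
    let (first, second) = receiver_states();
    println!("after heartbeat drop: {:?}", first);
    println!("after all senders dropped: {:?}", second);
}
```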
<gchristensen>
yeah
<gchristensen>
so I experimented with a patch to do this ... let me push it
<gchristensen>
look at the die-on-heartbeat branch
<LnL>
yeah, not very pretty but I think that would do the trick
<cole-h>
"let sender = send_sender" heh
<LnL>
might be better to reverse the condition and check for TryRecvError::Disconnected
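(One possible shape for that check, as a sketch rather than the code on the branch: match on `TryRecvError::Disconnected` explicitly and treat everything else as "keep going". It assumes a `std::sync::mpsc` receiver; `heartbeat_loop`, `HeartbeatExit`, and the tick limit are hypothetical and exist only to keep the example finite.)

```rust
use std::sync::mpsc::{channel, Receiver, TryRecvError};
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum HeartbeatExit {
    /// Every sender is gone, so the heartbeat should stop.
    PeerGone,
    /// The demo tick limit was reached with the channel still open.
    TickLimit,
}

/// Hypothetical heartbeat loop: stop only when the channel is disconnected.
fn heartbeat_loop(control: &Receiver<()>, max_ticks: u32) -> HeartbeatExit {
    for _ in 0..max_ticks {
        match control.try_recv() {
            // The explicit "everyone hung up" case ends the loop.
            Err(TryRecvError::Disconnected) => return HeartbeatExit::PeerGone,
            // A message or an empty queue: a real loop would send a heartbeat here.
            Ok(()) | Err(TryRecvError::Empty) => {}
        }
        thread::sleep(Duration::from_millis(1));
    }
    HeartbeatExit::TickLimit
}

fn main() {
    let (sender, receiver) = channel::<()>();
    println!("{:?}", heartbeat_loop(&receiver, 3)); // sender still alive
    drop(sender);
    println!("{:?}", heartbeat_loop(&receiver, 3)); // sender dropped
}
```

(Returning from the loop instead of panicking leaves room for the caller to send a final message before exiting.)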
<gchristensen>
that would work for me :)
<LnL>
currently the heartbeat panics so the tombstone message might not get sent
<gchristensen>
I think I fixed the panic by deleting an unwrap and replacing it with a return
<LnL>
ah, missed that, I was mostly looking at the last commit
<cole-h>
Do we (I) have any way to know when the log collector dies? (re: qyl*ss's question earlier) Seems like it's been happening a decent amount recently (could just be my new-ness having not noticed before though)
<gchristensen>
you can see it manifest as a panic in the logs, on the core-0 machine
<gchristensen>
this branch I showed LnL is maybe going to fix it
<gchristensen>
LnL: think I should merge and update ofborg to try it?
<LnL>
yeah sounds good
<LnL>
assuming the logcollector thing is the same as what I've seen
<gchristensen>
yeah I think it is
<gchristensen>
cole-h: want to update the dependencies on ofborg, do the carnix thing, and send a PR?
<cole-h>
Oh, is this the SendError(..) thing we're dealing with here?
<gchristensen>
yeah
<cole-h>
And sure
<gchristensen>
cool
<cole-h>
At the end of it all, at the very least I'll know how to bump dependencies in ofborg! :D
<gchristensen>
yeah, a lot of panics and reconnects going on https://monitoring.nix.ci/explore?orgId=1&left=%5B%22now-30m%22,%22now%22,%22Loki%22,%7B%22expr%22:%22%7Bjob%3D%5C%22systemd-journal%5C%22%7D%22%7D,%7B%22mode%22:%22Logs%22%7D,%7B%22ui%22:%5Btrue,true,true,%22none%22%5D%7D%5D
<LnL>
ah there's the thing, couldn't find it
<LnL>
hmm isn't that the php part?
<gchristensen>
oh there are some of those, ignore those
<gchristensen>
hmm maybe it has settled down
<LnL>
looks like a few aarch builders disappeared for some reason
<LnL>
"Error consuming IoError(UnexpectedEof)"
<LnL>
looks like stopping just doesn't handle stuff nicely so it freaks out a bit
<gchristensen>
aye
<LnL>
searching with {unit=~"ofborg-.*service"} really helps :)
<gchristensen>
:D
<cole-h>
Just got back. Did the change not work as expected?