gchristensen changed the topic of #nixos-borg to: https://www.patreon.com/ofborg https://monitoring.nix.ci/dashboard/db/ofborg?refresh=10s&orgId=1&from=now-1h&to=now "I get to skip reviewing the PHP code and just wait until it is rewritten in something sane, like POSIX shell. || https://logs.nix.samueldr.com/nixos-borg
cole-h has quit [Quit: Goodbye]
hmpffff has joined #nixos-borg
orivej has joined #nixos-borg
hmpffff_ has joined #nixos-borg
hmpffff has quit [Ping timeout: 272 seconds]
hmpffff has joined #nixos-borg
hmpffff_ has quit [Ping timeout: 240 seconds]
<qyliss> Do these logs load for anybody? https://logs.nix.ci/?key=nixos/nixpkgs.85731&attempt_id=6f5c3608-2c60-41c6-8a7e-dfac49339ceb
<gchristensen> qyliss: sorry, looks like the log collector died. I restarted it and things should be collected now
<qyliss> Does that mean I should build again?
<gchristensen> unfortunately, yeah, I'm sorry
<qyliss> np
<qyliss> <3 gchristensen
<{^_^}> gchristensen's karma got increased to 274
<gchristensen> there is an ugly problem where some of the workers can half crash, and the crashed thread doesn't take down the whole process. I bet there is a way to fix that ...
<LnL> I did some basic testing with lapin and that seems to reconnect/die properly at first glance
<gchristensen> nice
<gchristensen> let's RiiL :)
<LnL> but I bet adding a panic in the right place would also fix it
<gchristensen> I think we need to add a panic handler
<LnL> ah if a thread panics it doesn't bring everything down?
<gchristensen> unfortunately not
<LnL> right
<LnL> something something supervisors
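A minimal Rust sketch of the behaviour described above, not taken from ofborg: by default a panic in a spawned thread is contained and only shows up as an Err from join(), so the rest of the process keeps running. One common way to propagate the failure is a panic hook that exits the process. The function names here are hypothetical.

```rust
use std::panic;
use std::process;
use std::thread;

/// Returns true when a panicking worker thread leaves the caller running.
fn worker_panic_is_contained() -> bool {
    let worker = thread::spawn(|| {
        panic!("simulated worker crash");
    });
    // join() reports the panic as Err; the calling thread itself is unaffected.
    worker.join().is_err()
}

/// Hypothetical helper: after this call, a panic on any thread ends the process.
fn install_exit_on_panic() {
    let default_hook = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        // Keep the usual panic message, then take the whole process down.
        default_hook(info);
        process::exit(1);
    }));
}

fn main() {
    println!("panic contained: {}", worker_panic_is_contained());
    install_exit_on_panic();
    // From here on, a crashed worker stops the process so a supervisor can restart it.
}
```

Exiting from the hook is a blunt instrument, but it turns a half-crashed worker into a clean restart for whatever supervises the service.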
<gchristensen> I think I have a hacky fix to propagate the failure, but it is a bit annoying to test :)
<gchristensen> iptables -t filter -I INPUT -s 147.75.199.209 -j DROP
<LnL> couldn't the heartbeat close the session/channels somehow?
hmpffff has quit [Quit: Bye…]
cole-h has joined #nixos-borg
<LnL> there's a sender for the session and each channel which means that dropping the one in the heartbeat doesn't close the receiver
<gchristensen> yeah
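The point about multiple senders can be illustrated with std::sync::mpsc, assuming the channels inside the AMQP client behave like the standard library ones (an assumption, not checked against the crate). The receiver only reports a disconnect once every clone of the sender is gone, so dropping the heartbeat's copy alone changes nothing. Variable names are hypothetical.

```rust
use std::sync::mpsc::{channel, TryRecvError};

/// Returns what the receiver sees after dropping one sender, then after dropping all.
fn receiver_states() -> (TryRecvError, TryRecvError) {
    let (session_sender, receiver) = channel::<u8>();
    let heartbeat_sender = session_sender.clone();

    // Dropping only the heartbeat's clone leaves the channel open.
    drop(heartbeat_sender);
    let one_sender_left = receiver.try_recv().unwrap_err();

    // Once the last sender is dropped, the receiver sees the disconnect.
    drop(session_sender);
    let no_senders_left = receiver.try_recv().unwrap_err();

    (one_sender_left, no_senders_left)
}

fn main() {
    let (one_left, none_left) = receiver_states();
    println!("{:?} then {:?}", one_left, none_left);
}
```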
<gchristensen> so I experimented with a patch to do this ... let me push it
<gchristensen> look at the die-on-heartbeat branch
<LnL> yeah, not very pretty but I think that would do the trick
<cole-h> "let sender = send_sender" heh
<LnL> might be better to reverse the condition and check for TryRecvError::Disconnected
<gchristensen> that would work for me :)
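A rough sketch of the reversed condition suggested above: instead of treating every try_recv error as fatal, the loop stops only on TryRecvError::Disconnected and keeps beating on Empty. This is not the code from the die-on-heartbeat branch; run_heartbeat, the liveness channel, and the max_beats bound are hypothetical and only there to keep the example finite.

```rust
use std::sync::mpsc::{channel, Receiver, TryRecvError};

/// Hypothetical heartbeat loop: beats until the other side's sender is gone,
/// or until max_beats is reached.
fn run_heartbeat(liveness: &Receiver<()>, max_beats: u32) -> u32 {
    let mut beats = 0;
    while beats < max_beats {
        match liveness.try_recv() {
            // The only condition that should stop the heartbeat.
            Err(TryRecvError::Disconnected) => break,
            // A message or an empty queue both mean the peer is still there.
            Ok(()) | Err(TryRecvError::Empty) => beats += 1,
        }
    }
    beats
}

fn main() {
    let (sender, liveness) = channel::<()>();
    println!("beats while connected: {}", run_heartbeat(&liveness, 3));
    drop(sender);
    println!("beats after disconnect: {}", run_heartbeat(&liveness, 3));
}
```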
<LnL> currently the heartbeat panics so the tombstone message might not get sent
<gchristensen> I think I fixed the panic by deleting an unwrap and replacing it with a return
<LnL> ah, missed that, was mostly looking at the last commit
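Replacing an unwrap with a return might look roughly like this, again using std::sync::mpsc as a stand-in for the real channels (an assumption). Sender::send returns Err(SendError(..)) once the receiver has been dropped, so unwrapping it panics the heartbeat thread, while returning lets the thread finish quietly. send_heartbeats is a hypothetical name.

```rust
use std::sync::mpsc::{channel, Sender};

/// Sends up to `beats` heartbeats and returns how many were delivered.
fn send_heartbeats(tx: &Sender<&'static str>, beats: usize) -> usize {
    for sent in 0..beats {
        // tx.send(..).unwrap() would panic here once the receiver is dropped.
        if tx.send("heartbeat").is_err() {
            return sent;
        }
    }
    beats
}

fn main() {
    let (tx, rx) = channel();
    println!("sent with receiver alive: {}", send_heartbeats(&tx, 3));
    drop(rx);
    println!("sent after receiver dropped: {}", send_heartbeats(&tx, 3));
}
```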
<cole-h> Do we (I) have any way to know when the log collector dies? (re: qyl*ss's question earlier) Seems like it's been happening a decent amount recently (could just be my new-ness having not noticed before though)
<gchristensen> you can see it manifest as a panic in the logs, on the core-0 machine
<gchristensen> this branch I showed LnL is maybe going to fix it
<gchristensen> LnL: think I should merge and update ofborg to try it?
<LnL> yeah sounds good
<LnL> assuming the logcollector thing is the same as what I've seen
<gchristensen> yeah I think it is
<gchristensen> cole-h: want to update the dependencies on ofborg, do the carnix thing, and send a PR?
<cole-h> Oh, is this the SendError(..) thing we're dealing with here?
<gchristensen> yeah
<cole-h> And sure
<gchristensen> cool
<cole-h> At the end of it all, at the very least I'll know how to bump dependencies in ofborg! :D
<gchristensen> :)
<{^_^}> [ofborg] @cole-h opened pull request #464 → Bump amqp → https://git.io/JfkAT
<cole-h> Uh, nice, 6 (local) test failures
<cole-h> Oh, maybe because of that matching line stuff again
<cole-h> Oh, it was because I have `experimental-features` in my nix.conf and I was running it with `nix`
<cole-h> Heh
<cole-h> btw, checkPhase timed out on fetching again...
<cole-h> I broke travis by force-pushing, thinking it would make it rerun like it does with ofborg... RIP.
<LnL> hm, doesn't that work?
<cole-h> If it does, it isn't right now :(
<LnL> I tried out github actions for nix-darwin a while back, pretty easy to switch but can't really say if it's more stable
<LnL> gchristensen: have you had a chance to take a look at the hydra export yet?
<gchristensen> yeah I got pretty far and then got stumped on some annoyances w.r.t. mounting devices
<gchristensen> can continue tonight :)
<LnL> ah, cool
<cole-h> btw gchristensen if you could manually trigger CI on ofborg#464 , that would be swell ^^
<{^_^}> https://github.com/NixOS/ofborg/pull/464 (by cole-h, 1 hour ago, open): Bump amqp
<gchristensen> huh
<gchristensen> seems travis is just bad
<gchristensen> apparently travis is trying hard to push everybody off of it
<cole-h> I tried force-pushing to restart the checkPhase (because HTTP timed out... lol) and it just died :D
<LnL> oh the build status is just gone
<gchristensen> lol CI passed but it just never wrote statuses
<gchristensen> lol travis is so broken
<cole-h> :D
* gchristensen puts on a tinfoil hat
<gchristensen> you know,
<gchristensen> breaking all the other CI systems is a great way to get people on to GitHub Actions
<cole-h> That actually doesn't sound too farfetched...
<srk> interesting
<MichaelRaskin> Hmmm.
<MichaelRaskin> Should we add a GitHub action that runs nixpkgs-review on every commit?
<cole-h> Assuming you mean against nixpkgs PRs/commits
<MichaelRaskin> Yes
<LnL> heh
<MichaelRaskin> That's like the last thing ofborg doesn't dare do for Nixpkgs QA yet!
<gchristensen> heh
<LnL> world-class CI/CD
<LnL> sounds great, here's 80k builds
<cole-h> Heh
<MichaelRaskin> Make the world-class CI «make world»-class!
<MichaelRaskin> Actually… hmm… how much has to be done by people with GH org admin access?
<gchristensen> it's no good
<gchristensen> we only get like 2k minutes of time
<gchristensen> unless we can do something meaningful in 1min20s per PR
<cole-h> Most of that would probably be taken up by getting the nix binaries available using the cachix GH action, or whatever
<MichaelRaskin> Ah
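The two numbers quoted above fit together if the 2k minutes are spread over about 1500 PR runs. That count is inferred from the arithmetic, not stated in the chat, and the period the quota covers is not stated either. A small sketch of the calculation, with a hypothetical function name:

```rust
/// How many runs fit into a quota, given a fixed time budget per run.
fn runs_per_quota(quota_minutes: u64, seconds_per_run: u64) -> u64 {
    quota_minutes * 60 / seconds_per_run
}

fn main() {
    // 2000 minutes of quota at 1min20s (80 seconds) per PR.
    println!("{}", runs_per_quota(2000, 80));
}
```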
<{^_^}> [ofborg] @grahamc pushed 2 commits to released: https://git.io/JfIJK
<{^_^}> [ofborg] @grahamc merged pull request #464 → Bump amqp → https://git.io/JfkAT
<gchristensen> pushing out
<gchristensen> ehh this might not be good
<LnL> something not happy?
<gchristensen> yeah, a lot of panics and reconnects going on https://monitoring.nix.ci/explore?orgId=1&left=%5B%22now-30m%22,%22now%22,%22Loki%22,%7B%22expr%22:%22%7Bjob%3D%5C%22systemd-journal%5C%22%7D%22%7D,%7B%22mode%22:%22Logs%22%7D,%7B%22ui%22:%5Btrue,true,true,%22none%22%5D%7D%5D
<LnL> ah there's the thing, couldn't find it
<LnL> hmm isn't that the php part?
<gchristensen> oh there are some of those, ignore those
<gchristensen> hmm maybe it has settled down
<LnL> looks like a few aarch builders disappeared for some reason
<LnL> "Error consuming IoError(UnexpectedEof)"
<LnL> looks like stopping just doesn't handle stuff nicely so it freaks out a bit
<gchristensen> aye
<LnL> searching with {unit=~"ofborg-.*service"} really helps :)
<gchristensen> :D
<cole-h> Just got back. Did the change not work as expected?
{`-`} has joined #nixos-borg