{^_^} has quit [Remote host closed the connection]
gchristensen has quit [Quit: WeeChat 2.2]
grahamc has joined #nixos-borg
grahamc is now known as gchristensen
gchristensen is now known as {^_^}
{^_^} is now known as gchristensen
{^_^} has joined #nixos-borg
<infinisil>
Hmm, I have a bunch of these lines in my logs:
<infinisil>
Apr 23 16:05:26 protos nixbot[26406]: Exception: ConnectionClosedException Abnormal "Could not connect to any of the provided brokers: [((\"events.nix.gsc.io\",5671),HostCannotConnect \"events.nix.gsc.io\" [Network.Socket.connect: <socket: 3>: does not exist (Connection refused),Network.Socket.connect: <socket: 3>: does not exist (Connection refused)])]"
<gchristensen>
what TZ are those logs in?
<infinisil>
That was about 5:30 hours ago
<infinisil>
> CEST
<gchristensen>
yeah
<{^_^}>
"The time in CEST is currently 21:39:51 (UTC +2)"
<gchristensen>
I have CEST in my task bar :)
<gchristensen>
since so many of my NixOS friends are in CEST
<infinisil>
Hehe neat
<gchristensen>
so around 1600 CEST I rebooted the events.nix.gsc.io host
<infinisil>
Oh I see
<gchristensen>
(uptime of 5:10hrs)
<infinisil>
Hmm so how should I handle that
<infinisil>
Exponential backoff? Or just retry every minute or so
<infinisil>
How long was it down?
<gchristensen>
every minute is fine
<gchristensen>
hmm just a few minutes I think
<gchristensen>
let's find out
<infinisil>
It might be only 15 seconds
<infinisil>
Apr 23 16:05:37 protos nixbot[26406]: AMQP connection opened
<gchristensen>
it halted at 14:29:50 and the kernel started again at 14:30:24 (UTC)
<gchristensen>
the ofborg workers just die and rely on systemd restarting them, which it does every 30s
<infinisil>
> UTC
<{^_^}>
"The time in UTC is currently 19:44:12 (UTC 0)"
<infinisil>
Hmm
<infinisil>
My logs say it opened a connection again at 16:05, but it never opened a channel, which it's supposed to do right after opening a connection
<infinisil>
I guess I should add some timeouts
<infinisil>
To retry
<infinisil>
Or something
<gchristensen>
ofborg's model is if amqp gets weird, fail the process
<gchristensen>
and let systemd take care of respawn
<infinisil>
Hmm I implemented exponential backoff with a maximum of 1 minute, along with a 5 second timeout for trying to connect
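
As an illustration of the retry policy described in the last message (exponential backoff capped at 1 minute, with a 5 second timeout on each connection attempt), here is a minimal Python sketch. It is not the nixbot code. The names `backoff_delays` and `connect_with_retry`, the 1 second starting delay, and the `max_attempts`, `connect` and `sleep` parameters are assumptions made for the example. A plain TCP connect stands in for opening the AMQP connection.

```python
import socket
import time


def backoff_delays(base=1.0, cap=60.0):
    """Yield base, 2*base, 4*base, ... seconds, never exceeding cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


def connect_with_retry(host, port, connect_timeout=5.0, cap=60.0,
                       max_attempts=None,
                       connect=socket.create_connection,
                       sleep=time.sleep):
    """Try to connect until it works, sleeping with capped backoff between tries.

    connect and sleep are injectable (hypothetical parameters) so the
    policy can be exercised without a network or real waiting.
    """
    for attempt, delay in enumerate(backoff_delays(cap=cap), start=1):
        try:
            # Each attempt gives up after connect_timeout seconds.
            return connect((host, port), timeout=connect_timeout)
        except OSError as exc:  # covers refused connections and timeouts
            if max_attempts is not None and attempt >= max_attempts:
                raise
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)
```

Calling `connect_with_retry("events.nix.gsc.io", 5671)` would keep retrying indefinitely, waiting 1, 2, 4, ... and then at most 60 seconds between attempts. The alternative mentioned above is to let the process exit on any AMQP error and rely on systemd to restart it.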