<cole-h>
Oof, all the evaluators and builders died?
<cole-h>
If none of the packet machines are back up by 11PM PDT (~30m), I'll redeploy to see if that helps...
<cole-h>
I'm impatient and anxious, so I'm moving it up 15m and doing it now.
<cole-h>
...that doesn't seem good. Multi-minute "waiting for agent" times. buildkite r u OK? (according to https://buildkitestatus.com it's fine...)
<cole-h>
Something's wrong with buildkite, methinks.
<cole-h>
packet-nix-builder has been waiting for an agent for almost 6 hours, packet-spot-buildkite-agent for almost 14 hours, and there's even a r13y job that has been waiting for almost 24 hours.
<cole-h>
gchristensen: It's up to you, now. I don't think there's anything else I can do from here.
cole-h has quit [Quit: Goodbye]
orivej has quit [Ping timeout: 260 seconds]
orivej has joined #nixos-borg
<gchristensen>
uh oh
<LnL>
it's the deploy host that's down I think
<gchristensen>
I don't understand what is happening here
<gchristensen>
fetching my fork of nixops-packet which has that rev as the tip
<gchristensen>
I think there was a regression in nix but I hacked it.
orivej has quit [Ping timeout: 265 seconds]
cole-h has joined #nixos-borg
<cole-h>
OK, what's up? This whole thing has me real confused.
<gchristensen>
no clue
<cole-h>
tbh I was tempted to email buildkite last night to see if it's not just us... But the fact their status page has nothing up makes me think maybe it is.
<gchristensen>
it was on my end and now it is on maybe packet's end
<cole-h>
What was the problem on your end?
<gchristensen>
my deploy host rebooted and needed some secrets uploaded
<cole-h>
oh lol
<cole-h>
Is that where all those failures on the other jobs came from, as well?
<gchristensen>
...maybe?
<gchristensen>
I'm not sure, it isn't working yet
<cole-h>
packet :-(
<cole-h>
Could the issues be related to the new ipxe? Didn't you say you were uploading that some time ago?
<gchristensen>
no
<cole-h>
Harrumph. Packet claims to be fine, yet our machines still aren't up.
<cole-h>
idk what to do. If I can help in any way, let me know.
orivej has joined #nixos-borg
orivej has quit [Ping timeout: 256 seconds]
orivej_ has joined #nixos-borg
<gchristensen>
I'm about to have some time to look closer
<gchristensen>
a bunch of things had to be done this morning
<cole-h>
<3 gchristensen
<{^_^}>
gchristensen's karma got increased to 328
orivej_ has quit [Ping timeout: 246 seconds]
<gchristensen>
(1 "this morning" later)
<cole-h>
:P
<cole-h>
gchristensen: I think the packet.net link is completely broken. It times out for me with or without https, but the previous IP works fine.
<cole-h>
(Tested just by `curl`ing the ipxe)
<cole-h>
:O I see stuff happening!
<cole-h>
:OOOO
<gchristensen>
yayyy
<cole-h>
What did you do, you magician?
<gchristensen>
I changed https to http
<gchristensen>
lol
<cole-h>
But how did that work when I can't curl the http link? :o
<cole-h>
"[RESOLVED] StalledEvaluator ..." 🙏🙏🙏
<cole-h>
Except 1 and 3 both failed to install bootloaders
<cole-h>
I was too hasty in my 🙏
<gchristensen>
hmm
<gchristensen>
so it didn't work .........
<cole-h>
It got farther, at least...
<gchristensen>
oneuhnsaoheunoatehu spot-eval-2 is causing problems
<cole-h>
wat how
<cole-h>
Even though it got the furthest?
<cole-h>
Interesting how it's eval-2 that always goes first...
<cole-h>
Or at least appears to always be going first
<gchristensen>
eval-1 and eval-3 are pretty much ready to go, but nixops won't set them up until eval-2 is ready :x
<cole-h>
>:(
<cole-h>
eval-2 WHY
<cole-h>
omg
* cole-h
is cautiously optimistic
<gchristensen>
yaaaaay
<cole-h>
✨ gchristensen
<{^_^}>
gchristensen's karma got increased to 329
<cole-h>
✨ gchristensen
<{^_^}>
gchristensen's karma got increased to 330
<cole-h>
✨ gchristensen
<{^_^}>
gchristensen's karma got increased to 331
<cole-h>
Now I can do the thing again...
<cole-h>
"[RESOLVED] StalledEvaluator ..." 🙏🙏🙏
<gchristensen>
w000t
<gchristensen>
:)
<cole-h>
Any indication as to what caused this? Extremely weird situation...