gchristensen changed the topic of #nixos-borg to: https://www.patreon.com/ofborg https://monitoring.nix.ci/dashboard/db/ofborg?refresh=10s&orgId=1&from=now-1h&to=now "I get to skip reviewing the PHP code and just wait until it is rewritten in something sane, like POSIX shell." || https://logs.nix.samueldr.com/nixos-borg
hmpffff_ has joined #nixos-borg
hmpffff has quit [Ping timeout: 244 seconds]
orivej has quit [Ping timeout: 264 seconds]
<cole-h> Something seems wrong but I can't tell what from a brief glance at the logs
<cole-h> Oh hey, comment poster service is dead
<cole-h> Ping gchristensen -- no `ofborg-github-comment-poster.service` logs in a long time; the last logs are a backtrace.
<cole-h> I think `ofborg-stats.service` is dead too, but I don't remember what it does.
orivej has joined #nixos-borg
orivej has quit [Ping timeout: 256 seconds]
orivej has joined #nixos-borg
cole-h has quit [Quit: Goodbye]
<LnL> my builder also restarted a bunch
<LnL> hasn't been stuck yet since beta4
hmpffff has joined #nixos-borg
hmpffff_ has quit [Ping timeout: 260 seconds]
tilpner has joined #nixos-borg
tilpner_ has joined #nixos-borg
tilpner has quit [Read error: Connection reset by peer]
tilpner_ is now known as tilpner
orivej has quit [Ping timeout: 260 seconds]
orivej has joined #nixos-borg
hmpffff_ has joined #nixos-borg
hmpffff has quit [Ping timeout: 256 seconds]
hmpffff has joined #nixos-borg
hmpffff_ has quit [Ping timeout: 240 seconds]
hmpffff_ has joined #nixos-borg
hmpffff has quit [Ping timeout: 246 seconds]
tilpner_ has joined #nixos-borg
tilpner has quit [Ping timeout: 246 seconds]
tilpner_ is now known as tilpner
orivej has quit [Ping timeout: 256 seconds]
orivej has joined #nixos-borg
<LnL> hmmm
<LnL> gchristensen: ping
<gchristensen> finishing a meeting, I'll be here in like 10min. what's up?
<LnL> half of the ofborg stuff died
<gchristensen> oof
cole-h has joined #nixos-borg
<LnL> https://monitoring.nix.ci/explore?orgId=1&left=%5B%22now-3h%22,%22now%22,%22PromLoki%22,%7B%22expr%22:%22sum(count_over_time(%7Bunit%3D~%5C%22ofborg-.*.service%5C%22%7D%5B10m%5D))%20by(unit)%22%7D,%7B%22mode%22:%22Metrics%22%7D,%7B%22ui%22:%5Btrue,true,true,%22none%22%5D%7D%5D
<cole-h> RIP x86_64-linux builders
<LnL> yeah you can see the heartbeats stop around that time
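The "heartbeats stop" observation can be turned into a small check: flag any unit whose last log line is older than some cutoff. The following is a minimal sketch, not what the dashboard actually runs; the function name `silent_units`, the timestamps, and the 600-second cutoff are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Returns the units whose most recent log line is older than
/// `max_silence_secs`. `last_seen` maps a unit name to the Unix
/// timestamp of its last log line (hypothetical input shape).
fn silent_units(last_seen: &HashMap<String, u64>, now: u64, max_silence_secs: u64) -> Vec<String> {
    let mut silent: Vec<String> = last_seen
        .iter()
        .filter(|(_, ts)| now.saturating_sub(**ts) > max_silence_secs)
        .map(|(unit, _)| unit.clone())
        .collect();
    // HashMap iteration order is unspecified, so sort for stable output.
    silent.sort();
    silent
}

fn main() {
    // Made-up timestamps: one unit logged recently, one went quiet long ago.
    let mut last_seen = HashMap::new();
    last_seen.insert("ofborg-evaluation-filter.service".to_string(), 990);
    last_seen.insert("ofborg-stats.service".to_string(), 100);

    for unit in silent_units(&last_seen, 1000, 600) {
        println!("no recent logs from {}", unit);
    }
}
```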
<LnL> maybe I should push the update
<cole-h> Which update?
<LnL> lapin: beta3 -> beta4
<cole-h> Ah, got it. :shipit: tbh
<gchristensen> yikes
<gchristensen> all the linux builders on Packet got culled
<cole-h> Oof
<LnL> oh! I thought about that but assumed more would be gone in that case
<cole-h> Crikey
<LnL> whut
<gchristensen> oh cool
<gchristensen> they're following some feedback I gave them a while back
<LnL> it's back down
<cole-h> Which is to cull in-use machines? :D
<LnL> what's that?
<gchristensen> if they were doing maintenance which would break my spot servers for anything more than a few seconds, don't email me to say so -- just make the price expensive so I can't buy them
<cole-h> Ohhhhh. That's smart.
<LnL> heh
* cole-h wishes he had a borg dashboard for when that happens
<LnL> so.... they will come back automagically?
<gchristensen> yeah, I'm updating the provision URL and the next deploy will automatically recreate them
<cole-h> Woo! :)
<LnL> ah so small intervention
<gchristensen> the URL update won't be needed in a couple weeks, once the new images are updated with Packet
<LnL> yeah figured, was referring to the deploy part
<gchristensen> yeah
<gchristensen> ehhh it may not work correctly actually
<gchristensen> it might not realize the machines don't exist with Packet in a dry-activation deploy
<cole-h> Also gchristensen++ for trying to push the new doc RFC closer
<{^_^}> gchristensen's karma got increased to 290
<gchristensen> well, let's see if it does work. I'll give it a few minutes.
<cole-h> RIP.
orivej has quit [Ping timeout: 240 seconds]
orivej_ has joined #nixos-borg
<cole-h> "75% done" lol
<gchristensen> the way this machine is created, there are 4 steps and the first 3 are very fast and the 4th takes a While
<gchristensen> lol
<gchristensen> hmmm
<cole-h> oof
<cole-h> More connection refused's :(
<gchristensen> none of these are too surprising to me tbh
<gchristensen> that is the first time the error was surprising.
<cole-h> What does `--check --allow-recreate` do?
<gchristensen> --check tells nixops to confirm with the API that those machines actually exist (if not, error unless --allow-recreate is passed -- in which case, recreate them)
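The behaviour described above amounts to a three-way decision. This is a sketch of that decision only, assuming nothing about how nixops is implemented; `CheckOutcome` and `check_machine` are hypothetical names.

```rust
/// What a deploy with `--check` would do for one machine,
/// according to the description in the message above.
#[derive(Debug, PartialEq)]
enum CheckOutcome {
    Keep,     // the provider API confirms the machine exists
    Recreate, // machine is gone and --allow-recreate was passed
    Fail,     // machine is gone and --allow-recreate was not passed
}

fn check_machine(exists_remotely: bool, allow_recreate: bool) -> CheckOutcome {
    match (exists_remotely, allow_recreate) {
        (true, _) => CheckOutcome::Keep,
        (false, true) => CheckOutcome::Recreate,
        (false, false) => CheckOutcome::Fail,
    }
}

fn main() {
    for (exists, allow) in [(true, false), (false, true), (false, false)] {
        println!(
            "exists={} allow_recreate={} -> {:?}",
            exists,
            allow,
            check_machine(exists, allow)
        );
    }
}
```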
* LnL is back
<cole-h> o/
<LnL> what did I miss?
<cole-h> Just watching Graham fiddle with stuff to see if we can get it to deploy (:
<gchristensen> grub-install: error: cannot find a GRUB drive for /dev/disk/by-id/ata-MICRON_M510DC_MTFDDAK480MBP_160511B060AD. Check your device.map.
<gchristensen> I'm not understanding why this is happening
<cole-h> Time to switch to systemd-boot? :D
<LnL> oh, the actual deploy failed, not something nixops related
<cole-h> But it really is weird...
orivej_ has quit [Ping timeout: 260 seconds]
<LnL> gchristensen: should we try to roll back to the nixpkgs from before the recent update?
MichaelRaskin has quit [Read error: Connection reset by peer]
<{^_^}> ofborg/infrastructure#18 (by LnL7, 5 days ago, merged): nixpkgs: db3e832 -> a7ceb25
<gchristensen> nah
<gchristensen> that isn't related
<gchristensen> this is a bug in the installer image I used, but I honestly don't understand what caused it
<gchristensen> I'll take a look shortly
MichaelRaskin has joined #nixos-borg
<gchristensen> sorry, my life is video calls these days
<cole-h> All good. That one time you sent a screenie of your schedule for the day, I realized how little free time you actually have :P
<gchristensen> heh
<gchristensen> oaeunshanoset I think I found a bug lol
* cole-h slams F5 on infra
<gchristensen> yeaaap that's a bug
<gchristensen> --check --allow-recreate on the Packet plugin for nixops doesn't, uh
<gchristensen> work
<gchristensen> it never updated the metadata about the machine
<cole-h> o
<gchristensen> so it did recreate the machine but didn't bother to update the metadata on like, which disk holds grub
<cole-h> Hahaha
<{^_^}> input-output-hk/nixops-packet#19 (by grahamc, 20 seconds ago, open): nixops deploy --check --allow-recreate doesn't update the machine's metadata
<{^_^}> input-output-hk/nixops-packet#18 (by grahamc, 1 minute ago, open): Destroying a server with a bad API key deletes locally but doesn't destroy the machine
<cole-h> RIP.
<cole-h> 18 seems especially ouch-worthy.
<cole-h> Gave you those fancy thumbs-ups
<gchristensen> :)
<gchristensen> tbh those are to-do's for me
<{^_^}> [ofborg] @LnL7 opened pull request #483 → lapin: 1.0.0-beta3 -> 1.0.0-beta4 → https://git.io/JflPH
<cole-h> So this is where all of LnL's additions and deletions come from
<cole-h> Updating dependencies
<cole-h> :D
<LnL> lol
<LnL> don't think I ever even ran carnix before this :p
<cole-h> Yeah, suuuuure.
<LnL> github doesn't believe that either since I moved it
<gchristensen> :P
<gchristensen> phew okay it is up
<LnL> \o/
<gchristensen> fwiw I ran nixops destroy --include packet-spot-eval-{1,2,3} and then did the deploy again
<gchristensen> this deleted the stale metadata and created it again
<LnL> right, anything we could do in case this happens again?
<gchristensen> good question
<cole-h> I don't think so lol. Only Graham has the keys to the kingdom, as it were (the actual machines to run that deploy to).
<cole-h> Maybe there could be a buildkite job that has like 10 steps of unblocking in order to run that command :P
<LnL> also sounds like a scary thing to run, wouldn't want to delete core-0 by accident
<cole-h> Right.
<cole-h> gchristensen: These dev updates are always fun to read. Thanks for writing them up.
<LnL> I guess the obvious answer is fix the bugs? :D
<gchristensen> I'm glad you like them :)
<gchristensen> LnL: yup lol
<gchristensen> the real answer is probably get y'all enough creds to be able to run nixops commands?
<LnL> not sure, being restricted is annoying but also kind of forces you to do the right thing
<gchristensen> yeah
<cole-h> Ah, how nice it is to see eval logs again :D
<LnL> are the stats still not running?
<cole-h> Hm, indeed
<cole-h> At least, no logging for it
<LnL> that's one nice thing about the heartbeat spam
<gchristensen> should probably kill the stats thing anyway, its metrics are garbage
<cole-h> And thus begins the slow decline of available aarch builders anew :D
<cole-h> (as with every deploy hehe)
<cole-h> Seems like GitHub has mostly gotten their crap together: 1 week without (internal error) incident!
<cole-h> (for now)
* cole-h knocks on wood.
<gchristensen> <3
<cole-h> I was talking about that big, purple, scary label that occasionally gets slapped on PRs when GitHub sends incomplete API responses
<LnL> ah
<LnL> any thoughts on what to do with lapin now?
<gchristensen> keep going?:)
<LnL> not entirely confident it's completely stable yet, the aarch builder is probably the most visible to confirm
<LnL> alternatively yeah, I can convert the next thing
<cole-h> Well, if we find any issues, we can file a bug :)
<LnL> true, project seems pretty active
<cole-h> Yeah -- 179 closed issues, and 0 open
<cole-h> Man, if only it was AMPQ and not AMQP. I had a great idea for an "AMPQ" library
<cole-h> "ampq-ity" in the vein of "ambiguity" :D
<LnL> :)
<cole-h> I guess "amqp-ity" in the vein of piety wouldn't be too un-punny, though...
<LnL> urgh how many times am I going to have to rebuild openssl
<gchristensen> a bit of a stretch :P
<cole-h> LnL: Man, if you don't take care of it, the first thing I'll do after we completely lapin-ify borg is dropping openssl 1.0.2u
<LnL> oh right, that's also related
<cole-h> And then I'll be able to test-build borg without being in a nix-shell/lorri being active :D
<LnL> you can build stuff outside of a nix shell?
<cole-h> I mean `cargo build`
<LnL> The program 'cargo' is currently not installed. It is provided by
<cole-h> You forget I am still on Arch :D
<LnL> what!
<cole-h> `which cargo` => /home/vin/.cargo/bin/cargo
<LnL> :p
<cole-h> And obviously 1.0.2u is outdated, so build fails without a shell providing 1.0.2u
<LnL> gchristensen: what would you suggest as the next one?
<gchristensen> hmm
<gchristensen> maybe comment parser, what do you think?
<LnL> sounds good
<cole-h> +1
<{^_^}> [ofborg] @LnL7 opened pull request #484 → lapin github-comment-filter → https://git.io/JflHE
<cole-h> LnL: Oooh, look at that: net-negative by one line! :D
<LnL> this one was pretty straightforward
<cole-h> gchristensen: Considering GH Actions is free for public repos (I think the cost/"minutes" was something you had reservations about in the past), would you be open to a PR adding GHA?
<cole-h> While I'm in the mindset of Actions thanks to me fixing lorri's Actions
<cole-h> That darn cognitive complexity lint again...
<LnL> if you don't know what to do, a rust update would be nice :)
<cole-h> LnL: You sure that'll solve the problem? With 1.43.0, clippy still picks up that cognitive complexity
<LnL> huh
<cole-h> However, it's happy with it in nightly clippy 0.0.212 (28197b6 2020-04-29)
<gchristensen> by all means
<cole-h> LnL: Maybe next stable -- beta also doesn't complain about complexity.
<LnL> I have nothing nightly related tho
<cole-h> You don't get the complexity warnings with `cargo clippy`?
<cole-h> Hmmmm.... I think eval filter is dead.
<cole-h> Ping gchristensen
<cole-h> A worrying amount of PRs without check runs https://i.imgur.com/LQ5yaSP.png
<cole-h> ofborg-evaluation-filter.service ofborg-github-comment-filter.service ofborg-github-comment-poster.service ofborg-stats.service <- all look dead
<LnL> ah I see, my editor is building stuff in the background first so the warning doesn't show up anymore
<cole-h> Your editor: "mmmmm tasty errors"
<LnL> you could deploy a change to restart stuff :)
<gchristensen> restarted
<cole-h> LnL: You know what, you're right...
<cole-h> gchristensen: Under what kind of situations should we NOT re-deploy? Is there ever a reason not to, if it would fix this kind of thing?
<LnL> out of disk space wouldn't get you very far with this
<cole-h> :P
<LnL> btw, if we can't detect this problem, I think this would help https://gist.github.com/LnL7/7666af126198abad8959f74f4274dadb
<LnL> presumably you can see one of the queues growing
<cole-h> Hooooooooolyyyyyyyyyy
<cole-h> 52 waiting :D
<LnL> gchristensen: ^ I can't really add that myself because of the secret
<LnL> it needs a user with "monitoring" permissions
<cole-h> wot
<cole-h> LnL: Since you changed the comment filter's main to return a Result, I think all the `unwrap`s should become `?`s
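The suggestion relies on a general Rust pattern: once `main` returns a `Result`, fallible calls can propagate errors with `?` instead of panicking via `unwrap`. A minimal sketch follows; `parse_prefetch` and `run` are hypothetical stand-ins and are not taken from the ofborg code.

```rust
use std::error::Error;
use std::num::ParseIntError;

/// Hypothetical fallible step, standing in for any call that
/// previously ended in `.unwrap()`.
fn parse_prefetch(raw: &str) -> Result<u16, ParseIntError> {
    raw.trim().parse::<u16>()
}

fn run(raw: &str) -> Result<u16, Box<dyn Error>> {
    // Before: let prefetch = parse_prefetch(raw).unwrap();
    // After: `?` returns the error to the caller instead of panicking.
    let prefetch = parse_prefetch(raw)?;
    Ok(prefetch)
}

fn main() -> Result<(), Box<dyn Error>> {
    let prefetch = run("1")?;
    println!("prefetch = {}", prefetch);
    Ok(())
}
```

With this shape, an error surfaces as a non-zero exit status and a printed error value rather than a panic backtrace.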
<gchristensen> oh cool LnL
<gchristensen> that is a great idea
<gchristensen> services.ofborg.rabbitmq.monitoring_{username,password} use these values LnL
<cole-h> Oh wow, I completely missed the gist, so I was really confused with what LnL was talking about :D