gchristensen changed the topic of #nixos-borg to: https://www.patreon.com/ofborg https://monitoring.nix.ci/dashboard/db/ofborg?refresh=10s&orgId=1&from=now-1h&to=now "I get to skip reviewing the PHP code and just wait until it is rewritten in something sane, like POSIX shell." || https://logs.nix.samueldr.com/nixos-borg
qyliss has quit [Quit: bye]
qyliss has joined #nixos-borg
cole-h has quit [Quit: Goodbye]
cole-h has joined #nixos-borg
orivej has quit [Ping timeout: 260 seconds]
orivej has joined #nixos-borg
orivej has quit [Ping timeout: 258 seconds]
cole-h has quit [Quit: Goodbye]
orivej has joined #nixos-borg
orivej has quit [Ping timeout: 256 seconds]
superherointj has joined #nixos-borg
orivej has joined #nixos-borg
cole-h has joined #nixos-borg
orivej has quit [Ping timeout: 272 seconds]
Mic92 has joined #nixos-borg
<Mic92> gchristensen: what is your plan for ofborg in the long term? Do you think it would be good to add more maintainers to the project?
<cole-h> Both LnL and I are maintainers, but we (or at least I) don't have the know-how to fix the current issues with our deploy setup.
<LnL> yeah, the deploy host was the problem IIRC
<LnL> kind of forgot what exactly at this point
<cole-h> The problem is that there is some dangling commit gumming up the works, somewhere
<cole-h> I think it was something like this issue: https://github.com/NixOS/nix/issues/2431
<{^_^}> nix#2431 (by sgraf812, 2 years ago, open): fetchGit fails with a not very helpful error message when fetching a revision not in the remote's HEAD
<cole-h> "fatal: not a tree object: d807cdc0879b03b9c4da4d8db9fbce77e65d3496" is the problem.
<LnL> ah yeah, I remember now, the buildkite checkout fails
<cole-h> The worst part is: I/we don't know which repo is failing to check out
<LnL> wait fetchGit?
<LnL> isn't it the buildkite checkout itself?
<cole-h> I don't think so, because it fails in the ./build/build.sh step (meaning ofborg/infra checked out successfully)
<LnL> oh! well we can probably do something in that case
<LnL> give me a second
<cole-h> Oh?
* cole-h looks at the 29 attempts to restart the deploy since August 10th....
<LnL> ok so when I saw the error I had the impression it was in the buildkite runner
<LnL> but if it's the build step then we presumably can reproduce it right?
<cole-h> Maybe. But it might very well be a stateful issue with the runner.
<cole-h> (I think it'd be kinda hard to reproduce because of nixops state)
<LnL> fatal: not a tree object: d807cdc0879b03b9c4da4d8db9fbce77e65d3496
<cole-h> Oh
<LnL> aha, it's the build of the nixops, etc. environment
<cole-h> Nice
<cole-h> LnL++ x10
<{^_^}> LnL's karma got increased to like 81, I think
<cole-h> omg
<cole-h> I swear I grepped for that hash when the issue first popped up
<LnL> unless somebody else has access to that it shouldn't have changed
<cole-h> It still exists
<cole-h> ???
<LnL> and part of master, wut
<cole-h> poetry r u ok
<LnL> it's not poetry itself, there's nix stuff that parses the lockfile
<LnL> and since it's a git dependency that probably uses builtins.fetchGit
<cole-h> What if you change the ref to the branch name?
<cole-h> Maybe needs to be refs/....
<LnL> yeah problem is that it's not generated
<cole-h> LnL: Maybe try setting the rev for nixops-packet in the pyproject.toml and updating the lockfile with `poetry lock`?
<LnL> we don't have a hetzner machine anymore right?
<cole-h> Correct
<LnL> man, this stuff uses too many custom branches...
<cole-h> Indeed (:
<{^_^}> ofborg/infrastructure#36 (by LnL7, 21 seconds ago, open): maybe deploy workaround?
<LnL> the terraform config still references hetzner stuff however so no idea if that's enough
<LnL> but at least that builds
<LnL> wdyt?
<cole-h> One moment, class ends in ~3 minutes :P
<LnL> oh, that's still a thing? ;)
<cole-h> Hehe. Zoom U, unfortunately.
<cole-h> LnL: Rebuilds don't fetch new deps (e.g. you just rebuilt with the old infra revision)
<cole-h> Yep, you figured it out :P
<LnL> yeah, that makes sense :)
<cole-h> LnL++ x100000000000000000000000
<{^_^}> LnL's karma got increased to 82
<cole-h> I swear on me mum I looked for that hash in the repo...
<cole-h> Now we just have to hope packet isn't being temperamental today (:
<LnL> I didn't but when I looked at the log I had the impression this was in the "Preparing working directory" part for some reason
<cole-h> Heh
<cole-h> We balanced each other out, then
<LnL> probably because it didn't get to the ofborg build
<LnL> it's failing to connect but I think we could drop it from the config and re-create it now
<cole-h> It's not failing to connect
<cole-h> Packet takes a while to provision NixOS machines
* cole-h has personal experience with this, since yesterday
<cole-h> Well, it is failing to connect, but because the machine is still provisioning/setting up.
<LnL> hmm isn't there output on what will be created/removed first?
<cole-h> It might be because the machine/prometheus stuff died, so the host isn't being recreated in nixops' eyes?
<cole-h> Or rather, the ofborg stuff died
<LnL> yeah there's a difference between a machine that's missing and an existing machine that's not reachable
<cole-h> btw LnL, when you have a sec: thoughts on https://github.com/NixOS/ofborg/pull/527? Now that we can redeploy ofborg, I think that should finally be rebased to `released` :P
<{^_^}> #527 (by Mic92, 6 weeks ago, open): ofborg: post a finished check if evaluation starts
<cole-h> (I personally plan on waiting until Thursday or Saturday to merge+deploy, so I can babysit it when I have free time on those days)
<LnL> well, I'm kind of opposed to mixing multiple CI systems but I don't want to be the one to make the decision on that
<cole-h> +1.
NinjaTrappeur has quit [Quit: WeeChat 2.8]
NinjaTrappeur has joined #nixos-borg
<LnL> I'd personally say if it's something that touches the status api it shouldn't be introduced unless it goes through an rfc
<LnL> even if it's just a simple linter
<cole-h> I figured we were past that point (since it's already been introduced to Nixpkgs), but I agree.
<LnL> the stale issue bot is a good example of something similar
<LnL> don't get me wrong, not having separate (managed) infrastructure would be great, but adding in more stuff is the other direction
<cole-h> You've convinced me, and I totally agree.
<cole-h> OK, yeah, maybe we should redeploy to 1 host and then back to 3. Seems like the machine is an SSH tar pit...
<LnL> I'm going to bed, but yeah seems like it should be unblocked now
<cole-h> o/ See you later.
<cole-h> Thanks for digging into the issue and realizing how simple a fix it was :D
<LnL> guess we should have looked properly sooner
<cole-h> s/we/cole-h/
<cole-h> I even spent a few days looking into it, and gave up :D
superherointj_ has joined #nixos-borg
superherointj_ has quit [Client Quit]
superherointj has quit [Quit: Leaving]