#nixos-borg on 2020-09-21

2018-04-19 20:36 gchristensen changed the topic of #nixos-borg to: https://www.patreon.com/ofborg https://monitoring.nix.ci/dashboard/db/ofborg?refresh=10s&orgId=1&from=now-1h&to=now "I get to skip reviewing the PHP code and just wait until it is rewritten in something sane, like POSIX shell." || https://logs.nix.samueldr.com/nixos-borg

01:03 qyliss has quit [Quit: bye]

01:14 qyliss has joined #nixos-borg

03:03 cole-h has quit [Quit: Goodbye]

03:04 cole-h has joined #nixos-borg

03:19 orivej has quit [Ping timeout: 260 seconds]

03:42 orivej has joined #nixos-borg

04:08 orivej has quit [Ping timeout: 258 seconds]

06:52 cole-h has quit [Quit: Goodbye]

08:27 orivej has joined #nixos-borg

12:17 orivej has quit [Ping timeout: 256 seconds]

12:25 superherointj has joined #nixos-borg

14:26 orivej has joined #nixos-borg

15:47 cole-h has joined #nixos-borg

16:33 orivej has quit [Ping timeout: 272 seconds]

19:03 Mic92 has joined #nixos-borg

19:03 <Mic92> gchri

19:04 <Mic92> gchristensen: what is your plan regarding ofborg in the long term? Do you think it would be good to add more maintainers to the project?

19:18 <cole-h> Both LnL and I are maintainers, but we (or at least I) don't have the know-how to fix the current issues with our deploy setup.

19:23 <LnL> yeah, the deploy host was the problem IIRC

19:23 <LnL> kind of forgot what exactly at this point

19:24 <cole-h> The problem is that there is some dangling commit gumming up the works, somewhere

19:24 <cole-h> I think it was something like this issue: https://github.com/NixOS/nix/issues/2431

19:24 <{^_^}> nix#2431 (by sgraf812, 2 years ago, open): fetchGit fails with a not very helpful error message when fetching a revision not in the remote's HEAD

19:26 <cole-h> "fatal: not a tree object: d807cdc0879b03b9c4da4d8db9fbce77e65d3496" is the problem.

19:26 <LnL> ah yeah remember now the buildkite checkout fails

19:26 <cole-h> The worst part is: I/we don't know which repo is failing to check out

19:27 <LnL> wait fetchGit?

19:27 <LnL> isn't it the buildkite checkout itself?

19:28 <cole-h> I don't think so, because it fails in the ./build/build.sh step (meaning ofborg/infra checked out successfully)

19:29 <LnL> oh! well we can probably do something in that case

19:29 <LnL> give me a second

19:29 <cole-h> Oh?

19:31 * cole-h looks at the 29 attempts to restart the deploy since August 10th....

19:35 <LnL> ok so when I saw the error I had the impression it was in the buildkite runner

19:36 <LnL> but if it's the build step then we presumably can reproduce it right?

19:37 <cole-h> Maybe. But it might very well be a stateful issue with the runner.

19:38 <cole-h> (I think it'd be kinda hard to reproduce because of nixops state)

19:39 <LnL> fatal: not a tree object: d807cdc0879b03b9c4da4d8db9fbce77e65d3496

19:40 <cole-h> Oh

19:40 <LnL> aha, it's the build of the nixops, etc. environment

19:40 <cole-h> Nice

19:41 <cole-h> LnL++ x10

19:41 <{^_^}> LnL's karma got increased to like 81, I think

19:42 <LnL> hmm that's unexpected https://github.com/ofborg/infrastructure/blob/master/nix/poetry/poetry.lock#L118-L121

19:42 <cole-h> omg

19:43 <cole-h> I swear I grepped for that hash when the issue first popped up

19:43 <LnL> unless somebody else has access to that it shouldn't have changed

19:44 <cole-h> It still exists

19:44 <cole-h> https://github.com/grahamc/nixops-packet/commit/d807cdc0879b03b9c4da4d8db9fbce77e65d3496

19:44 <cole-h> ???

19:44 <LnL> and part of master, wut

19:45 <cole-h> poetry r u ok

19:46 <LnL> it's not poetry itself, there's nix stuff that parses the lockfile

19:46 <LnL> and since it's a git dependency that probably uses builtins.fetchGit

19:46 <cole-h> What if you change the ref to the branch name?

19:47 <cole-h> Maybe needs to be refs/....

19:48 <LnL> yeah problem is that it's not generated

19:58 <cole-h> LnL: Maybe try setting the rev for nixops-packet in the pyproject.toml and updating the lockfile with `poetry lock`?

19:58 <LnL> we don't have a hetzner machine anymore right?

19:59 <cole-h> Correct

20:00 <LnL> man, this stuff uses too many custom branches...

20:00 <cole-h> Indeed (:

20:07 <LnL> https://github.com/ofborg/infrastructure/pull/36

20:07 <{^_^}> ofborg/infrastructure#36 (by LnL7, 21 seconds ago, open): maybe deploy workaround?

20:08 <LnL> the terraform config still references hetzner stuff however so no idea if that's enough

20:08 <LnL> but at least that builds

20:11 <LnL> wdyt?

20:12 <cole-h> One moment, class ends in ~3 minutes :P

20:13 <LnL> oh, that's still a thing? ;)

20:13 <cole-h> Hehe. Zoom U, unfortunately.

20:20 <cole-h> LnL: Rebuilds don't fetch new deps (e.g. you just rebuilt with the old infra revision)

20:20 <cole-h> Yep, you figured it out :P

20:21 <LnL> yeah, that makes sense :)

20:21 <cole-h> LnL++ x100000000000000000000000

20:21 <{^_^}> LnL's karma got increased to 82

20:21 <cole-h> I swear on me mum I looked for that hash in the repo...

20:22 <cole-h> Now we just have to hope packet isn't being temperamental today (:

20:22 <LnL> I didn't but when I looked at the log I had the impression this was in the "Preparing working directory" part for some reason

20:23 <cole-h> Heh

20:23 <cole-h> We balanced each other out, then

20:23 <LnL> probably because it didn't get to the ofborg build

20:26 <LnL> it's failing to connect but I think we could drop it from the config and re-create now

20:27 <cole-h> It's not failing to connect

20:27 <cole-h> Packet takes a while to provision NixOS machines

20:27 * cole-h has personal experience with this, since yesterday

20:27 <cole-h> Well, it is failing to connect, but because the machine is still provisioning/setting up.

20:28 <LnL> hmm isn't there output on what will be created/removed first?

20:30 <cole-h> It might be because the machine/prometheus stuff died, so the host isn't being recreated in nixops' eyes?

20:30 <cole-h> Or rather, the ofborg stuff died

20:31 <LnL> yeah there's a difference between a machine that's missing and an existing machine that's not reachable

20:33 <cole-h> btw LnL, when you have a sec: thoughts on https://github.com/NixOS/ofborg/pull/527? Now that we can redeploy ofborg, I think that should finally be rebased to `released` :P

20:33 <{^_^}> #527 (by Mic92, 6 weeks ago, open): ofborg: post a finished check if evaluation starts

20:35 <cole-h> (I personally plan on waiting until Thursday or Saturday to merge+deploy, so I can babysit it when I have free time on those days)

20:37 <LnL> well, I'm kind of opposed to mixing multiple CI systems but I don't want to make the decision on that

20:37 <cole-h> +1.

20:41 NinjaTrappeur has quit [Quit: WeeChat 2.8]

20:41 NinjaTrappeur has joined #nixos-borg

20:41 <LnL> I'd personally say if it's something that touches the status api it shouldn't be introduced unless it goes through an rfc

20:42 <LnL> even if it's just a simple linter

20:43 <cole-h> I figured we were past that point (since it's already been introduced to Nixpkgs), but I agree.

20:43 <LnL> the stale issue bot is a good example of something similar

20:56 <LnL> don't get me wrong, not having separate (managed) infrastructure would be great, but adding in more stuff is the other direction

20:58 <cole-h> You've convinced me, and I totally agree.

21:24 <cole-h> OK, yeah, maybe we should redeploy to 1 host and then back to 3. Seems like the machine is an SSH tar pit...

21:28 <LnL> I'm going to bed, but yeah seems like it should be unblocked now

21:29 <cole-h> o/ See you later.

21:29 <cole-h> Thanks for digging into the issue and realizing how simple a fix it was :D

21:33 <LnL> guess we should have looked properly sooner

21:34 <cole-h> s/we/cole-h/

21:35 <cole-h> I even spent a few days looking into it, and gave up :D

22:57 superherointj_ has joined #nixos-borg

22:59 superherointj_ has quit [Client Quit]

22:59 superherointj has quit [Quit: Leaving]