samueldr changed the topic of #nixos-infra to: NixOS infrastructure | logs: https://logs.nix.samueldr.com/nixos-infra/
MichaelRaskin has quit [Ping timeout: 240 seconds]
Guest97996 is now known as JJJollyjim
JJJollyjim is now known as Guest13908
Guest13908 has quit [Quit: authenticating]
Guest139081 has joined #nixos-infra
cole-h has quit [Ping timeout: 268 seconds]
MichaelRaskin has joined #nixos-infra
<lukegb> https://hydra.nixos.org/build/141975028 hrm, "Hydra failure" isn't very specific
<hexa-> "below" also is difficult to follow, given that the page is too short to jump anywhere
<hexa-> fwiw, staging-next was merged
<hexa-> (there is no nix-error anchor)
<lukegb> gchristensen: hmm, how large is hydra's DB
<lukegb> I was contemplating doing some analysis on builds to work out which ones are good candidates for being marked big-parallel
<gchristensen> neat
<gchristensen> you need a few hundred gb to hold the db
<gchristensen> the .sql is 30G
<lukegb> hm, that's alright
<MichaelRaskin> Hmm. Tab-separated values cut by year could be nice for analytics experiments…
<lukegb> I was tempted to just dump the thing into BigQuery for the moment but that's because, well, I'm familiar with it
<lukegb> but TSVs are more generally useful
<MichaelRaskin> Yeah, one could have a few years at a time in RAM and do whatever analytics desired
<gchristensen> lukegb: if you come up with an export query and PR it to github.com/grahamc/hydra-pg-load-dump I'll run it for you
<lukegb> Ooh, that's neat
<gchristensen> the only rule is don't get the users table :)
<MichaelRaskin> Hmm, are psql options allowed?
<gchristensen> it is an arbitrary script
<gchristensen> MichaelRaskin: what are you wondering / thinking?
<MichaelRaskin> Nah, I don't want to play gotcha in this setting, just thinking about copy.sql as a TSV export
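A minimal sketch of that kind of copy.sql-style export, assuming Hydra's builds table has columns along these lines (id, system, starttime, stoptime, finished); the exact schema should be checked first, and per the rule above, nothing reads the users table:

    #!/usr/bin/env bash
    set -euo pipefail

    # Dump finished-build timing data as TSV for offline analytics.
    # Column names are assumptions about Hydra's schema; FORMAT text
    # emits tab-separated rows.
    psql hydra -c "
      COPY (
        SELECT id, system, starttime, stoptime,
               stoptime - starttime AS duration
        FROM builds
        WHERE finished = 1
      ) TO STDOUT WITH (FORMAT text);
    " > builds.tsv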
<lukegb> ah, shellcheck says no
<gchristensen> ruh roh, heh
<gchristensen> d'you mind PRing a fixup on that? a bit unfair to ask you to fix something you can't easily check in automatic CI, but ... :)
<lukegb> yeah, will do
cole-h has joined #nixos-infra
<gchristensen> lukegb: looks pretty good, some useless trivia is that when assigning variables you never need quotes. of course, this is not an objection to adding quotes :D
<lukegb> TIL :p
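To illustrate the trivia: word splitting and globbing don't happen on the right-hand side of an assignment, so quotes there are optional; they still matter when the variable is expanded as a command argument. Paths below are made up:

    out=$HOME/dumps/hydra.sql     # safe even unquoted: no word splitting in assignments
    out="$HOME/dumps/hydra.sql"   # equally fine; quoting anyway costs nothing
    cat "$out"                    # here the quotes do matter if the path contains spaces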
<gchristensen> erm, okay, an issue I'll need to address elsewhere :D
<lukegb> ah, hah
<lukegb> gchristensen: ooh thanks
* lukegb keeps an eye on it
<gchristensen> note the data is a bit stale. this does the dump and load on a secondary zpool, and the automation that copies it from my archival zpool to the working zpool has been stalled since 2021-04-17T13:10:00Z
<gchristensen> but we'll get you this data, then after the working pool catches up we can get you a fresher copy
<lukegb> Sounds good! Thanks :)
* gchristensen grumbles grumpily about zrepl not copying the dataset over, ignoring the part where I didn't ask it to
<hexa-> so, about that tested job failure, how would we get more details on that?
<lukegb> hexa-: for the moment I think just waiting for the eval's jobs to finish and then trying to see if we can replicate it locally
<gchristensen> I wonder if postgres's dump program could output more info ... https://buildkite.com/grahamc/postgres-load-dump/builds/51#679e1a7a-d395-4c4e-8f50-538b52ad4585/24-422
<MichaelRaskin> I guess there is an option to just run a loop printing the output file sizes in the background…
<gchristensen> maybe the output could be told to go to stdout, then pv
<gchristensen> actually, is that going to create a tsv? :)
<MichaelRaskin> I thought pg_dump is _slightly_ not TSV?
<gchristensen> pg_dump makes all sorts of outputs
<MichaelRaskin> Well, sure, if you ask it insistently enough
<gchristensen> no, really, like
<gchristensen> https://linux.die.net/man/1/pg_dump we're passing -Fc
<gchristensen> "c, custom: Output a custom archive suitable for input into pg_restore. This is the most flexible format in that it allows reordering of loading data as well as object definitions. This format is also compressed by default."
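A sketch of the stdout-then-pv idea: pg_dump's custom format goes to stdout when no output file is given, so pv can show throughput; the background size-printing loop MichaelRaskin mentions is the low-tech alternative. Database and file names are made up:

    # live throughput/size readout while dumping
    pg_dump -Fc hydra | pv > hydra.dump

    # or: watch the output file grow while pg_dump runs
    pg_dump -Fc -f hydra.dump hydra & pid=$!
    while kill -0 "$pid" 2>/dev/null; do du -h hydra.dump; sleep 10; done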
<gchristensen> maybe this script should do the full dump as a separate buildkite task
<lukegb> it depends how often you plan on running it and how much spare I/O you have, tbh
<gchristensen> I'd like to run it weekly, so it can say "hey, your backups aren't valid enough to restore"
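A hedged sketch of what such a weekly check could boil down to, assuming a scratch database can be created and dropped freely (all names here are made up):

    #!/usr/bin/env bash
    set -euo pipefail

    # Prove the dump actually restores, not just that pg_dump exited 0.
    createdb restore_check
    pg_restore --no-owner -d restore_check hydra.dump
    # cheap sanity query; the table name is an assumption about the schema
    psql -d restore_check -tAc 'SELECT count(*) FROM builds;'
    dropdb restore_check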
<lukegb> I'm thinking about metrics I'd like and trying to work out if they exist already
<gchristensen> cool
<lukegb> among other things: Hydra builder occupancy (i.e. how many jobs could a node have vs. how many are scheduled on it at the moment)
<gchristensen> that one is knowable now
<gchristensen> importantly, though, the hydra scheduler does not prioritize filling machines
<lukegb> oh?
<gchristensen> afaik, it has an ordered list of jobs it wants to run, and distributes them wherever possible
<lukegb> hmm, I see
<gchristensen> this does confuse people :) and is one of the things I'd like to revisit
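A rough sketch of reading that occupancy, assuming the queue runner's status endpoint exposes per-machine job counts; the URL and the currentJobs/maxJobs field names are guesses to be checked against the real JSON:

    # per-machine occupancy: current jobs vs. capacity
    curl -s https://hydra.nixos.org/queue-runner-status |
      jq -r '.machines | to_entries[]
             | "\(.key)\t\(.value.currentJobs)/\(.value.maxJobs)"'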
thefloweringash has quit [Ping timeout: 245 seconds]
nh2[m] has quit [Ping timeout: 245 seconds]
roberth has quit [Ping timeout: 245 seconds]
garbas[m] has quit [Ping timeout: 245 seconds]
thefloweringash has joined #nixos-infra
nh2[m] has joined #nixos-infra
roberth has joined #nixos-infra
garbas[m] has joined #nixos-infra
<gchristensen> lukegb: ping see dm
<lukegb> ty
supersandro2000 has quit [Killed (verne.freenode.net (Nickname regained by services))]
supersandro2000 has joined #nixos-infra