samueldr changed the topic of #nixos-infra to: NixOS infrastructure | logs:
cole-h has quit [Ping timeout: 240 seconds]
<sterni> This jobset can be cleaned up, the PR it was testing has since been abandoned
<lukegb> update-nixos-unstable.service is broken again :'(
<lukegb> oh, is this the nix bug again
<lukegb> oh no, it's different, this is a segfault
<gchristensen> Wed 2021-04-28 12:24:39 CEST 27973 497 100 11 missing /nix/store/gq1nqzaf210jibq7dgm57gka8d5d42gr-nixos-channel-native-programs/bin/generate-programs-index
<gchristensen> hm
<gchristensen> wrong day, and missing :)
* lukegb attempts to figure out how to run mirror-nixos-branch locally
<lukegb> aha I have figured out the magic incantation for generate-programs-index, let's see if it segfaults
<gchristensen> fingers crossed I think :)
<lukegb> Segmentation fault (core dumped)
<lukegb> bingo
<gchristensen> nice
<lukegb> aaand it truncated it
<gchristensen> great
<lukegb> I should probably just run it under gdb
<lukegb> this is possibly JSON that's so malformed that it's breaking nlohmann json
<lukegb> seems to have crashed on "/nix/store/rm19p00n8hvd63k9dd3yfbigc7rw8kqs-tela-icon-theme-2021-01-21"
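[editor's note: the hypothesis above — a JSON document so malformed it crashes the C++ parser — can be illustrated with a hypothetical Python stand-in; a well-behaved parser should reject a truncated listing with an error rather than segfault. The sample string below is an invented, minimal approximation of a cut-off NAR listing, not the actual file contents.]

```python
import json

# Hypothetical stand-in for the truncated .ls payload: a NAR listing
# cut off mid-object, like the one generate-programs-index choked on.
truncated = '{"version": 1, "root": {"type": "directory", "entries": {'

try:
    json.loads(truncated)
    parsed = True
except json.JSONDecodeError as err:
    # err.pos points at the offset where the document broke off
    parsed = False

print(parsed)  # → False: the parser errors out cleanly instead of crashing
```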
cole-h has joined #nixos-infra
<lukegb> oh, hmm, that's concerning
<lukegb> this build seems to have generated corrupted brotli too
<lukegb> are all the hydra machines running nix < faa31f4 or > 8d651a1f?
<gchristensen> almost definitely not
<gchristensen> most of them run Nix stable
<lukegb> if they're running nix stable then they should be fine
<lukegb> I don't really know how things get from hydra builders to the s3 store
<gchristensen> builders -NAR->hydra-queue-runner->s3
<gchristensen> the machines that don't get rebuilt daily almost definitely run Nix unstable
<lukegb> is hydra-queue-runner built against a "good" version of nix
<lukegb> do we have a procedure for obliterating things from the cache?
<gchristensen> we don't, no
<gchristensen> and almost certainly yes
<gchristensen> but I'll check
<lukegb> hrm, interesting
<gchristensen> wait, actually, maybe not
<gchristensen> I forgot that flakes bring new interesting changes around unified nix versions
<lukegb> basically if we delete from the cache, that should unstick us, at the cost of losing some metadata about the store path
<lukegb> OTOH that file is broken anyway
<lukegb> rm19p[...] corresponds to /nix/store/rm19p00n8hvd63k9dd3yfbigc7rw8kqs-tela-icon-theme-2021-01-21 (the hydra build I linked earlier)
<gchristensen> which machine did it run on?
<lukegb> packet builder;
<lukegb> but I'm not sure at what point the .ls is generated - it might be in hydra-queue-runner
<lukegb> inside BinaryCacheStore::addToStoreCommon; I don't think it gets copied around, so I think it'll always get regenerated when adding to a new store?
<lukegb> gchristensen: thoughts about deleting the broken .ls file?
<lukegb> otherwise I'll just bump it out of the cache by making a pointless change to the derivation to change the hash
<gchristensen> lukegb: does this fall within the range?
<lukegb> I think that's new enough to be > 8d651a1f
<lukegb> hrm
<lukegb> and that was 10 days ago too
<lukegb> (I assume it was deployed at that point as well?)
<gchristensen> almost definitely
<lukegb> hmm, so I can see: commit bumping nix on the 22nd, restarted probably on the 25th (looking at hydra_uptime_total prometheus graphs), and then on the 27th
<lukegb> although... does that make any sense? the footer says 0.1.20210429.18d2716 but there wasn't a restart according to hydra_uptime_total after the 29th
<gchristensen> ummmm
<gchristensen> you know what
<gchristensen> sigh
<gchristensen> Active: active (running) since Sun 2021-04-25 17:37:25 CEST; 6 days ago
<lukegb> which should be fine, right? which nix is in its closure?
<gchristensen> /nix/store/p9ajdjlnparcgkkxssxg8qy4gad8awiq-nix-2.4pre20210422_d9864be
<lukegb> which should be safe
<gchristensen> yeah
<lukegb> bah, maybe there's still a bug somewhere with compression
<lukegb> I should write some tests
<lukegb> but first, lunch
<gchristensen> yes, tests would be very helpful
<lukegb> sorry for dragging you into this a lot 😓
<gchristensen> no worries :)
<gchristensen> I'm in it, whether you put me there or not :D
<lukegb> I can definitely reproduce the problem at b60b0d62d6a65ad8051a24cf4d4e6c50d27abf6a, but not at d9864be4b757468d33bc49edddce5e4f04ef4b90
<lukegb> it's a 4.7M file, which decompresses to 4.3GB of JSON
<gchristensen> holy shit lol
<gchristensen> what is in there?
<lukegb> (it's not _valid_ JSON mind you)
<lukegb> it's the NAR archive list thingy
<gchristensen> of what?
<lukegb> ah, so it's "/nix/store/rm19p00n8hvd63k9dd3yfbigc7rw8kqs-tela-icon-theme-2021-01-21"'s archive manifest
<lukegb> so it's just a json tree of the files in the NAR
<lukegb> it's only about 77M of actual valid JSON, the rest is broken
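[editor's note: the shape of the corruption described above — a valid JSON document followed by a long broken tail — can be measured with Python's stdlib. This is a hedged sketch on a toy string, not the real 4.3GB payload; `raw_decode` parses one leading JSON value and reports where it ends, so everything after that offset is the junk tail.]

```python
import json

def valid_json_prefix(text: str) -> int:
    """Return the offset where the leading JSON value ends,
    or -1 if the text doesn't even start with valid JSON."""
    try:
        _, end = json.JSONDecoder().raw_decode(text)
        return end
    except json.JSONDecodeError:
        return -1

# Toy stand-in for the corrupted .ls contents: a complete listing
# followed by garbage bytes appended by the broken compressor path.
blob = '{"version": 1, "root": {"type": "regular", "size": 42}}' + "\x00garbage" * 3

print(valid_json_prefix(blob))  # offset where the junk tail begins
```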
<gchristensen> incredible
<lukegb> we have a lot of these broken files in the cache
<lukegb> (my argument is we should delete or fix them, at least the ones referenced in the current latest evals for each jobset)
<gchristensen> we could probably delete them, but it is definitely not common practice
<lukegb> yeah.
<gchristensen> so I'm a bit anxious about doing so
<gchristensen> I'd rather not accidentally something
<lukegb> understandable
<lukegb> I merged a change which will change tela-icon-theme's output hash, I think, so once today's eval runs we can see if we're still uploading broken files
<lukegb> shower thoughts: it would be interesting to have hydra automatically decide when to run evals based on how busy the workers are (and some configuration metric) rather than on a fixed time interval
<gchristensen> like, find work to do?
<gchristensen> there is a "one-at-a-time" jobset type which only queues evaluation if the prev finished
<lukegb> hmm, I guess. it'd be interesting to try to consider global state though
<lukegb> it gets a bit tricky when you mash up the fact that we have separate pools of resources though
<lukegb> scheduling is Hard (tm)
<gchristensen> yes it is :)
<lukegb> surprise! it seems to have generated a broken manifest again
<lukegb> so we conclusively know that we're still generating broken things
<lukegb> (I'm testing with, effectively: `curl -s | nix-shell -p brotli --run "brotli -d" | jq . >/dev/null`)
<gchristensen> I'm not sure how to debug this further :(
<gchristensen> maybe a "ping eelco" thing?
<lukegb> maybe
<lukegb> gchristensen: can you double-check the nix daemon version on the machine running the hydra queue runner as well?
<gchristensen> yup
<gchristensen> but note hydra skips the daemon
<lukegb> yeah, I'm just grasping at straws
<gchristensen> ExecStart=@/nix/store/p9ajdjlnparcgkkxssxg8qy4gad8awiq-nix-2.4pre20210422_d9864be/bin/nix-daemon
<lukegb> which should be safe
<lukegb> bah.
<lukegb> we don't have something set which would cause the queue runner to only reload on update and not restart, right?
<lukegb> although it's restarted since the update anyway, so
<lukegb> cursed machine
<gchristensen> it does not, indeed, restart on deploy by default
<gchristensen> hey wait
* lukegb waits
<lukegb> we should put the queue runner version and queue runner nix version in /queue-runner-status
<gchristensen> this is FASCINATING.
<lukegb> oh?
<gchristensen> til: the running process is not necessarily matching the config in systemd status
<lukegb> right yeah, that's mostly what I'd expect
<lukegb> well, ish
<lukegb> :P
<gchristensen> sigh
<gchristensen> this is the wrong version isn't it
<gchristensen> not restarting the queue runner strikes again
<gchristensen> May 02 21:50:09 ceres systemd[1]: hydra-queue-runner.service: Consumed 1month 2w 13h 41min 18.485s CPU time, received 3.9T IP traffic, sent 1.3T IP traffic.
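[editor's note: the gotcha found above — a long-running service still executing a pre-deploy binary while `systemctl status` shows the new `ExecStart=` — can be checked directly on Linux by resolving `/proc/<pid>/exe`. A hedged sketch, demonstrated on the current process since we don't have the queue runner's PID here:]

```python
import os

def running_exe(pid: int) -> str:
    """Resolve the binary a live process is *actually* executing (Linux-only).

    If the service wasn't restarted after a deploy, this can differ from
    the ExecStart= path shown in the current unit file.
    """
    return os.readlink(f"/proc/{pid}/exe")

# In the real check you'd compare running_exe(main_pid) against the
# /nix/store path in ExecStart=; here we just inspect our own process.
print(running_exe(os.getpid()))
```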
<lukegb> [lukegb@totoro:~/Projects/nix]$ git merge-base --is-ancestor faa31f4 9b9e703df41d75949272059f9b8bc8b763e91fce; echo $?
<lukegb> 0
<lukegb> [lukegb@totoro:~/Projects/nix]$ git merge-base --is-ancestor 8d651a1f 9b9e703df41d75949272059f9b8bc8b763e91fce; echo $?
<lukegb> 1
<lukegb> yup
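[editor's note: the two commands above use `git merge-base --is-ancestor A B`, which exits 0 iff A is an ancestor of B — so the deployed commit contains the bug-introducing commit (faa31f4) but not the fix (8d651a1f). A toy sketch of that check on an invented commit graph; the hashes are labels only, and the graph shape is an assumption for illustration:]

```python
# Toy commit graph: each commit maps to its parents. Invented topology
# echoing the log: the deployed commit descends from the bad commit,
# while the fix landed elsewhere and isn't in its ancestry.
parents = {
    "9b9e703": ["faa31f4"],   # deployed commit
    "faa31f4": ["base"],      # commit that introduced the bug
    "8d651a1f": ["faa31f4"],  # commit that fixed it
    "base": [],
}

def is_ancestor(a: str, b: str) -> bool:
    """Walk b's ancestry (including b itself) looking for a,
    mirroring what `git merge-base --is-ancestor a b` answers."""
    stack, seen = [b], set()
    while stack:
        c = stack.pop()
        if c == a:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return False

print(is_ancestor("faa31f4", "9b9e703"))   # True  → exit status 0
print(is_ancestor("8d651a1f", "9b9e703"))  # False → exit status 1: still broken
```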
<gchristensen> can you submit another bogus PR?
<lukegb> I hope we end up rebuilding the world before 21.05 branches off
<lukegb> or we're going to be plagued with broken .ls files and mysteriously incomplete binary indexes
<lukegb> yeah, will do
<gchristensen> I'm sure some rebuild-the-world crisis vuln will drop
<lukegb> well, if the binutils change gets merged that'll probably help I guess
<lukegb> my prayer has been answered
<gchristensen> nice
<andi-> We can also add a few more comments to stdenv, last time I checked there were some fixmes in there :-)
supersandro2000 has quit [Killed ( (Nickname regained by services))]
supersandro2000 has joined #nixos-infra