<samueldr>
(I don't mean to devalue myself, but I think you're probably more used to it!)
<gchristensen>
I'm not sure I am :')
<samueldr>
are the hydra hitting the disks?
<samueldr>
hydra builders*
<samueldr>
if it's all in memory it's weird?
<gchristensen>
it does hit the disk
<samueldr>
spinning rust I guess
<gchristensen>
I'd tell you but I'm waiting for nix-shell -p sysstat -p iotop :P
<gchristensen>
ok ... I pinged a friend of mine who is good at linux performance. let's see what they say. initial guess is: smells like contention on a file descriptor
<samueldr>
I know next to nothing about how hydra allocates tasks: is there a value you could, like, cut in half and see if it helps? (where it would help in a really bad case) (and I hate to suggest doing things like that in prod)
<samueldr>
but if e.g. it's tasked to do 64 builds at once, setting it to 32 might increase the throughput while looking at the real issue?
<samueldr>
(don't know how bad it is in reality)
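(An aside on what that knob might look like: on a NixOS-managed Hydra the per-builder concurrency usually comes from the build machines list, e.g. generated from nix.buildMachines. The entry below is purely illustrative; the host name, key path, and numbers are assumptions.)

    # Hypothetical builder entry; halving maxJobs is the "cut it in half" experiment.
    nix.buildMachines = [
      {
        hostName = "packet-t2a-1";
        system = "aarch64-linux";
        sshUser = "root";
        sshKey = "/etc/nix/builder_key";
        maxJobs = 32;                           # e.g. was 64
        speedFactor = 2;
        supportedFeatures = [ "big-parallel" ];
      }
    ];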
<gchristensen>
`ls` takes 10s
<gchristensen>
(1) that is possible (2) the system is extremely underloaded, actually, not sure why it is whacking out
<samueldr>
from experience, once I/O gets in the mix it's hard to diagnose?
<samueldr>
(or lack thereof?)
<gchristensen>
slow to diagnose
<gchristensen>
my friend here was the database performance guy for a long time
<samueldr>
ah, the one who hates all the devs ;)
<gchristensen>
stripping FHS paths in `./tools/testing/selftests/powerpc/pmu/Makefile'...
<gchristensen>
^ I am that far :)
<samueldr>
during that time
<samueldr>
the community builder built the whole kernel, and an iso image
<gchristensen>
cool......
<samueldr>
in less than 20 minutes
<samueldr>
:/
<gchristensen>
............
<samueldr>
not bragging, showing how much disparity there is
<samueldr>
IIRC, the 96 core machine was a bit faster for the kernel
<gchristensen>
part of that is the community builder is waaaay better hardware
<samueldr>
there's so many trivial small builds
<gchristensen>
but also, something funky is going on here
<samueldr>
but yeah, as I said, the kernel was mighty fast on the 96 core machine
<samueldr>
I'm pretty sure no huge changes happened between those
<gchristensen>
I'm running fstrim to see if that magically helps?
<samueldr>
just like that, I have a dumb thing in my mind: there are two arm builders for hydra, right?
<gchristensen>
it has been running a Long Time without any output
<gchristensen>
yea
<samueldr>
are both of them used?
<gchristensen>
should be
<samueldr>
what would happen if you used the same builder twice when handing out builds on hydra?
<samueldr>
(long shot implausible scenario)
<gchristensen>
I don't understand the idea
<gchristensen>
02:09 <mason> Oh, I didn't miss much. I'd guess TRIM. Might not be, but without it the drive has to do a ton of reallocation when it decides it's full, which without TRIM could be unrelated to actual disk-full from the filesystem's perspective.
<samueldr>
two machines, A and B, instead of configuring hydra to use A and B, it is accidentally configured to use A and A
<gchristensen>
ah
<samueldr>
but the, there's no CPU/memory use :/
<samueldr>
but then*
<gchristensen>
yeah :/
<samueldr>
looking at the kernel builds (69) there's a creep upward in the time it takes to build, but nothing that slow
<samueldr>
seems that the sample set I'm looking at all built on packet-t2a-1
<gchristensen>
t2a-2 is very responsive
<gchristensen>
let's take t2a-1 out of hydra for now
<samueldr>
hm, forgot -v, took 28s here on my ~ drive
<gchristensen>
my other Packet machine of the same type took <3min
<samueldr>
yeah, it smells like toast SSD?
<makefu>
seems there is services.fstrim.enable, i think i will just enable that on my laptop
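(A minimal sketch of that option as it would go into configuration.nix; the interval value is just an example. The one-off equivalent run by hand on a mounted filesystem is fstrim -v <mountpoint>.)

    # Periodic TRIM via the fstrim systemd timer (values are examples)
    services.fstrim.enable = true;
    services.fstrim.interval = "weekly";   # systemd.time(7) calendar expression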
<gchristensen>
ok, incoming packet-aarch64-3 -- just going to dump the old machine
<gchristensen>
rebooting the machine
<gchristensen>
(sorry for the late notice)
<gchristensen>
damn, I messed it up
<gchristensen>
ok rebooting the node again
<gchristensen>
building this image took a lot of time, I had to compile llvm and go and the kernel and and and ...
<gchristensen>
hopefully we see real movement on aarch64 with that bad one out.
<samueldr>
I want to look into a discrete channel, even if only in the nixpkgs-channels repository, for aarch64 instead of tracking nixos-18.09 which the aarch64 builds may or may not be caught up to :/
<gchristensen>
I'm afraid of channel proliferation. it makes for a bad user experience
<samueldr>
yeah, but what's the solution?
<samueldr>
(yeah, not-channels, something else)
<gchristensen>
having x, x-darwin, and x-aarch64 means nobody is on the same thing and you can't use nixops
<samueldr>
right now the experience is worse since there's no known good point :/
<gchristensen>
yeah
<gchristensen>
we could get more hardware and make it a blocker for the regular eval
<gchristensen>
/nix/store/5d0g32knkimxiwx6p3n9qs61l3lz8phz-post-device-commands: exec: line 8: /nix/store/qi248i3lh00wz6wm3196hd0yxprvjdh2-post-devices.sh: Permission denied
<gchristensen>
it would be super cool if I could not make this mistake :)
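(Purely a guess at the failure mode: a store-path script exec'd without its execute bit, e.g. one generated with pkgs.writeText instead of pkgs.writeScript, fails with exactly that "Permission denied". A sketch of the distinction, with hypothetical names:)

    # Not executable: exec'ing this gives "Permission denied"
    bad = pkgs.writeText "post-devices.sh" ''
      #!${pkgs.stdenv.shell}
      echo hello
    '';

    # Same contents, but marked executable in the store
    good = pkgs.writeScript "post-devices.sh" ''
      #!${pkgs.stdenv.shell}
      echo hello
    '';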
<samueldr>
currently doing a dependency, the kernel
<gchristensen>
marked as not big-parallel, wonder how it'll do
<samueldr>
I believe that as long as its dependencies are built, everything should be fine; it's not CPU-bound like the squashfs step in the iso image
<gchristensen>
ah cool
<gchristensen>
I wonder if the squashfs job is big-parallel
<samueldr>
we'll figure it out soon, whenever the PR is merged
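(For context on the "marked as big-parallel" mechanism: a derivation opts in through requiredSystemFeatures, and the queue runner only dispatches it to builders advertising that feature. A minimal sketch, not the actual squashfs expression:)

    stdenv.mkDerivation {
      name = "example-heavy-build";
      # Only schedule this on machines that offer the big-parallel feature
      requiredSystemFeatures = [ "big-parallel" ];
      # ...
    }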
<gchristensen>
it would be very nice to have 3-4 of those ARM servers I was mentioning
<samueldr>
I have no idea what I'm blabbering about: but would there be a way to set them up so if there's no big-parallel builds, they take on multiple smaller tasks?
<gchristensen>
no
<gchristensen>
well
<gchristensen>
ok
<gchristensen>
so I set it to take big-parallel, but also accept regular jobs
<gchristensen>
but there is no way to say, here, you can either do 2 big parallel, or 45 regular
<samueldr>
right
<samueldr>
could it be configured 1 big parallel and 30 regular?
<samueldr>
(if that even makes sense)
<gchristensen>
mmmmmaybe
<gchristensen>
so, yes, I could probably add the instance to the list twice
<gchristensen>
under a second hostname
<gchristensen>
but the 30 regular jobs would be able to access the same # of cores as the big-parallel jobs
<samueldr>
right, so "not without containerizing or vm or other dastardly tricks"
<gchristensen>
it is possible that, on average, that would work pretty well -- just assuming non-big-parallel will behave well
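(A sketch of the "add the instance to the list twice under a second hostname" idea in nix.buildMachines form; the names, key path, and numbers are illustrative, and as noted above both entries still share the same physical cores.)

    nix.buildMachines = [
      {
        hostName = "packet-t2a-2";             # entry 1: big-parallel jobs only
        system = "aarch64-linux";
        sshUser = "root";
        sshKey = "/etc/nix/builder_key";
        maxJobs = 1;
        mandatoryFeatures = [ "big-parallel" ];
      }
      {
        hostName = "packet-t2a-2-small";       # entry 2: same box, via an ssh alias
        system = "aarch64-linux";
        sshUser = "root";
        sshKey = "/etc/nix/builder_key";
        maxJobs = 30;                          # regular jobs
      }
    ];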