<clever>
it happened exactly once with the echoes around the mke2fs
<clever>
Infinisil: each time, i narrowed in closer on it, until i added the strace, then it never happened again
<clever>
Infinisil: 2 or 3 times
<clever>
aristid: my first thought is builtins.map and builtins.elem, test each to see if it exists
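a rough sketch of that kind of feature test (the filter-based fallback is purely illustrative, not something from this conversation):

  let
    # probe for the builtin with the `?` operator, fall back to a hand-rolled version if missing
    elem =
      if builtins ? elem
      then builtins.elem
      else (x: xs: builtins.length (builtins.filter (y: y == x) xs) != 0);
  in elem 2 [ 1 2 3 ]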
<clever>
gchristensen: just normal browsing activity
<clever>
gchristensen: whats weird, is that i wasnt even taxing the system that hard when i reproduced it the first few times
<clever>
gchristensen: test took forever to start with 105MB/sec going to the drives, but it can still pass
<clever>
plus zfs snapshots saving every byte they make, lol
<clever>
ah, 1gig per worker, with 20 workers
<clever>
qmm: looks like it should just work, what is the error?
<clever>
gchristensen: i wonder, when does stress delete its temp files, lol
<clever>
amd/root 20G 8.2G 12G 42% /
<clever>
error: writing to file: No space left on device
<clever>
but if i make -j3, on an 8 core machine, over half is going to waste
<clever>
so even with 16gig in my machine, i can only handle ~3 jobs in parallel
<clever>
and also, some ghc compiles need 5gig of ram
<clever>
yeah
<clever>
so it takes 12 hours to run
<clever>
and make tries to make it share with 3 gcc's
<clever>
gchristensen: for example, i want to make -j4 on my rpi, to use all 4 cores, but there is a single step in the libc locales that needs all the ram
<clever>
gchristensen: this reminds me of a common complaint ive had with things like -j, they arent aware of the load types
<clever>
twice the bandwidth going thru the sata controllers
<clever>
i'm on an SSD mirror, so its shoving that much data into both drives
<clever>
cant even close a tab in the browser, heh
<clever>
98MB/sec
<clever>
gchristensen: system is noticeably laggy with 63MB/sec going to the drives
<clever>
qmm: and what is your nix file?
<clever>
pie__: dont know enough about how python searches for deps
<clever>
gchristensen: load average: 66.30, 29.04, 12.06 and the test can still pass
<clever>
i havent been able to reproduce it even once since adding strace to the problem
<clever>
i'm guessing the system load was just right to be able to reproduce the problem
<clever>
it wasnt losing cpu time to another process
<clever>
pbogdan_: but when i reproduced the test on my end, the mke2fs was hung, and the vm had gone idle
<clever>
maybe only retry if the build failed within a certain timeframe
<clever>
but it would also increase the cpu usage in the cluster
<clever>
grahamc: having some automatic retry in hydra would help with these bugs
<clever>
but this failure is so early in the boot, that i expect it to hit every single test in nixos
<clever>
dont know
<clever>
Infinisil: i think adding some retry to hydra would help
<clever>
and nix will just blindly build every test
<clever>
Infinisil: then try to build that txt file
<clever>
Infinisil: i think you can run listToAttrs over (import ./nixos/release.nix {}).tests and then use map to create a txt file referring to every test's $out
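a hedged sketch of that idea, assuming a flat attrset of test derivations (the real release.nix keys them per system as well):

  with import <nixpkgs> {};
  let
    tests = (import ./nixos/release.nix {}).tests;
  in
  # interpolating every test's store path forces nix to build each one
  # before the txt file itself can be built
  writeText "all-tests.txt"
    (lib.concatMapStringsSep "\n" toString (lib.attrValues tests))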
<clever>
heisenbugs always are
<clever>
rng
<clever>
Infinisil: yeah, thats why i was adding echoes to this region
<clever>
the native and cross compilers will have different hashes
<clever>
deltasquared: but you're still basing everything around the hash of the compiler, not what it can produce
<clever>
yep
<clever>
deltasquared: you declare upfront what the hash of $out will be, and then $out's path wont depend on the inputs
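a minimal sketch of a fixed-output derivation; the hash shown is meant to be the sha256 of the literal "foo", but treat it as a placeholder:

  derivation {
    name = "pinned-output";
    system = builtins.currentSystem;
    builder = "/bin/sh";
    args = [ "-c" "echo -n foo > $out" ];
    # because the output hash is declared upfront, $out's path no longer
    # depends on the builder or its inputs
    outputHashMode = "flat";
    outputHashAlgo = "sha256";
    outputHash = "2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae";
  }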
<clever>
deltasquared: fixed-output derivations are the only way right now to prevent rebuilds
<clever>
so you have to apply that rewrite at unpack time
<clever>
Infinisil: part of the problem, is that rewriting the build to reference its own hash, changes its hash
<clever>
so if the native and cross-compiler produce the same output, things may share the result
<clever>
the hash of the temporary $out
<clever>
so you create a new $out, whose path is based on the temporary $out
<clever>
after building $out, you hash the entire thing, and then rewrite the references to itself (and its runtime deps) within every binary
<clever>
but i have seen another plan of a possible solution
<clever>
so every value that can potentially impact the build, will also impact its output path
<clever>
and every attribute on that set, becomes an env variable when building the derivation
<clever>
and its also used to compute the value of $out (which is in that .drv)
<clever>
that hash is then used to create the /nix/store/<hash>-<name>.drv path
<clever>
deltasquared: all attributes will then be forced down to a string, and the entire set is hashed
<clever>
deltasquared: at the core of nix, every derivation must be made by calling builtins.derivation, and passing it a set containing system, builder, args, and name
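a minimal sketch of such a call (the attribute values are only illustrative):

  builtins.derivation {
    name = "example";
    system = "x86_64-linux";
    builder = "/bin/sh";
    args = [ "-c" "echo foo > $out" ];
    # every attribute here is coerced to a string, hashed into the .drv and
    # $out paths, and exported as an environment variable during the build
    someInput = "changing this value changes the resulting store path";
  }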
<clever>
and which gcc the glibc was built from
<clever>
the version of glibc and bash will also impact that build
<clever>
so you can have an arm, 32bit x86, and 64bit x86 build of a simple "echo foo > $out/bar.txt"
<clever>
even something as simple as write this string to a file, depends on which platform you run it on
<clever>
the purity in nix doesnt allow it
<clever>
nope
<clever>
and the storepath of the compiler, depends on the options it was built with, the platform it runs on, and the storepath of every one of its build inputs
<clever>
its based on the storepath of the compiler
<clever>
you want to rebuild things when the compiler changes
<clever>
and a good reason for that, is that some versions of the compiler may be glitched
<clever>
the cause, is that which compiler you use impacts the hash
<clever>
and if you only ever cross-compiled, that means building the entire gcc bootstrap
<clever>
not the hello that was cross-compiled
<clever>
so when you do nix-env -iA nixos.hello, it wants the hello that was natively compiled
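a hedged illustration of the two builds ending up at different store paths (the pkgsCross attribute is just one example of a cross platform):

  let pkgs = import <nixpkgs> {}; in {
    # built with the native stdenv/gcc
    native = pkgs.hello;
    # built with a cross gcc, whose different store path changes hello's hash
    cross = pkgs.pkgsCross.aarch64-multiplatform.hello;
  }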
<clever>
half of the problem with cross-compiling in nixpkgs, is that it impacts the hash
<clever>
but i never got around to doing a proper nixos install on it, so its still just raspbian with nix on the side
<clever>
but the v6's were just too slow, so i retired them in favor of a faster rpi
<clever>
i had nixos running on 2 armv6 rpi's, before the aarch64 stuff was being compiled by hydra
<clever>
took a while to figure out why
<clever>
it claimed the en_us mapping didnt exist, yet it clearly did
<clever>
because of that, the nixos build on my rpi couldnt even generate the keymap files for the initrd
<clever>
but it can potentially break things
<clever>
there is a special glibc thing you can compile with, that will silently switch over to getdents64() on a 32bit os
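presumably this is the _FILE_OFFSET_BITS=64 switch; a hedged sketch of enabling it for a single package (the package name is a placeholder):

  # NIX_CFLAGS_COMPILE is picked up by the nixpkgs cc wrapper
  somePackage.overrideAttrs (old: {
    NIX_CFLAGS_COMPILE =
      (old.NIX_CFLAGS_COMPILE or "") + " -D_FILE_OFFSET_BITS=64";
  })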
<clever>
and 80% of software silently ignores it, treating the directory as empty
<clever>
getdents() will return EOVERFLOW, because the 64bit inode doesnt fit in the 32bit struct
<clever>
if your nfs server uses 64bit inodes, then a 32bit client will fail in the weirdest ways
<clever>
i also discovered a rather nasty nfs bug on 32bit clients
<clever>
and the current disk image routines use virtio-9p anyways, to copy things in
<clever>
slow, but better than making a 2gig disk image for every vm
<clever>
which allows sharing host directories directly to the guest
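a hedged sketch of wiring that up for a NixOS qemu vm (the host path and mount tag are made up):

  { ... }: {
    # expose a host directory to the guest as a 9p share
    virtualisation.qemu.options = [
      "-virtfs local,path=/home/user/shared,mount_tag=hostshare,security_model=none"
    ];
    # mount it inside the guest over the virtio transport
    fileSystems."/mnt/host" = {
      device = "hostshare";
      fsType = "9p";
      options = [ "trans=virtio" ];
    };
  }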
<clever>
9p (plan 9's protocol) over the virtio interface
<clever>
leading to files just not existing
<clever>
i have previously had a problematic interaction between the host zfs and qemu, where the guest /nix/store just randomly swapped entire directories
<clever>
the real question though: is the race in mke2fs, the guest linux, qemu, or the host linux?
<clever>
since adding the strace, the problem has not happened
<clever>
pbogdan_: the mke2fs is hanging on boot, triggering the 5 minute timeout
<clever>
pbogdan_: the problem i reproduced on my end, isnt just a simple timeout