orivej has quit [(Ping timeout: 240 seconds)]
<andi-> mhm, machine went down?
<gchristensen> ack, sorry andi-
<andi-> np, I am supposed to work anyway ^^
<gchristensen> clever: so I tried replacing the squash with an ext4 and it isn't booting properly
<gchristensen> andi-: should be up for you now
<andi-> k
<andi-> samueldr: did you try a vanilla 4.14.5 kernel on your RPi? I just discovered that I initially tested with 4.14.3 and then applied the patch on top of 4.14.5. Just trying to make sure I don't make a fuss about an already fixed issue.. In the process of sending the patch to gregkh
<Dezgeg> gchristensen: I think you managed to not pass the initrd to the kernel
<gchristensen> that is not impossible
<gchristensen> :D
<Dezgeg> boosting up the loglevel could also help debugging
<gchristensen> kernel Image init=/nix/store/xb1ms3zkwpkprl3rbvfwm7w2nm0ppd7k-nixos-system-nixos-18.03pre-git/init loglevel=4 cma=0M biosdevname=0 net.ifnames=0 console=ttyAMA0 initrd=initrd
<gchristensen> do you have a preferred log level?
* gchristensen sets it to 8
<gchristensen> our docs appear to be backwards: The kernel console log level. Log messages with a priority numerically less than this will not appear on the console.
<gchristensen> also, making ext4 images is substantially slower than squash
<Dezgeg> what's the end goal btw? why not just nixos-install?
<andi-> gchristensen: yes the docs are backwards.. stumbled upon that when I was toying with the kernel on the PI... wasn't sure if that is my brain or the document being wrong...
<gchristensen> :)
<gchristensen> Dezgeg: one big problem with my netbooting the automatic installer I made for the Packet aarch64 machines is if the squashfs is read with any correctable memory errors it fails
<gchristensen> which happens to be the same error I'm seeing here
<Dezgeg> how do you know it's a 'correctable memory error' ?
<Dezgeg> that page is talking about raw NAND memory and stuff which certainly doesn't apply here
<gchristensen> I see. Maybe it doesn't apply, but it is very similar to what I'm seeing. I've now had Packet chase down these errors on multiple hosts and even when they look closely, they find nothing wrong with their hardware
<Dezgeg> does the md5sum of the squashfs image match?
<gchristensen> I believe it did, I can add a check for that
<gchristensen> are you thinking maybe a bug in PXE's fetching?
<Dezgeg> yeah, I trust firmware much less 8)
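A minimal sketch of the md5sum check discussed above, in case it helps compare the squashfs image as served with the copy the netbooted machine actually received. The function names and the idea of passing in an expected hex digest are assumptions for illustration; nothing here is taken from the actual installer.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so a large image does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches(path, expected_hex):
    """Compare a file's MD5 with an expected digest (hypothetical helper).
    The expected value is normalized for case and trailing whitespace."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

A mismatch would point at the transfer (PXE/firmware fetch) rather than at memory errors while reading the image; a match would not rule either out.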
<gchristensen> is there a way to use nix's remote builders but not xfer the result back to my host?
<Dezgeg> ctrl-c at the right moment?
<gchristensen> hehe
<Dezgeg> #shipit
<gchristensen> Enterprise Level Quality
<gchristensen> setting the log level to 8 was maybe overkill :)
<gchristensen> [ 30.488621] VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
<gchristensen> [ 30.496108] Please append a correct "root=" boot option; here are the available partitions:
<gchristensen> then several ram[0-15] and sda, which isn't what I want
<Dezgeg> it just means your initrd didn't get loaded
<gchristensen> Dezgeg: I reckon I should go back to the squashfs with a high consoleLog and also check the md5sum
<Dezgeg> what are you specifically trying to do? place an ext4 image on the initrd?
<gchristensen> two-fold: (1) get rid of the unionfs so that things that don't compile on fancy overlay FSs will compile
<gchristensen> I'm planning on doing this by mounting a disk at /mnt-root/ and rsyncing the contents of the FS (squash, ext4, whatever) over to it
<gchristensen> (2) make the boot and install process reliable, which thus far with squashfs (as described) it isn't
<Dezgeg> for 1) the in-kernel overlayfs might be more reliable (and much more performant) than the FUSE thing
<gchristensen> one specific problem we're having with unionfs is it doesn't support hardlinks. IIRC from when I was a big docker fan, overlayfs isn't POSIX compliant and breaks some software strangely, like pip maybe
<gchristensen> that is why my goal is/was to get away from any sort of fancy FS underneath it all
<Dezgeg> well yes, but I don't think many of those strange cases come up with this sort of pure-ro + rw split
<gchristensen> are the problems when using whiteouts?
<andi-> openjdk (╯°□°)╯︵┻━┻ ... gcj ships a java 1.5 compatible version .. 1.8 requires 1.6 or 1.7... guess I'll have to add another bootstrap stage to avoid binary blobs m(
<sphalerite> !m anti-
<sphalerite> andi-*
<sphalerite> and I guess that doesn't work here since we don't have botbot
<sphalerite> but yeah, it's good work you're doing!
<andi-> the insanity of that java foo.. there is a bootstrap jdk (${gcj.cc}) which I've to wrap yet another time since we do not provide javac (which usually is a symlink to gcj)...
<andi-> sphalerite: thanks, i somehow like ugly stuff -.-
<gchristensen> Dezgeg: how does this look:
<gchristensen> fileSystems."/nix/store" =
<gchristensen> { fsType = "overlayfs";
<gchristensen> device = "overlayfs";
<gchristensen> options = [
<gchristensen> "chroot=/mnt-root"
<gchristensen> "lowerdir=rw:/nix/.ro-store"
<gchristensen> "upperdir=/nix/.rw-store"
<gchristensen> ];
<gchristensen> };
<gchristensen> oops minus the `rw:` bit
<Dezgeg> I haven't actually used overlayfs, just heard things :P
<gchristensen> :D
<gchristensen> andi-: I'm going to take down the builder a moment
<andi-> gchristensen: go nuts.. I've to rethink that whole java mess..
<gchristensen> ou
<gchristensen> ok
orivej has joined #nixos-aarch64
<samueldr> andi-: when I tried last time, it was stock, but I couldn't manage any logging, at loglevel=7, making me think there might be more than one issue...
<samueldr> ... it might be related to the exact u-boot used to boot?
<samueldr> (or that 7 isn't enough)
<andi-> samueldr: boot.consoleLogLevel = 8; did it for me
<samueldr> it must be because of what was just previously said
<samueldr> does loglevel 7 show only priorities [0-6]?
<andi-> yes
<samueldr> (I'll have to re-test then)
<andi-> as far as I understood it putting it to 8 means everything (including!) 7
<andi-> +up to
<samueldr> I took for granted that it was a range inclusive and not exclusive, [0-7] and not [0-7[
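To make the exclusive-range point concrete, here is a small sketch of the rule as described above: a message reaches the console only if its priority is numerically less than the console log level. The level names are the standard kernel ones; the function name is made up for illustration.

```python
# Kernel message priorities; a lower number means a more severe message.
KERN_LEVELS = {
    0: "emerg",
    1: "alert",
    2: "crit",
    3: "err",
    4: "warning",
    5: "notice",
    6: "info",
    7: "debug",
}


def shown_on_console(console_loglevel):
    """List the priorities printed for a given loglevel= value.
    The comparison is strict: priority < console_loglevel."""
    return [
        name
        for prio, name in sorted(KERN_LEVELS.items())
        if prio < console_loglevel
    ]
```

So `shown_on_console(7)` stops at "info", and only `shown_on_console(8)` includes "debug", which matches boot.consoleLogLevel = 8 being the setting that made the difference.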
<samueldr> meanwhile this week-end I tried (but haven't gotten far) using the ext4 partition for everything but the raspberry pi bootloader stuff
<andi-> I remember reading some lkml post about that saying 8 is to be safe.. but maybe that's because many people come up with many different interpretations?
<samueldr> (moving /boot to ext4, and moving raspi stuff elsewhere)
<andi-> i'll send that mail to linux-stable/gregkh now.. i grepped through the 4.14.3 to 4.14.5 log and couldn't find anything that would potentially fix this
<Dezgeg> can't you just 'git cherry-pick' and see if that applies?
<andi-> that's what I did
<andi-> and I verified it working
<Dezgeg> all good then
<andi-> yes, just (today) realized that I tested 4.14.3 (without patch) and 4.14.5 (with patch) so I checked with samueldr first since he was having the same issue
orivej has quit [(Read error: Connection reset by peer)]
orivej has joined #nixos-aarch64
orivej_ has joined #nixos-aarch64
orivej has quit [(Ping timeout: 255 seconds)]
grw has quit [(Quit: WeeChat 1.7.1)]
<gchristensen> I wish I had my own private type 2a for this work :$
<andi-> hrhr
<andi-> you never have enough CPUs
<Dezgeg> I guess you can iterate in qemu?
<gchristensen> well so the problem is if I use the hydra aarch64 box it gets GC'd away so fast, and if I use the netboot one it loses all its build progress every test :)
<gchristensen> Dezgeg: https://gist.github.com/grahamc/cf809d042c1f66b60aeed5474d28e71d your hint about nixos-rebuild's buildHost was very good
<Dezgeg> I don't recall hinting that
<gchristensen> ah, vcunat!
<Dezgeg> but yes it's nifty to copy-closure .drvs, I do it often
<gchristensen> nixos-rebuild can build on a remote system, and run the switch on a third!
<gchristensen> nixops minus nixops!
orivej_ has quit [(Ping timeout: 265 seconds)]
orivej has joined #nixos-aarch64