#nixos-aarch64 on 2017-12-19

03:29 <gchristensen> https://stackoverflow.com/questions/20551600/squashfs-error-unable-to-read-page-size#21580311

08:42 orivej has quit [(Ping timeout: 240 seconds)]

11:15 <andi-> mhm, machine went down?

11:37 <gchristensen> ack, sorry andi-

11:44 <andi-> np, I am supposed to work anyway ^^

13:06 <gchristensen> clever: so I tried replacing the squash with an ext4 and it isn't booting properly

13:07 <gchristensen> clever: https://github.com/nix-community/aarch64-build-box/commit/beff3bb7776f9e3d84f95b0f25dac03b2d540b42 https://gist.github.com/grahamc/ad81db9441dcc1484de93fd99af88026

13:07 <gchristensen> andi-: should be up for you now

13:08 <andi-> k

13:16 <andi-> samueldr: did you try a vanilla 4.14.5 kernel on your RPi? I just discovered that I initially tested with 4.14.3 and then applied the patch on top of 4.14.5. Just trying to make sure I don't make a fuss about an already fixed issue.. In the process of sending the patch to gregkh

13:16 <Dezgeg> gchristensen: I think you managed to not pass the initrd to the kernel

13:17 <gchristensen> that is not impossible

13:17 <gchristensen> :D

13:17 <Dezgeg> boosting up the loglevel could also help debugging

13:20 <gchristensen> kernel Image init=/nix/store/xb1ms3zkwpkprl3rbvfwm7w2nm0ppd7k-nixos-system-nixos-18.03pre-git/init loglevel=4 cma=0M biosdevname=0 net.ifnames=0 console=ttyAMA0 initrd=initrd

13:20 <gchristensen> do you have a preferred log level?

13:24 * gchristensen sets it to 8

13:24 <gchristensen> our docs appear to be backwards: The kernel console log level. Log messages with a priority numerically less than this will not appear on the console.

13:24 <gchristensen> also, making ext4 images is substantially slower than squashfs

13:25 <Dezgeg> what's the end goal btw? why not just nixos-install?

13:26 <andi-> gchristensen: yes the docs are backwards.. stumbled upon that when I was toying with the kernel on the PI... wasn't sure if that is my brain or the document being wrong...

13:26 <gchristensen> :)

13:29 <gchristensen> Dezgeg: one big problem with my netbooting the automatic installer I made for the Packet aarch64 machines is if the squashfs is read with any correctable memory errors it fails

13:29 <gchristensen> https://stackoverflow.com/questions/20551600/squashfs-error-unable-to-read-page-size

13:30 <gchristensen> which happens to be the same error I'm seeing here

13:31 <Dezgeg> how do you know it's a 'correctable memory error' ?

13:32 <Dezgeg> that page is talking about raw NAND memory and stuff which certainly doesn't apply here

13:32 <gchristensen> I see. Maybe it doesn't apply, but it is very similar to what I'm seeing. I've now had Packet chase down these errors on multiple hosts and even when they look closely, they find nothing wrong with their hardware

13:33 <Dezgeg> does the md5sum of the squashfs image match?

13:33 <gchristensen> I believe it did, I can add a check for that

13:35 <gchristensen> are you thinking maybe a bug in PXE's fetching?

13:35 <Dezgeg> yeah, I trust firmware much less 8)
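The checksum check mentioned above could look roughly like the following. This is a minimal sketch, not what the build box actually runs: the function names and the idea of comparing against a known-good MD5 string are assumptions for illustration.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks so a large
    squashfs image does not have to fit in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches(path: str, expected_md5: str) -> bool:
    """Compare the fetched image against a known-good checksum
    (hypothetical: the expected value would come from the build host)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A mismatch here would point at the transfer (for example the PXE fetch) rather than at the filesystem driver.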

13:36 <gchristensen> is there a way to use nix's remote builders but not xfer the result back to my host?

13:37 <Dezgeg> ctrl-c at the right moment?

13:37 <gchristensen> hehe

13:37 <Dezgeg> #shipit

13:38 <gchristensen> Enterprise Level Quality

14:00 <gchristensen> setting the log level to 8 was maybe overkill :)

14:00 <gchristensen> [ 30.488621] VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6

14:00 <gchristensen> [ 30.496108] Please append a correct "root=" boot option; here are the available partitions:

14:01 <gchristensen> then several ram[0-15] and sda, which isn't what I want

14:01 <Dezgeg> it just means your initrd didn't get loaded

14:01 <gchristensen> Dezgeg: I reckon I should go back to the squashfs with a high consoleLog and also check the md5sum

14:02 <Dezgeg> what are you specifically trying to do? place an ext4 image on the initrd?

14:03 <gchristensen> two-fold: (1) get rid of the unionfs so that things that don't compile on fancy overlay FSs will compile

14:03 <gchristensen> I'm planning on doing this by mounting a disk at /mnt-root/ and rsyncing the contents of the FS (squash, ext4, whatever) over to it

14:04 <gchristensen> (2) make the boot and install process reliable, which thus far with squashfs (as described) it isn't

14:04 <Dezgeg> for 1) the in-kernel overlayfs might be more reliable (and much more performant) than the FUSE thing

14:05 <gchristensen> one specific problem we're having with unionfs is it doesn't support hardlinks. IIRC from when I was a big docker fan, overlayfs isn't POSIX compliant and breaks some software strangely, like pip maybe

14:06 <gchristensen> that is why my goal is/was to get away from any sort of fancy FS underneath it all

14:08 <Dezgeg> well yes, but I don't think many of those strange cases come up with this sort of pure-ro + rw split

14:12 <gchristensen> are the problems when using whiteouts?

14:12 <andi-> openjdk (╯°□°）╯︵┻━┻ ... gcj ships a java 1.5 compatible version .. 1.8 requires 1.6 or 1.7... guess I'll have to add another bootstrap stage to avoid binary blobs m(

14:17 <sphalerite> !m anti-

14:17 <sphalerite> andi-*

14:17 <sphalerite> and I guess that doesn't work here since we don't have botbot

14:17 <sphalerite> but yeah, it's good work you're doing!

14:18 <andi-> the insanity of that java foo.. there is a bootstrap jdk (${gcj.cc}) which I've to wrap yet another time since we do not provide javac (which usually is a symlink to gcj)...

14:19 <andi-> sphalerite: thanks, i somehow like ugly stuff -.-

14:22 <gchristensen> Dezgeg: how does this look:

14:22 <gchristensen> fileSystems."/nix/store" =

14:22 <gchristensen> { fsType = "overlayfs";

14:22 <gchristensen> device = "overlayfs";

14:22 <gchristensen> options = [

14:22 <gchristensen> "chroot=/mnt-root"

14:22 <gchristensen> "lowerdir=rw:/nix/.ro-store"

14:22 <gchristensen> "upperdir=/nix/.rw-store"

14:22 <gchristensen> ];

14:22 <gchristensen> };

14:23 <gchristensen> oops minus the `rw:` bit

14:25 <Dezgeg> I haven't actually used overlayfs, just heard things :P

14:27 <gchristensen> :D

14:44 <gchristensen> andi-: I'm going to take down the builder a moment

14:49 <andi-> gchristensen: go nuts.. I have to rethink that whole java mess..

14:49 <gchristensen> ou

14:49 <gchristensen> ok

15:14 orivej has joined #nixos-aarch64

15:30 <samueldr> andi-: when I tried last time, it was stock, but I couldn't manage any logging, at loglevel=7, making me think there might be more than one issue...

15:31 <samueldr> ... it might be related to the exact u-boot used to boot?

15:31 <samueldr> (or that 7 isn't enough)

15:31 <samueldr> https://gist.github.com/samueldr/14a58850150b9db504e758cd9a2fb76c

15:34 <andi-> samueldr: boot.consoleLogLevel = 8; did it for me

15:35 <samueldr> it must be because of what was just previously said

15:36 <samueldr> does loglevel 7 show only priorities [0-6]?

15:37 <andi-> yes

15:37 <samueldr> (I'll have to re-test then)

15:37 <andi-> as far as I understood it putting it to 8 means everything (including!) 7

15:38 <andi-> +up to

15:38 <samueldr> I took for granted that it was a range inclusive and not exclusive, [0-7] and not [0-7[
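The half-open range described above can be written down as a small predicate. This is only an illustrative sketch of the rule (a message is printed when its priority is numerically lower than the console loglevel); the function names are made up for the example and are not a kernel or NixOS API.

```python
# Kernel message priorities run from 0 (KERN_EMERG) to 7 (KERN_DEBUG).
KERN_EMERG = 0
KERN_DEBUG = 7


def shown_on_console(priority: int, console_loglevel: int) -> bool:
    """A message reaches the console only if its priority is
    numerically lower than the console loglevel."""
    return priority < console_loglevel


def visible_priorities(console_loglevel: int) -> list[int]:
    """List the priorities that would be printed at a given loglevel."""
    return [
        p
        for p in range(KERN_EMERG, KERN_DEBUG + 1)
        if shown_on_console(p, console_loglevel)
    ]


print(visible_priorities(7))  # [0, 1, 2, 3, 4, 5, 6]
print(visible_priorities(8))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

So loglevel=7 hides debug messages (priority 7), and 8 is the first value that shows everything.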

15:38 <samueldr> meanwhile this weekend I tried (but haven't gotten far) using the ext4 partition for everything but the raspberry pi bootloader stuff

15:38 <andi-> I remember reading some lkml post about that saying 8 is to be safe.. but maybe that's because many people come up with many different interpretations?

15:38 <samueldr> (moving /boot to ext4, and moving raspi stuff elsewhere)

15:39 <andi-> i'll send that mail to linux-stable/gregkh now.. i grepped through the 4.14.3 to 4.14.5 log and couldn't find anything that would potentially fix this

15:40 <Dezgeg> can't you just 'git cherry-pick' and see if that applies?

15:41 <andi-> that's what I did

15:41 <andi-> and I verified it working

15:41 <Dezgeg> all good then

15:41 <andi-> yes, just (today) realized that I tested 4.14.3 (without patch) and 4.14.5 (with patch) so I checked with samueldr first since he was having the same issue

18:19 orivej has quit [(Read error: Connection reset by peer)]

18:23 orivej has joined #nixos-aarch64

18:44 orivej_ has joined #nixos-aarch64

18:44 orivej has quit [(Ping timeout: 255 seconds)]

22:32 grw has quit [(Quit: WeeChat 1.7.1)]

22:32 <gchristensen> I wish I had my own private type 2a for this work :$

22:35 <andi-> hrhr

22:35 <andi-> you never have enough CPUs

22:36 <Dezgeg> I guess you can iterate in qemu?

22:40 <gchristensen> well so the problem is if I use the hydra aarch64 box it gets GC'd away so fast, and if I use the netboot one it loses all its build progress every test :)

22:50 <gchristensen> Dezgeg: https://gist.github.com/grahamc/cf809d042c1f66b60aeed5474d28e71d your hint about nixos-rebuild's buildHost was very good

22:51 <Dezgeg> I don't recall hinting that

22:52 <gchristensen> ah, vcunat!

22:52 <Dezgeg> but yes it's nifty to copy-closure .drvs, I do it often

22:52 <gchristensen> nixos-rebuild can build on a remote system, and run the switch on a third!

22:53 <gchristensen> nixops minus nixops!

23:55 orivej_ has quit [(Ping timeout: 265 seconds)]

23:57 orivej has joined #nixos-aarch64