<patagonicus>
So, I set up my NAS system config in a way that I can also generate a script that will build an SD card image for me, to get the system up and running. What I didn't think about is that you can't build the image on the system it is for, since it's using the same name for the LVM volume group. That's a bit of a bummer.
zupo has joined #nixos-aarch64
zupo has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
cole-h has quit [Quit: Goodbye]
zupo has joined #nixos-aarch64
betrion[m] has quit [Quit: Idle for 30+ days]
ib07_ has joined #nixos-aarch64
ib07 has quit [Ping timeout: 272 seconds]
ib07_ has quit [Max SendQ exceeded]
ib07 has joined #nixos-aarch64
ib07 has quit [Max SendQ exceeded]
ib07 has joined #nixos-aarch64
FRidh has quit [Ping timeout: 240 seconds]
FRidh has joined #nixos-aarch64
FRidh has quit [Ping timeout: 272 seconds]
FRidh has joined #nixos-aarch64
<DigitalKiwi>
oooh do we have a working ghc now?!?
<angerman>
DigitalKiwi: for what? Arm?
<DigitalKiwi>
aarch64
<DigitalKiwi>
it's been not on hydra for quite a while
<angerman>
If you are ok with single threaded only, the llvm builds are ok’ish.
<angerman>
From 9.2 on we should have a native code gen in ghc.
<DigitalKiwi>
but i mean in nixpkgs it didn't exist it was too big to build on hydra and good luck building it on rpi
<angerman>
I think domen or someone managed to get it under size.
<roberth>
hmm, I thought the bot had something to say about "hi"
<gchristensen>
I have a lazy idea for solving the c1.large.arm sda problem: changing the zpool initialization code to create a ramdisk of 50% the ram for the zpool, if no disks are detected
<roberth>
gchristensen: don't aarch64 machines typically come with comparatively little RAM?
<gchristensen>
typically
<gchristensen>
I think this one has 128g
<roberth>
I guess it's worth a shot
<roberth>
maybe limit the number of concurrent jobs a bit
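A minimal sketch of the fallback gchristensen describes, assuming the initialization code can be handed a list of detected disks and the total RAM. The function name, its parameters, and the return shape are hypothetical and not taken from the actual zpool setup scripts; only the "50% of RAM if no disks are detected" rule comes from the discussion.

```python
def plan_zpool_backing(detected_disks, total_ram_bytes, ram_fraction=0.5):
    """Decide what should back the zpool.

    Hypothetical helper: returns ("disks", [...]) when disks were detected,
    otherwise ("ramdisk", size_in_bytes) sized as a fraction of total RAM.
    """
    if detected_disks:
        # Normal case: build the pool on the real disks.
        return ("disks", list(detected_disks))
    # Fallback case: no disks found, so size a ramdisk from total RAM.
    return ("ramdisk", int(total_ram_bytes * ram_fraction))


if __name__ == "__main__":
    # A machine with 128 GiB of RAM and no detected disks.
    print(plan_zpool_backing([], 128 * 2**30))
```

Keeping the decision in a pure function makes the fallback easy to check separately from the code that would actually create the ramdisk and the pool.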
mahogany has joined #nixos-aarch64
<roberth>
gchristensen++
<{^_^}>
gchristensen's karma got increased to 379
LnL has quit [Ping timeout: 272 seconds]
LnL has joined #nixos-aarch64
LnL has joined #nixos-aarch64
wavirc22_ has quit [Ping timeout: 260 seconds]
LnL has quit [Ping timeout: 265 seconds]
veleiro has joined #nixos-aarch64
veleiro` has joined #nixos-aarch64
LnL has joined #nixos-aarch64
LnL has joined #nixos-aarch64
veleiro has quit [Ping timeout: 246 seconds]
veleiro`` has joined #nixos-aarch64
zupo has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
veleiro` has quit [Ping timeout: 272 seconds]
zupo has joined #nixos-aarch64
<angerman>
roberth: couldn’t we just disable building profiling in the first place?
<sphalerite>
[Note: You may need to press RETURN or Ctrl+L to get a prompt.]
<gchristensen>
press enter?
<sphalerite>
nothing
<gchristensen>
okay cool
<gchristensen>
so that will boot nixos
<gchristensen>
sphalerite: ssh 89aa8aae-9f99-440f-8c03-d16b103ef4fc@sos.ewr1.platformequinix.com this one will be alpine -- try connecting?
<sphalerite>
I'm in
<gchristensen>
cool
<gchristensen>
rebooting both
<gchristensen>
alpine should be rebooting
<gchristensen>
nixos should be rebooting
<sphalerite>
a bit slower than you said, but yes :D
bpye has joined #nixos-aarch64
<sphalerite>
good to know that ARM servers haven't improved the situation in terms of server hardware booting really slowly x)
<gchristensen>
haha yeah
<gchristensen>
bare metal, man
zupo has joined #nixos-aarch64
<gchristensen>
btw I can't see what you're doing -- those connections are exclusive
<sphalerite>
ok
red[evilred] has joined #nixos-aarch64
<red[evilred]>
Seeing this makes me happy
<sphalerite>
ok, so from the alpine one it looks like all the drivers involved are pretty generic, which is a shame because it makes my first suspicion less likely
<red[evilred]>
good luck sphalerite (IRC) !
<sphalerite>
(also wow the nixos initrd takes a long time to download)
<red[evilred]>
I'd say poke me if you need another set of eyes, but there's enough eyes in this channel I can't imagine that I'll be holding some unique arcane knowledge on this particular subject
<sphalerite>
the "an error occurred in stage 1" error message is expected and caused by debug1devices
<sphalerite>
it not finishing the message is not expected
<sphalerite>
"It's not a bug, it's just less expected behaviour"
<sphalerite>
gchristensen: could you reset the nixos machine again?
<gchristensen>
yep, issued
bpye has quit [Ping timeout: 240 seconds]
<sphalerite>
gchristensen: the alpine machine didn't dump all these "ERROR: N1 OCX_LNEX_INT(20)[stat_msg]"-like errors to the console… and this is long before iPXE even
<gchristensen>
possibly different firmware versions
<gchristensen>
they're both busted
<sphalerite>
ok
<gchristensen>
I can swap which is which if you'd like :)
<sphalerite>
busted in the sense of won't boot nixos, right?
<gchristensen>
ea
<gchristensen>
yea
<sphalerite>
well, right now the nixos machine doesn't even seem to be getting as far as ipxe…
<gchristensen>
heh.
<gchristensen>
want to try another one? :)
rajivr has quit [Quit: Connection closed for inactivity]
<sphalerite>
gchristensen: same thing… wtf!? Could you change the cmdline to debug1 instead of debug1devices and reset again? :/
<gchristensen>
sure
<gchristensen>
boot.debug1 ?
<sphalerite>
yep
<red[evilred]>
General question - (not to slow you down - but in the moments of waiting for things to boot etc...) - why is the bios / boot thing seemingly more difficult with aarch64 than other archs? Lack of standardizations? or ...
<gchristensen>
rebooted
<sphalerite>
red[evilred]: lack of _following_ standards :p
<sphalerite>
gchristensen: "modprobe: FATAL: Module pci_thunder_ecam not found in directory /nix/store/ll41hdmy9h592szkz4vbsa6lppcp3qpg-linux-5.4.78-modules/lib/modules/5.4.78"
<sphalerite>
could this be related at all? :)
<sphalerite>
I mean, I highly doubt that it's related to me not being able to get a shell so far
<gchristensen>
it might be?
<samueldr>
if the storage controller is PCI(e) maybe
<samueldr>
add that to the modules list I guess?
<sphalerite>
samueldr: this is in the initramfs build
<sphalerite>
I'm checking the kernel build against the kernel sources now
<gchristensen>
how about I make it boot off your system directly? caveat: it needs to be http (you don't want to suffer through the https tricks needed to make it work.)
<sphalerite>
that sounds good, let me set up a non-HTTPS vhost
<andi->
Do we know what the last known good version was and when that was?
<sphalerite>
andi-: yes, just found out that the last known good version is 19.03
<sphalerite>
trying 19.09 next
<sphalerite>
then I'll start bisecting.
<hexa->
whew
<hexa->
oops
<hexa->
that's … old.
<sphalerite>
I'm definitely hoping 19.09 works :p
<gchristensen>
whew and oops and old are all correct
<hexa->
so say we all
<sphalerite>
gchristensen: do you happen to have a script that can bisect based on that history? :D
<gchristensen>
just head, tail, and bc :/
<sphalerite>
hahaha ok
<gchristensen>
I wish
<sphalerite>
that sounds like something not-excessively-difficult though tbh
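A script like the one asked about could be sketched roughly as below, assuming the history is an ordered list of revisions (oldest first) and that some way exists to test whether a given revision boots. The names `first_bad` and `is_good` are hypothetical; this is a plain binary search, not an existing tool.

```python
def first_bad(history, is_good):
    """Return the first bad entry in an ordered history.

    Assumes history[0] is known good and history[-1] is known bad,
    and that every entry after the first bad one is also bad.
    """
    lo, hi = 0, len(history) - 1  # lo is always good, hi is always bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(history[mid]):
            lo = mid
        else:
            hi = mid
    return history[hi]


if __name__ == "__main__":
    # Made-up history of 100 numbered revisions where revision 37 broke things.
    revisions = list(range(100))
    print(first_bad(revisions, lambda rev: rev < 37))
```

In practice `is_good` would be the slow part (boot the machine on that revision and report the result), so the search only needs about log2(n) manual boot attempts.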
<sphalerite>
19.09 does work!
<hexa->
lucky
<andi->
now try 20.03, maybe it is state in the machine ;)
<andi->
("Always boot 19.09 first and then do a kexec into 20.09!")
<gchristensen>
don't curse this andi-
tilpner_ has joined #nixos-aarch64
tilpner has quit [Ping timeout: 260 seconds]
tilpner_ is now known as tilpner
<andi->
sphalerite: did you check the kernel versions that alpine is using and what we are using?
<simpson>
Is kexec a thing on ARM? 2020's wild.
<sphalerite>
andi-: yeah they're using a 5.4 too
<sphalerite>
though actually it's older than the september 20.09 one
<sphalerite>
simpson: it is. That doesn't mean it works.
<sphalerite>
lol the kernel from 19.09 can't reboot
<simpson>
sphalerite: Indeed. Today let's stick to jokes where kexec is the setup, and not the punchline~
<gchristensen>
lol
<andi->
rebooting.. who needs that? I need 5 9s!
<gchristensen>
reboot? ok
<sphalerite>
hehehe
<gchristensen>
I'm so responsive and helpful, even while chopping apples ...
<samueldr>
kexec is a thing on ARM
<samueldr>
what about kernel 4.9 on 20.09?
<samueldr>
because IIRC 20.03 (or was it 19.09?) upgraded to the then current LTS
<samueldr>
that was 5.4
<sphalerite>
4.19 I guess you mean?
<sphalerite>
good point
<sphalerite>
19.09 was using 4.19
<gchristensen>
reboot? ok
<samueldr>
yeah, that 1 always eludes me
<gchristensen>
reboot? ok
<samueldr>
ok!
<sphalerite>
gregg rulz ok
<samueldr>
gregg?
<sphalerite>
nvm just an obscure game reference
<sphalerite>
also samueldr your suspicion was excellent, 5.4 from 19.09 does not work.
<samueldr>
so 4.19..5.4 changes for thunderx
<samueldr>
but we can make the infra go brrr by specifying 4.19 for the time being
<samueldr>
(I guess?)
<sphalerite>
probably. Giving it a show now
<sphalerite>
s/show/shot/
<gchristensen>
bisecting 4.19 to 5.4 can't be that bad. how much could it be, 10 patches?
<gchristensen>
reboot? ok
<samueldr>
well, first you manually figure out which major revision introduced the defect at its latest patch level
<samueldr>
then see if its .0 release has the defect
<samueldr>
(in case of a backport from then-master)
<samueldr>
if it does, you can bisect through versions and probably find something quite quickly
<samueldr>
otherwise you know you're in that x.y.0...x.y+1.0 range
<samueldr>
when you know it affects one specific thing, you can sometimes be lucky enough with the commit messages and find a suspicious change
<red[evilred]>
This all sounds very promising - sphalerite (IRC) ++
<samueldr>
I'd say that on more than half of my kernel bisects I was able to guess at the commit introducing the regression, and validated it through a full bisect
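One possible reading of the narrowing steps samueldr outlines, written as a sketch. The function name, the two test callbacks, and the "mainline"/"backport" labels are hypothetical, and the release data in the example is made up for illustration, not a result from this debugging session.

```python
def narrow_regression(releases, latest_is_bad, dot_zero_is_bad):
    """Narrow down where a kernel regression was introduced.

    releases: release series ordered oldest to newest, e.g. ["4.19", ..., "5.4"].
    latest_is_bad(rel): does the latest patch level of rel show the defect?
    dot_zero_is_bad(rel): does the .0 release of rel show the defect?
    """
    # Step 1: find the first series whose latest patch level has the defect.
    for i, rel in enumerate(releases):
        if latest_is_bad(rel):
            break
    else:
        return None  # defect not seen in any series

    # Step 2: check the .0 release of that series.
    if dot_zero_is_bad(rel):
        # Defect already present at .0: look between the previous series and this .0.
        prev = releases[i - 1] if i > 0 else None
        return ("mainline", prev, rel)
    # Defect appeared later in the series (possibly a backport from then-master):
    # look in the x.y.0 ... x.y+1.0 range.
    nxt = releases[i + 1] if i + 1 < len(releases) else None
    return ("backport", rel, nxt)


if __name__ == "__main__":
    series = ["4.19", "4.20", "5.0", "5.1", "5.2", "5.3", "5.4"]
    # Hypothetical test outcomes, purely for illustration.
    bad_latest = {"5.2", "5.3", "5.4"}
    bad_zero = {"5.3", "5.4"}
    print(narrow_regression(series, bad_latest.__contains__, bad_zero.__contains__))
```

The returned pair of series only bounds the range; a regular git bisect (or reading commit messages for a suspicious change) would still be needed inside it.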