pxc has quit [Quit: WeeChat 2.3]
<samueldr> :o
<samueldr> the broken fat32 seems to happen on 18.09 and 18.03 also
<samueldr> 18.09 not too surprising
<samueldr> #51150 for notes
<{^_^}> https://github.com/NixOS/nixpkgs/issues/51150 (by samueldr, 2 hours ago, open): [aarch64] The sd-image-aarch64 FAT32 partition is broken
<samueldr> annoying how it looks like mtools doesn't have code versioning available
<samueldr> funny, seeing traces of nix people on their mailing list
<samueldr> kinda worrying though that a patch for the (first) issue here was overlooked in 2014, and it took another user fixing the same thing this year to get it done :/
<samueldr> ooh
<samueldr> uh oh!
<samueldr> overlay 63G 63G 12K 100% /nix/store
<samueldr> gchristensen: ^ community box
<samueldr> I know rebooting would fix the issue, but I'm thinking it's not the best way to fix it
<gchristensen> go for it
<gchristensen> we should add some auto-gc instructions
<gchristensen> but for this particular time, just reboot
<samueldr> done
<samueldr> though, it happened quickly
<samueldr> it wasn't up for long?
<gchristensen> hrm yeah it wasn't
<samueldr> (too late to investigate)
<clever> ive found that min-free based GC breaks evals a lot
<gchristensen> evals?? :o
<clever> the evaluator doesnt GC root things as it evals
<clever> and if a gc triggers in the middle of an eval
<samueldr> oh
<samueldr> I *think* I have found a fix for all the fat32 issues
<clever> if you have an A that depends on B, and between nix calling builtins.derivation on A&B, you do a GC, then B isnt rooted yet
<samueldr> those server-type machines sure take a long time booting
<gchristensen> don't they
<samueldr> my workstation is fun like that, it spends about three times as long in POST as linux takes to boot me to desktop
<gchristensen> clever: maybe open a bug about that? also, seems most auto-gc has been used on build-only nodes
<{^_^}> nix#2285 (by cleverca22, 20 weeks ago, open): min-free based auto garbage collection breaks its own build
<gchristensen> nice
<gchristensen> though I thought nix checked ram ...
<clever> nix doesnt check ram
<clever> nix checks mmaps (is the file mapped?) and open file handles, and env vars
<clever> and the derivations in an eval are none of those
<clever> once the eval is done, it creates a root, for the targeted drv (-A hello)
<clever> but until the eval is done, its not safe
<gchristensen> ahh
<clever> 90% of the time, an eval finishes too fast to be an issue
<clever> but IFD causes it to stall
<gchristensen> naturally
<gchristensen> and build :)
<gchristensen> increasing chance of gc
<clever> and also just downloads, due to IFD
<gchristensen> maybe if it stops to GC, take a moment to lock all the drvs it has?
<clever> before the IFD build
<clever> it doesnt really know what drv's it has, it would have to walk the nixexpr ast
<clever> which includes all of nixpkgs
<clever> for example, buildInputs = [ ifd ghc ];
<clever> ghc is a thunk, but also a dep
<gchristensen> hrm
<gchristensen> seems hard :)
<clever> at least in this case, ghc is a thunk, so it doesnt expect the .drv to exist yet
<clever> and after the gc runs, it forces ghc, and (re)creates the .drv
<clever> gchristensen: only solution i can think, is to root EVERY drv you create as you eval (expensive, and slower eval), and then de-root them when you have the real target
<gchristensen> I'm surprised Nix can't yet know which drvs it needs? why would it expect them to exist if it didn't need it
<clever> when you force a thunk over pkgs.ghc, it will create the .drv
<clever> and then when you force a thunk depending on ghc, it will read the ghc.drv string from the heap
<clever> and then fail when ghc is missing on-disk
<samueldr> (thunk is the sound my mind does when thinking about the things nix does internally)
<gchristensen> haha
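The race clever describes above can be sketched as a toy model. This is not Nix code: `ToyStore`, `evaluate`, and the flags `gc_mid_eval` and `root_as_you_go` are hypothetical names invented for illustration, and the store is reduced to a dictionary of paths and their references. It only shows why a GC between creating B.drv and creating A.drv breaks the eval, and how rooting every drv during the eval (the expensive option clever mentions) would avoid it.

```python
class ToyStore:
    """Toy stand-in for a store: paths, their references, and GC roots."""

    def __init__(self):
        self.refs = {}      # path -> set of paths it references
        self.roots = set()  # paths protected from garbage collection

    def add(self, path, refs=()):
        # Writing a path fails if something it references is gone from disk.
        missing = [r for r in refs if r not in self.refs]
        if missing:
            raise FileNotFoundError(f"{path} references missing {missing}")
        self.refs[path] = set(refs)

    def collect_garbage(self):
        # Keep only what is reachable from the roots.
        live, todo = set(), list(self.roots)
        while todo:
            p = todo.pop()
            if p not in live:
                live.add(p)
                todo.extend(self.refs[p])
        self.refs = {p: r for p, r in self.refs.items() if p in live}


def evaluate(store, gc_mid_eval=False, root_as_you_go=False):
    """Evaluate 'A depends on B'; optionally run a GC between the two drvs."""
    store.add("B.drv")                 # forcing B creates B.drv on disk
    if root_as_you_go:
        store.roots.add("B.drv")       # temporary root (the costly option)
    b_path = "B.drv"                   # evaluator only remembers the string
    if gc_mid_eval:
        store.collect_garbage()        # e.g. a min-free GC during a stalled eval
    store.add("A.drv", refs=[b_path])  # fails if B.drv was collected
    store.roots.discard("B.drv")       # de-root temporaries ...
    store.roots.add("A.drv")           # ... once the real target is rooted
    return "A.drv"
```

In this model, `evaluate(ToyStore(), gc_mid_eval=True)` raises `FileNotFoundError`, while the same call with `root_as_you_go=True` succeeds, at the price of registering a root for every derivation created during the eval.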
<gchristensen> sorry I can't reason about this. I've been sick for a few days now. not 100% ... here...
<clever> gchristensen: i'm missing 4 wisdom teeth
<gchristensen> did you just get them out?
<gchristensen> are you telling me you're still sharp after anesthesia? :P
<clever> had them removed on the 20th
<gchristensen> ah, I hope it went well -- that is no fun of a procedure
<clever> the worst part was getting the IV in
<clever> dont remember much after that :P
<gchristensen> :) good
<clever> and the first night left me drooling blood all over the pillow
<samueldr> WELP, clever you speak too much, you're breaking the GUI from the logs thing :)
<clever> :D
<samueldr> I was curious to see if you were less active on the 20th
<clever> might have been the 19th, it was a monday
<samueldr> 19th was a monday, one message (and could be time-zoned differently)
ekleog has joined #nixos-aarch64
<samueldr> yay, disabling `-b` (batch operations) from mtools' mcopy fixes the fsck issues, spooky!
<samueldr> whew, bennofs[m], tilpner, #51158 should fix the FAT32 partition generation and give a clean FS
<{^_^}> https://github.com/NixOS/nixpkgs/pull/51158 (by samueldr, 2 minutes ago, open): Fix FAT32 partition issues on sd-image-based images
joehh has joined #nixos-aarch64
<joehh> hello, didn't know this channel existed - not sure if anyone has mentioned it, but the current image for aarch64 does not match the published sha256
<{^_^}> #51149 (by joehealy, 5 hours ago, open): aarch64 raspberry pi sha256 hashes do not match
<samueldr> hi! o/
<samueldr> as for the fat32 partition issue (unrelated to the unmatched sha256) #51158 fixes that
<{^_^}> https://github.com/NixOS/nixpkgs/pull/51158 (by samueldr, 33 minutes ago, open): Fix FAT32 partition issues on sd-image-based images
<joehh> thanks for that - what is the significance of that for existing images I have
<joehh> I guess I can just copy the files off, format the partition properly and copy them back on
<joehh> would that make it all happy?
<clever> joehh: i think you just need to fsck them once
<clever> it looks like the fat32 is just "improperly unmounted" when its baked into the image
joehh has quit [Ping timeout: 250 seconds]
orivej has joined #nixos-aarch64
clever has quit [Ping timeout: 252 seconds]
orivej has quit [Ping timeout: 246 seconds]
orivej has joined #nixos-aarch64
orivej has quit [Ping timeout: 244 seconds]
clever has joined #nixos-aarch64
clever has quit [Changing host]
clever has joined #nixos-aarch64
orivej has joined #nixos-aarch64
efraim has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]
efraim has joined #nixos-aarch64
<bennofs[m]> clever: plain fsck makes things worse
joehh has joined #nixos-aarch64
joehh has quit [Ping timeout: 272 seconds]
<sphalerite> clever: infinisil is missing you in #nixos :p
globin has joined #nixos-aarch64
joehh has joined #nixos-aarch64
<samueldr> I don't know about plain fsck, but the fat32 wasn't "just improperly unmounted"; seems like `mcopy` from `mtools` actually is bad at writing FAT32
joehh has quit [Ping timeout: 246 seconds]
<clever> bennofs[m]: :O, how?
<clever> sphalerite: rejoin failed after internet failure
<clever> 2018-11-28 03:04:25 [freenode] -!- #nixos #nixos-unregistered Forwarding to another channel
<samueldr> clever: https://freenode.net/kb/answer/registration#logging-in your client probably does /msg nickserv identify hunter2 to identify, which is racy
<samueldr> using the server password, or better yet, SASL, it shouldn't race
<clever> yep
<clever> i just havent bothered, because i dont reconnect that much
<bennofs[m]> clever: after fsck you get io errors and undeletable files
<samueldr> I think I can confirm, on my raspberry pi I'm left with trash on the boot partition
<samueldr> the partition sets itself read-only whenever I try to clean the kernels/initrd
<samueldr> which I'm 99% sure is related to the mcopy issue
<clever> sounds more like mtools is doing major corruption, and the fsck in the nix will cause the build to fail
<samueldr> ?
<samueldr> I'm confused by your statement
<bennofs[m]> samueldr: yes that's what happens after fsck. The trick is to remove the dir before fsck
<samueldr> I was confused by "the fsck in the nix will cause the build to fail"
<samueldr> since there is no fsck in the nix expressions, or rather, there wasn't one before I added one
<clever> samueldr: i'm thinking the fsck you added, will cause builds to fail if the fat32 is corrupted too far
<samueldr> exactly!
<clever> so its a block rather than a fix
<samueldr> it is to block production of broken images!
<clever> yep
<samueldr> there is *also* a fix, where it's not using the `-b` parameter of `mcopy`, which further breaks the image
<samueldr> mtools 4.0.20 fixed the bug behind the major corruption, so there wasn't a need to add an additional fix for that one
<samueldr> (it's been recently updated, not yet in -unstable, but present in master)
<gchristensen> should probably backport?
<samueldr> #51159 yes
<{^_^}> https://github.com/NixOS/nixpkgs/pull/51159 (by samueldr, 13 hours ago, merged): [18.09] mtools 4.0.18 -> 4.0.21
<samueldr> I'm waiting for reviews on #51158 to backport the final fixes
<{^_^}> https://github.com/NixOS/nixpkgs/pull/51158 (by samueldr, 13 hours ago, open): Fix FAT32 partition issues on sd-image-based images
<samueldr> ideally those who worked on `sd-image.nix`
<gchristensen> oops misread you
<samueldr> :)
<samueldr> I'm thinking maybe I misexplained the issue in the PR
<gchristensen> is mcopy -b not fixed with .20?
<samueldr> I was tired yesterday when doing the writeup
<samueldr> nope, `-b` is still broken
<gchristensen> nice.
<samueldr> not really :/
<gchristensen> no not at all
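The two measures discussed above (copying without `mcopy -b`, and running fsck so a corrupt image fails the build instead of shipping) can be sketched as follows. This is an illustration only, not the contents of `sd-image.nix` or of #51158: the function names, the image name, and the exact flag set are assumptions, based on the mtools and dosfstools man pages (`-p`, `-s`, `-v`, `-m` and `-i` for `mcopy`; `-v` and `-n` for `fsck.vfat`).

```python
import subprocess


def fat32_populate_commands(image, sources, batch=False):
    """Return the copy and check commands for a FAT32 image (hypothetical helper)."""
    # -p preserve attributes, -s recursive, -v verbose, -m keep mtimes.
    # -b (batch mode) is left out by default, since it was reported above
    # to produce a filesystem that fails fsck.
    flags = "-bpsvm" if batch else "-psvm"
    copy = ["mcopy", flags, "-i", image, *sources, "::"]
    # -n checks without writing; a non-zero exit status signals errors.
    check = ["fsck.vfat", "-vn", image]
    return [copy, check]


def populate_and_check(image, sources):
    """Run both steps; any failure raises CalledProcessError and stops the build."""
    for cmd in fat32_populate_commands(image, sources):
        subprocess.run(cmd, check=True)
```

With this shape the fsck step is a gate rather than a repair: it never modifies the image, it only refuses to let a broken one through.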
<{^_^}> https://github.com/NixOS/nixpkgs/pull/51180 (by globin, 3 hours ago, open): [WIP] xorg: cross-fixes
Thra11 has joined #nixos-aarch64
orivej has quit [Ping timeout: 246 seconds]
Thra11 has quit [Ping timeout: 246 seconds]
orivej has joined #nixos-aarch64
Thra11 has joined #nixos-aarch64
orivej has quit [Ping timeout: 268 seconds]
<makefu> u
Thra11 has quit [Ping timeout: 244 seconds]
c00w has joined #nixos-aarch64
<samueldr> thanks Dezgeg