<clever>
avn: and zfs is overly cautious, and restarted the entire device each time
<clever>
avn: i also ran into the occasional write error/timeout during my initial resilver
<clever>
obviously, the magnetic couldn't keep up
<clever>
avn: it was splitting the reads 50/50 between the 2 good drives, the magnetic and the ssd
<clever>
avn: one oddity i noticed when doing the conversion, when i made it into a magnetic+ssd+ssd mirror (with 1 "bad" ssd having to resilver), it performed horribly
<clever>
avn: but this pool did start out on a smaller magnetic, i converted it into a magnetic+ssd mirror, then to a single ssd, then back to an ssd+ssd mirror
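A rough sketch of that conversion as zpool commands (pool and device names are made up, and each attach has to finish resilvering before the next step):

    zpool attach tank /dev/magnetic /dev/ssd0   # magnetic -> magnetic+ssd mirror
    zpool detach tank /dev/magnetic             # drop the magnetic, single ssd
    zpool attach tank /dev/ssd0 /dev/ssd1       # ssd -> ssd+ssd mirror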
<clever>
avn: a pair of 240gig SSD's in a zfs mirror, no ZIL or L2
<clever>
avn: the system should have even less load than before now, but du refuses to get anywhere near its past numbers, it's holding steady at 33sec, when it previously got 16sec
<clever>
that can make things simpler to edit
<clever>
tarinaky: try making your own configuration.nix, with imports = <nixpkgs/nixos/modules/installer/cd-dvd/installation-cd-ssh.nix>;
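A minimal sketch of that configuration.nix (the extra package is just a placeholder for your own changes, and the build command assumes the standard isoImage attribute):

    { config, pkgs, ... }:
    {
      imports = [ <nixpkgs/nixos/modules/installer/cd-dvd/installation-cd-ssh.nix> ];
      environment.systemPackages = [ pkgs.vim ];
    }

    # then build it with something like:
    #   nix-build '<nixpkgs/nixos>' -A config.system.build.isoImage -I nixos-config=./configuration.nix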
<clever>
tarinaky: how are you building the iso?
<clever>
avn: only way an L2 could help is if it was nvme possibly
<clever>
avn: no L2 right now, and the whole array is on an SSD
<clever>
avn: according to arcstat.py, i'm getting hit rates as low as 9% still
<clever>
ok, now the program hogging 6gig of ram is 100% unable to touch the cpu
<clever>
avn: i just closed all of chromium to get that 3gig of ram free
<clever>
avn: because i still have a single program with 6gig in its RSS open
<clever>
avn: ok, now things are not making any sense, i have 3gig of ram free in "free -m", and the arc is using the same 3gig it had before, but the du command now takes twice as long
<clever>
gchristensen: what does "stat /nix/store/.links" say?
<clever>
gchristensen: due to --optimize, you will basically regain none of the free space, until the very last step where it cleans up the .links dir
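A rough illustration of why: with optimise on, store paths share their contents as hard links into /nix/store/.links, so deleting a path mostly just drops link counts; the space only comes back when gc prunes the .links entries whose count has fallen to 1, roughly the files this finds:

    find /nix/store/.links -type f -links 1 | head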
<clever>
MichaelRaskin: creating ~20,000 files of ~500 bytes each
<clever>
MichaelRaskin: i have managed to kill btrfs with hydra before
<clever>
avn: yikes, almost all in kernel space!
<clever>
sys 2m8.274s
<clever>
real 2m30.022s
<clever>
sys 1m48.650s
<clever>
real 2m37.544s
<clever>
try manually running a gc
<clever>
ah
<clever>
gchristensen: the symlinks are cheap, and flying under the radar
<clever>
gchristensen: maybe auto-optimize has reduced the usage of those so far that it's not triggering your auto-gc stuff
<clever>
gchristensen: looks like it needs some more aggressive GC'ing
<clever>
avn: and du finished after 3 minutes, with a solid 1m 23s of system usage
<clever>
replugging a keyboard twice made it recover
<clever>
avn: and it locked up
<clever>
avn: strain is back, 6gig of the 16gig of ram is in use by a single app, du is running
<clever>
i have done similar to prevent armv7 assembly from being used in armv6
<clever>
that can easily be added to nixpkgs
<clever>
when it clearly does exist
<clever>
so things like the console font utils, claim file not found
<clever>
and the secondary problem, a lot of programs don't correctly handle EOVERFLOW from getdents()
<clever>
and if you happen to use that field in your applications binary files
<clever>
the problem is that it changes the size of a field a lot of people assume is 32bits
<clever>
i forget which -D flag it was, but it switches on 64bit compat
<clever>
so you have to specially compile the program to use the 64bit compat syscalls
<clever>
on a 32bit system, getdents() uses 32bit fields, and getdents64() uses 64bit fields
<clever>
on a 64bit system, getdents() (used to list files in a dir) uses 64bit fields all around
<clever>
slyfox: and in my case, i was nfs mounting the 4tb XFS on a 32bit raspberry pi
<clever>
slyfox: only when running a 32bit userland
<clever>
slyfox: and one issue i have run into, is that data stored past the 2tb position on the block device, will have an inode number larger than 4 billion (32bits)
<clever>
slyfox: xfs does have inodes last i looked, but it allocates a lot more of them, and each bank of inodes is for data at a certain position in the disk
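A minimal C sketch of the failure mode described above (glibc behaviour from memory: without -D_FILE_OFFSET_BITS=64 on a 32bit userland, readdir() gives up with EOVERFLOW when an inode number does not fit in the 32bit d_ino; with the flag, libc routes through getdents64 and it just works):

    #include <dirent.h>
    #include <errno.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        DIR *d = opendir(argc > 1 ? argv[1] : ".");
        if (!d) { perror("opendir"); return 1; }
        struct dirent *ent;
        errno = 0;
        while ((ent = readdir(d)) != NULL) {    /* list entries with their inode numbers */
            printf("%llu\t%s\n", (unsigned long long) ent->d_ino, ent->d_name);
            errno = 0;
        }
        if (errno == EOVERFLOW)                  /* inode didn't fit in 32 bits */
            fprintf(stderr, "inode number too big for 32bit d_ino\n");
        closedir(d);
        return 0;
    }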
<clever>
time read dmis dhit dh% mrug mru mfu mfug mread c ph% pm% mm% miss hits mhit mh% mmis hit% eskip arcsz
<clever>
user 0m2.054s
<clever>
real 0m16.899s
<clever>
avn: this one finished in 16 seconds now that the arc has cached things
<clever>
[root@amd-nixos:/nix/store]# time du --max=1 --inodes -l | sort -n
<clever>
gchristensen: none of my other zfs machines do this, only the desktop
<clever>
why did ALL IO just halt, and bring the cpu to the ground with it?
<clever>
why was it not just pushing junk into swap?
<clever>
which implies it only needed another ~1gig of ram to do the job
<clever>
avn: 55 seconds to run du on the store now, arc peaked at 3.5gig
<clever>
and the arc has increased in size to 2.6gig
<clever>
i have released the memory pressure (closed the 6gig app), and now du runs much faster
<clever>
alsa, vpn
<clever>
avn: "journalctl -f" was open for ~2 minutes, during 3 or 4 lockups, and the only thing it has is usb activity, and various other things timing out
<clever>
and the problem just ran away
<clever>
and one day, when the desktop locked up, i switched the keyboard/mouse to the laptop to ease debugging
<clever>
avn: its sharing the keyboard/mouse between a desktop/laptop
<clever>
avn: main reason i discovered that usb helps, is because i have repurposed a box meant for sharing 1 usb printer to 4 pc's
<clever>
gchristensen: this will sort every storepath by how many inodes it contains
<clever>
[clever@laptop:/nix/store]$ du --max=1 --inodes | sort -n
<clever>
gchristensen: almost have the cmd ready
<clever>
replugging a keyboard brought it back
<clever>
attempting to open the journal pushed it over the edge
<clever>
that was about 7 minutes to run du over /nix/store
<clever>
gchristensen: and du has its first line of output!
<clever>
avn: it almost never recovers on its own
<clever>
avn: and replugging a usb keyboard instantly fixed it
<clever>
i have gone to bed, then woke up to find it had been locked up for the last 5 hours
<clever>
and the machine locked up in the middle of typing that
<clever>
avn: top says du is using 76% of the cpu, it has not output a single line yet
<clever>
avn: the irc client is on a different machine, screen + ssh
<clever>
avn: i can barely even irc on that machine now, it's freezing that badly
<clever>
gchristensen: so i'm also gathering data on my zfs problem at the same time
<clever>
gchristensen: trying to get a better cmd for your inodes, but /nix always cripples my system
<clever>
4.4.36
<clever>
thats the whole point of a mirror
<clever>
even if 1 hdd was bad, it should be able to run on the other
<clever>
avn: unplugging a usb keyboard instantly resumes all io and it continues like nothing happened
<clever>
avn: i ran this, arc is using 1.8gig, top says 300mb free, and all io just stops dead
<clever>
[root@amd-nixos:/nix]# du --max=1 --inodes
<clever>
avn: and the problem is recreated
<clever>
gchristensen: one sec
<clever>
the ext4 hydra is at 30%
<clever>
all of my ZFS boxes are at 1% inode usage
<clever>
avn: gc took 41 seconds, can't reproduce the issue at this instant
<clever>
hydra must have pushed more things over, GC found something to eat
<clever>
avn: system strain increased, 6gig in use by a single app, arc has gone down to 1.9gig
<clever>
viric: oh nice
<clever>
forgot about it this time
<clever>
df -i
<clever>
i've had that once or twice
<clever>
gchristensen: inodes?
<clever>
avn: putting some strain on the system now...
<clever>
avn: a second run of GC with no garbage took 32 seconds, and the arc rose to 3.1gig
<clever>
avn: under normal conditions, a full nix GC took 4 minutes 53 seconds, and the arc rose to a peak of 3.6gig, but has since dropped to 2.8gig
<clever>
avn: i have had it freeze all 8 cores at once before
<clever>
avn: leading to timeouts
<clever>
avn: my impression is that ZFS is spinlocking the cpu, causing it to not be capable of receiving a reply from the hdd
<clever>
avn: yeah, you can always try that first, that's what i was doing with the racklodge machine
<clever>
avn: ah, if you can get display during install, its trivial
<clever>
avn: the biggest problem i ran into, was keeping the hardware raid controller happy, and configuring the static ip right (the datacenter doesn't use dhcp)
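A minimal sketch of the static-ip side of that in configuration.nix (the addresses, the interface name, and the option names as they appear in current nixos modules are all my assumptions):

    networking = {
      useDHCP = false;
      interfaces.eth0.ipv4.addresses = [ { address = "203.0.113.10"; prefixLength = 24; } ];
      defaultGateway = "203.0.113.1";
      nameservers = [ "8.8.8.8" ];
    };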
<clever>
tarinaky: read /etc/systemd/system/sshd.service to find its path
<clever>
keep reading until EOF
<clever>
yeah
<clever>
so commands like "/exec -o df -h" that run very fast and output a lot, just get silently truncated
<clever>
and don't attempt one last read of the stdout pipe
<clever>
people assume that waitpid/SIGCHLD means there is no more data
<clever>
gchristensen: ah, i've run into similar bugs in irssi with /exec -o
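A minimal C sketch of the correct pattern (not irssi's actual code): reap the child, then keep draining the pipe until read() returns 0, since output can still sit in the pipe buffer after SIGCHLD/waitpid fires:

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int fds[2];
        pipe(fds);
        pid_t pid = fork();
        if (pid == 0) {                 /* child: stdout -> pipe */
            dup2(fds[1], STDOUT_FILENO);
            close(fds[0]); close(fds[1]);
            execlp("df", "df", "-h", (char *) NULL);
            _exit(127);
        }
        close(fds[1]);
        waitpid(pid, NULL, 0);          /* child has exited... */
        char buf[4096];
        ssize_t n;
        while ((n = read(fds[0], buf, sizeof buf)) > 0)
            fwrite(buf, 1, (size_t) n, stdout);   /* ...but the pipe may not be empty */
        close(fds[0]);
        return 0;
    }

(Real code should also read while the child runs, or a chatty child can fill the pipe and never exit; df -h is small enough for the sketch.)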
<clever>
why do i need nano installed?
<clever>
i do feel that the default systemPackages is a bit fat
<clever>
yeah, you can also foo = foo.override { netcat = pkgs.netcat-gnu; }; to undo it for some things
<clever>
maybe nothing will?
<clever>
just override netcat = pkgs.netcat-openbsd; and see what breaks?
<clever>
ah
<clever>
gchristensen: so you could just nix-env -iA nixos.netcat-openbsd and you're done
<clever>
gchristensen: nix-env stuff does take priority over systemPackages
<clever>
you would have to clone nixpkgs and edit it locally
<clever>
yeah, no way to change that, i had to use a copy when building not-os
<clever>
what is it breaking?
<clever>
this is why i avoid installing such things globally, always insert them into PATH and you get exactly the right version, without causing others trouble
<clever>
then prepend ${netcat-openbsd}/bin/ to the PATH of whatever cares
<clever>
so you can just make an override that sets netcat = pkgs.netcat-openbsd
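A minimal sketch of that override, assuming nixpkgs' packageOverrides mechanism (goes in configuration.nix as nixpkgs.config.packageOverrides, or in your user's nixpkgs config.nix):

    {
      packageOverrides = pkgs: {
        netcat = pkgs.netcat-openbsd;
      };
    }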
<clever>
system-path just uses netcat
<clever>
for some reason, xorg refuses to listen on tcp now, even if you remove -nolisten tcp
<clever>
i've used socat before to convert /tmp/.X11-unix/X0 into a tcp socket, allowing remote X11 clients
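That socat invocation usually looks something like this (6000 is the tcp port for display :0; check socat(1) for the exact flags):

    socat TCP-LISTEN:6000,fork,reuseaddr UNIX-CONNECT:/tmp/.X11-unix/X0
    # then on the remote machine: DISPLAY=thathost:0 xterm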
<clever>
socat can handle unix sockets, and inter-mix things
<clever>
i've switched over to socat, it has a lot more options
<clever>
ah
<clever>
if you enable libvirtd, then the openbsd version is added to systemPackages
<clever>
yeah, runInLinuxVM is more for when you need root to loopback mount filesystems and grub-install
<clever>
testing this on my end...
<clever>
nix-repl> :b vmTools.runInLinuxVM hello
<clever>
Unode: i believe you can just run vmTools.runInLinuxVM on any derivation, and it will "just work", as long as the package is fine building as root, and qemu works
<clever>
Unode: prevm gets run on the host, as nixbld1, the main body then gets run as root under qemu, and then postvm gets run on the host again as nixbld1
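A minimal sketch matching that description (preVM/postVM as the host-side hooks; wiring them in via overrideAttrs is my assumption):

    with import <nixpkgs> {};
    vmTools.runInLinuxVM (hello.overrideAttrs (old: {
      preVM  = "echo runs on the host, before qemu boots";
      postVM = "echo runs on the host, after qemu exits";
    }))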