2017-01-28

<clever> madonius: the nix files always need attribute paths, and nothing else
<clever> madonius: have you tried pkgs.pass ?
<clever> from "nox pass"
<clever> 43 password-store-1.6.5 (nixos.pass)
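For reference, pulling a package in by attribute path from a configuration.nix looks roughly like this; a minimal sketch, assuming pass is wanted system-wide:

  { pkgs, ... }:
  {
    # an attribute path into pkgs, not a free-form package name
    environment.systemPackages = [ pkgs.pass ];
  }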
<clever> bennofs: and sometimes i just dont commit for weeks, and then i commit all the changes everywhere and sync the machines up
<clever> bennofs: i usually edit locally, and confirm things work before i commit
<clever> ...
<clever> modified: amd-nixos.nix
<clever> [root@amd-nixos:/run/current-system/nixcfg]# git status
<clever> bennofs: it also ate the .git directory, so i can "git diff" and see what i changed and didnt commit yet, in any build
<clever> this causes the entire config repo to be included inside the build
<clever> bennofs: ive done something similar, https://gist.github.com/cleverca22/d70a6826db80d0ee2ae9074d83690e98
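The linked gist is not reproduced here, but the general idea can be sketched as follows; the option is real, though the exact contents of the gist may differ:

  { ... }:
  {
    # copy the whole config repo (including .git) into the store and symlink it
    # into every system generation, so "git diff" works from any build later on
    system.extraSystemBuilderCmds = ''
      ln -sv ${./.} $out/nixcfg
    '';
  }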
<clever> bennofs: ah neat, so it can only upgrade if you -I nixpkgs=, and from then on, it sticks
<clever> that provides a default for $NIX_PATH
<clever> Ralith: there is nix.nixPath, but this takes effect AFTER the config is built, not during the build
<clever> it is nixpkgs that loads configuration.nix, so by the time its being read, its too late to change the nixpkgs
<clever> Ralith: you need to use -I nixpkgs= with nixos-rebuild
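A rough illustration of the difference being described, with placeholder paths; nix.nixPath only provides the default NIX_PATH once the system has been built and activated:

  {
    # the rebuild that introduces this must still be pointed at the right
    # nixpkgs by hand, e.g. nixos-rebuild switch -I nixpkgs=/path/to/nixpkgs
    nix.nixPath = [
      "nixpkgs=/path/to/nixpkgs"
      "nixos-config=/etc/nixos/configuration.nix"
    ];
  }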
<clever> and then it can continue to analyze the problem, even when the system deadlocks
<clever> in theory, if i flag a program as real-time, and never yield the cpu, zfs cant spinlock the core and steal it
<clever> viric: something ive been thinking about recently, is to make a program that is designed to eat an entire core
<clever> ah
<clever> [225741.334066] sysrq: SysRq : HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(V) show-blocked-tasks(w) dump-ftrace-buffer(z)
<clever> [root@amd-nixos:/nix/store]# dmesg | tail
<clever> [root@amd-nixos:/nix/store]# echo ? > /proc/sysrq-trigger
<clever> viric: you can also do it with sysrq, either show-blocked-tasks(w) or show-backtrace-all-active-cpus(l)
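If magic sysrq is disabled by default, it can be enabled from the NixOS config; a minimal sketch:

  {
    # 1 enables all sysrq functions (w = show blocked tasks, l = backtrace all active cpus)
    boot.kernel.sysctl."kernel.sysrq" = 1;
  }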
<clever> gchristensen: oh, hows the GC going?
<clever> ttuegel: yep, ive used that to add zfs support, but it can sometimes lead to downloading the world if you have changed the channel
<clever> but i have reused it for many things, like "nixos-rebuild build-vm" on gentoo
<clever> this guide was meant for jamming nix into another distro (a rescue shell) and then using it to nixos-install
<clever> at which point, you can pick any nixpkgs you want
<clever> ttuegel: this section explains how to get nixos-install and friends via nix-env -i
<clever> ttuegel: the linode guide could be abused, https://nixos.org/wiki/Install_NixOS_on_Linode
<clever> corngood: it can be interesting to just start at a file like this, and read every imports recursively
<clever> corngood: heh, i should try out line 30 of that file next time i do an install
<clever> corngood: not on the ISO's
<clever> corngood: ive had it work with just this
<clever> systemd.services.sshd.wantedBy = mkForce [ "multi-user.target" ];
<clever> avn: and zfs is overly cautious, and restarted the entire device each time
<clever> avn: i also ran into the occasional write error/timeout during my initial resilver
<clever> obviously, the magnetic couldnt keep up
<clever> avn: it was splitting the reads 50/50 between the 2 good drives, the magnetic and the ssd
<clever> avn: one oddity i noticed when doing the conversion, when i made it into a magnetic+ssd+ssd mirror (with 1 "bad" ssd having to resilver), it performed horribly
<clever> avn: but this pool did start out on a smaller magnetic, i converted it into a magnetic+ssd mirror, then to a single ssd, then back to an ssd+ssd mirror
<clever> avn: a pair of 240gig SSD's in a zfs mirror, no ZIL or L2
<clever> avn: the system should have even less load now than before, but du refuses to get anywhere near its past numbers, its holding steady at 33sec, when it previously got 16sec
<clever> that can make things simpler to edit
<clever> tarinaky: try making your own configuration.nix, with imports = [ <nixpkgs/nixos/modules/installer/cd-dvd/installation-cd-ssh.nix> ];
<clever> tarinaky: how are you building the iso?
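A sketch of the kind of configuration.nix being suggested, with the module path taken from the line above and the sshd tweak mentioned later in this log; whether that exact installer module exists depends on the nixpkgs branch:

  { lib, ... }:
  {
    imports = [ <nixpkgs/nixos/modules/installer/cd-dvd/installation-cd-ssh.nix> ];
    # make sshd start on boot inside the ISO
    systemd.services.sshd.wantedBy = lib.mkForce [ "multi-user.target" ];
  }

Something like nix-build '<nixpkgs/nixos>' -A config.system.build.isoImage -I nixos-config=./iso.nix (filename illustrative) should then produce the image.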
<clever> avn: only way an L2 could help is if it was nvme possibly
<clever> avn: no L2 right now, and the whole array is on an SSD
<clever> avn: according to arcstat.py, i'm getting hit rates as low as 9% still
<clever> ok, now the program hogging 6gig of ram is 100% unable to touch the cpu
<clever> [root@amd-nixos:/proc/26667/task]# kill -stop *
<clever> avn: i just closed all of chromium to get that 3gig of ram free
<clever> avn: because i still have a single program with 6gig in its RSS open
<clever> avn: ok, now things are not making any sense, i have 3gig of ram free in "free -m", and the arc is using the same 3gig it had before, but the du command now takes twice as long
<clever> gchristensen: what does "stat /nix/store/.links" say?
<clever> gchristensen: due to --optimize, you will basically regain none of the free space, until the very last step where it cleans up the .links dir
<clever> MichaelRaskin: creating ~20,000 files of ~500 bytes each
<clever> MichaelRaskin: i have managed to kill btrfs with hydra before
<clever> avn: yikes, almost all in kernel space!
<clever> sys 2m8.274s
<clever> real 2m30.022s
<clever> sys 1m48.650s
<clever> real 2m37.544s
<clever> try manually running a gc
<clever> ah
<clever> gchristensen: the symlinks are cheap, and getting under the radar
<clever> gchristensen: maybe auto-optimize has reduced the usage of those so much that its not triggering your auto-gc stuff
<clever> gchristensen: looks like it needs some more aggressive GC'ing
<clever> avn: and du finished after 3 minutes, with a solid 1m 23s of system usage
<clever> replugging a keyboard twice made it recover
<clever> avn: and it locked up
<clever> avn: strain is back, 6gig of the 16gig of ram is in use by a single app, du is running
<clever> i have done similar to prevent armv7 assembly from being used in armv6
<clever> that can easily be added to nixpkgs
<clever> when it clearly does exist
<clever> so things like the console font utils, claim file not found
<clever> and the secondary problem: a lot of programs dont correctly handle EOVERFLOW from getdents()
<clever> and if you happen to use that field in your application's binary files
<clever> the problem is that it changes the size of a field a lot of people assume is 32bits
<clever> i forget which -D flag it was, but it switches on 64bit compat
<clever> so you have to specialy compile the program to use the 64bit compat syscalls
<clever> on a 32bit system, getdents() uses 32bit fields, and getdents64 uses 64bit fields
<clever> on a 64bit system, getdents() (used to list files in a dir) uses 64bit fields all around
<clever> slyfox: and in my case, i was nfs mounting the 4tb XFS on a 32bit raspberry pi
<clever> slyfox: only when running a 32bit userland
<clever> slyfox: and one issue i have run into is that data stored past the 2tb position on the block device will have an inode number larger than 4 billion (32bits)
<clever> slyfox: xfs does have inodes last i looked, but it allocates a lot more of them, and each bank of inodes is for data at a certain position in the disk
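One way such a fix could land in nixpkgs is a per-package compile flag; a hypothetical sketch, where the package choice is illustrative and the flag shown is the standard glibc large-file switch, which may or may not be the -D flag being half-remembered above:

  # force 64-bit off_t/ino_t in a 32-bit build so the getdents64 path gets used
  pkgs.kbd.overrideAttrs (old: {
    NIX_CFLAGS_COMPILE = (old.NIX_CFLAGS_COMPILE or "") + " -D_FILE_OFFSET_BITS=64";
  })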
<clever> avn: restoring the strain on the system
<clever> 11:33:30 6.6K 227 6.4K 96 24 2.7K 3.7K 12 297 3.8G 29 70 35 266 6.4K 191 64 105 95 0 3.8G
<clever> time read dmis dhit dh% mrug mru mfu mfug mread c ph% pm% mm% miss hits mhit mh% mmis hit% eskip arcsz
<clever> user 0m2.054s
<clever> real 0m16.899s
<clever> avn: this one finished in 16 seconds now that the arc has cached things
<clever> [root@amd-nixos:/nix/store]# time du --max=1 --inodes -l | sort -n
<clever> gchristensen: none of my other zfs machines do this, only the desktop
<clever> why did ALL IO just halt, and bring the cpu to the ground with it?
<clever> why was it not just pushing junk into swap?
<clever> which implies it only needed another ~1gig of ram to do the job
<clever> avn: 55 seconds to run du on the store now, arc peaked at 3.5gig
<clever> and the arc has increased in size to 2.6gig
<clever> i have released the memory pressure (closed the 6gig app), and now du runs much faster
<clever> alsa, vpn
<clever> avn: "journalctl -f" was open for ~2 minutes, during 3 or 4 lockups, and the only thing it has is usb activity, and various other things timing out
<clever> and the problem just went away
<clever> and one day, when the desktop locked up, i switched the keyb/mouse to the laptop to ease debugging
<clever> avn: its sharing the keyboard/mouse between a desktop/laptop
<clever> avn: main reason i discovered that usb helps, is because i have repurposed a box meant for sharing 1 usb printer to 4 pc's
<clever> gchristensen: this will sort every storepath by how many inodes it contains
<clever> [clever@laptop:/nix/store]$ du --max=1 --inodes | sort -n
<clever> gchristensen: almost have the cmd ready
<clever> replugging a keyboard brought it back
<clever> attempting to open the journal pushed it over the edge
<clever> that was about 7 minutes to run du over /nix/store
<clever> gchristensen: and du has its first line of output!
<clever> avn: it almost never recovers on its own
<clever> avn: and replugging a usb keyboard instantly fixed it
<clever> i have gone to bed, then woke up to find it was locked up for the last 5 hours
<clever> and the machine locked up in the middle of typing that
<clever> avn: top says du is using 76% of the cpu, it has not output a single line yet
<clever> avn: the irc client is on a different machine, screen + ssh
<clever> avn: i can barely even irc on that machine now, its freezing that badly
<clever> gchristensen: so i'm also gathering data on my zfs problem at the same time
<clever> gchristensen: trying to get a better cmd for your inodes, but /nix always cripples my system
<clever> 4.4.36
<clever> thats the whole point of a mirror
<clever> even if 1 hdd was bad, it should be able to run on the other
<clever> avn: unplugging a usb keyboard instantly resumes all io and it continues like nothing happened
<clever> avn: i ran this, arc is using 1.8gig, top says 300mb free, and all io just stops dead
<clever> [root@amd-nixos:/nix]# du --max=1 --inodes
<clever> avn: and the problem is recreated
<clever> gchristensen: one sec
<clever> the ext4 hydra is at 30%
<clever> all of my ZFS boxes are at 1% inode usage
<clever> avn: gc took 41 seconds, cant reproduce the issue at this instant
<clever> hydra must have pushed more things over, GC found something to eat
<clever> avn: system strain increased, 6gig in use by a single app, arc has gone down to 1.9gig
<clever> viric: oh nice
<clever> forgot about it this time
<clever> df -i
<clever> ive had that once or twice
<clever> gchristensen: inodes?
<clever> avn: putting some strain on the system now...
<clever> avn: a second run of GC with no garbage took 32 seconds, and the arc rose to 3.1gig
<clever> avn: under normal conditions, a full nix GC took 4 minutes 53 seconds, and the arc rose to a peak of 3.6gig, but has since dropped to 2.8gig
<clever> avn: i have had it freeze all 8 cores at once before
<clever> avn: leading to timeouts
<clever> avn: my impression is that ZFS is spinlocking the cpu, leaving it unable to receive a reply from the hdd
<clever> and default.nix refers to a directory that doesnt exist
<clever> none that i'm aware of
<clever> running nix-collect-garbage is causing upwards of 30k reads/sec in arcstat.py
<clever> currently, the arc is at 2.6gig, no heavy memory usage, running a GC
<clever> that is half of my zfs mirror
<clever> but now that i check dmesg, i do see some fishy things
<clever> [159790.096294] sd 4:0:0:0: [sdd] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
<clever> i havent had anything like that in dmesg
<clever> avn: even if the arc size is the same, and there is another gig free
<clever> avn: something like updatedb takes under a minute if i have tons of ram free, but can take 10-15 minutes if i have 6gig of ram actually in-use
<clever> avn: the weird thing, is that the system just gives up and doesnt do any IO
<clever> avn: they typically recover if i replug a usb keyboard a couple times
<clever> avn: i'm also being plagued by frequent soft-lockups
<clever> gchristensen: is /tmp on a tmpfs?
<clever> it successfully boots under x86-qemu with testcases, and i have it acting as a nix build slave on a raspberry pi 3
<clever> this creates a distro that is massively stripped down, and in the current config, it compiles down to a ~47mb squashfs
<clever> unlmtd[m]: https://github.com/cleverca22/not-os/blob/master/default.nix#L12-L28 i used a small sub-set of nixos modules, and a few custom ones
<clever> unlmtd[m]: i have done something similar with not-os
<clever> so its up to you to make it boot, somehow
<clever> but it also wont include a kernel/initrd
<clever> if you set this to true, it will bypass the rootfs check for you
<clever> config = mkIf (!config.boot.isContainer) {
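What "set this to true" refers to, roughly; a minimal sketch (it skips the root-filesystem assertion, but also drops the kernel/initrd, so booting is entirely up to you):

  {
    boot.isContainer = true;
  }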
<clever> oh, and also
<clever> you would need to heavily modify stage-1 to bypass that
<clever> but that wont happen when using an initrd
<clever> but it sounds like you want the kernel to mount the rootfs for you
<clever> unlmtd[m]: simplest thing i can think of is to edit a copy of nixpkgs to just delete that check
<clever> unlmtd[m]: it is checking that at least one of the entries in the filesystems array has a mountpoint of /
<clever> modules/system/boot/stage-1.nix: message = "The ‘fileSystems’ option does not specify your root file system.";
<clever> unlmtd[m]: what is the exact error its giving?
<clever> unlmtd[m]: one min
<clever> unlmtd[m]: both datasets have mountpoint=legacy
<clever> unlmtd[m]: it needs something like this to tell it what zfs to mount where, and that it needs zfs tools in the initrd: https://gist.github.com/cleverca22/70a08dfbaf95bf34ffd515779be638eb
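The gist is not copied here, but a config of that shape looks roughly like the following; a sketch with a made-up pool name:

  {
    boot.supportedFilesystems = [ "zfs" ];   # pulls the zfs tools into the initrd
    # both datasets use mountpoint=legacy, so NixOS mounts them itself
    fileSystems."/"     = { device = "tank/root"; fsType = "zfs"; };
    fileSystems."/home" = { device = "tank/home"; fsType = "zfs"; };
  }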
<clever> avn: yeah, you can always try that first, thats what i was doing with the racklodge machine
<clever> avn: ah, if you can get display during install, its trivial
<clever> avn: the biggest problem i ran into, was keeping the hardware raid controller happy, and configuring the static ip right (the datacenter doesnt use dhcp right)
<clever> avn: i have used this recently on a headless install: https://github.com/cleverca22/nix-tests/blob/master/kexec/configuration.nix
<clever> i have used this in my configuration to re-enable running on bootup
<clever> systemd.services.sshd.wantedBy = mkForce [ "multi-user.target" ];
<clever> you need to manually start it with systemctl
<clever> sshd is configured to not run automatically
<clever> tarinaky: is this a normal nixos install or the ISO image?
<clever> "journalctl -f -u sshd", what do the logs say?
<clever> is the currently running sshd using that config file?
<clever> ps aux | grep sshd
<clever> you want to look at the current one, via the systemd unit file
<clever> because the /nix/store will contain dozens or more copies of sshd_config
<clever> ExecStart=/nix/store/ii0q2jbxzfgp3sw9mmk09cs1zykcnkma-openssh-7.3p1/bin/sshd -f /nix/store/z75qn23w186ihdy0w49w0yibc23gls5j-sshd_config
<clever> [clever@amd-nixos:~]$ grep sshd_config /etc/systemd/system/sshd.service
<clever> tarinaky: read /etc/systemd/system/sshd.service to find its path
<clever> keep reading until EOF
<clever> yeah
<clever> so commands like "/exec -o df -h" that run very fast and output a lot, just get silently truncated
<clever> and dont attempt one last read of the stdout pipe
<clever> people assume that waitpid/SIGCHLD means there is no more data
<clever> gchristensen: ah, ive run into similar bugs in irssi with /exec -o
<clever> why do i need nano installed?
<clever> i do feel that the default systemPackages is a bit fat
<clever> yeah, you can also foo = foo.override { netcat = pkgs.netcat-gnu; }; to undo it for some things
<clever> maybe nothing will?
<clever> just override netcat = pkgs.netcat-openbsd; and see what breaks?
<clever> ah
<clever> gchristensen: so you could just nix-env -iA nixos.netcat-openbsd and you're done
<clever> gchristensen: nix-env stuff does take priority over systemPackages
<clever> you would have to clone nixpkgs and edit it locally
<clever> yeah, no way to change that, i had to use a copy when building not-os
<clever> what is it breaking?
<clever> this is why i avoid installing such things globally, always insert them into PATH and you get exactly the right version, without causing others trouble
<clever> then prepend ${netcat-openbsd}/bin/ to the PATH of whatever cares
<clever> so you can just make an override that sets netcat = pkgs.netcat-openbsd
<clever> system-path just uses netcat
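A sketch of that override via packageOverrides; whether anything else breaks is exactly the open question above:

  {
    nixpkgs.config.packageOverrides = super: {
      netcat = super.netcat-openbsd;
    };
  }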
<clever> for some reason, xorg refuses to listen on tcp now, even if you remove -nolisten tcp
<clever> ive used socat before to convert /tmp/.X11-unix/X0 into a tcp socket, allowing remote X11 clients
<clever> socat can handle unix sockets, and inter-mix things
<clever> ive switched over to socat, it has a lot more options
<clever> ah
<clever> if you enable libvirtd, then the openbsd version is added to systemPackages
<clever> nixpkgs/nixos/modules/config/system-path.nix: pkgs.netcat
<clever> nixpkgs/nixos/modules/virtualisation/libvirtd.nix: [ pkgs.libvirt pkgs.netcat-openbsd ]
<clever> where does "which netcat" say it is?
<clever> how did you install it?
<clever> «derivation /nix/store/y69vxqggl2yc2bzlp5jzllbh4rf4gyq0-netcat-gnu-0.7.1.drv»
<clever> nix-repl> netcat-gnu
<clever> «derivation /nix/store/1q85vpfxsgla9wwg0h4kn8hkn2fk37il-netcat-openbsd-1.105.drv»
<clever> gchristensen: my desktop has version 1.105 in nixpkgs, not currently in $PATH
<clever> chrishill: and you will want to use the nixos-16.09 branch of this repo: https://github.com/NixOS/nixpkgs-channels/tree/nixos-16.09
<clever> if you make 135 match 129, then it will have support to do either 32 or 64bit
<clever> once that line has been edited
<clever> chrishill: nix-build ~/apps/nixpkgs/nixos/release.nix -A iso_graphical.i686-linux should build the 32bit graphical ISO
<clever> but if you have nix on a machine, you can just checkout a copy of 16.09 nixpkgs, edit that line, and have nix-build generate the ISO
<clever> yeah, graphical is only done for 64bit

2017-01-27

<clever> chpatrick: buildEnv is one option
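For reference, buildEnv merges a set of packages into one symlink tree; a minimal sketch with illustrative contents:

  with import <nixpkgs> {};
  buildEnv {
    name = "my-tools";
    paths = [ pass socat ];
  }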
<clever> musicmatze: double-check to see if anything nix related is still running
<clever> musicmatze: it may still be collecting, ive -9'd it many times without breakage
<clever> and almost anything with gui uses mesa
<clever> and udev is now part of systemd
<clever> mesa needs udev to identify your gpu
<clever> Unode: which channel is this on?
<clever> Unode: looks like its actually building systemd
<clever> Unode: sounds like its trying to actually run systemd things at build time, rather than just linking to it?

2017-01-26

<clever> Unode: not currently
<clever> yeah
<clever> Unode: yep
<clever> nh2_: ssl has never been so painless before!
<clever> nh2_: and after an accidental upgrade due to messing with nix-channel, and a nginx restart, the subdomain has ssl!
<clever> nh2_: about to add a subdomain to my site now
<clever> nh2_: same
<clever> i prefer to checkout the right branch of https://github.com/NixOS/nixpkgs-channels and then aim -I nixpkgs= at that
<clever> yeah, only real downside is that it may update without you noticing
<clever> now what? lol
<clever> press control+p for the previous page, IE tries to print it
<clever> press control+n for the next page, a new window opens in IE
<clever> i recently had the fun of trying to use a hardware raid controller via some active-x remote-console
<clever> this looks more like its scraping the GPU's framebuffer, and not actually doing a serial console
<clever> chris|: proper serial console should manage the scrollback on the "local" end, so you can still scroll up after a crash like this
<clever> chris|: are you building with any parallelism enabled? how much?
<clever> chris|: no (remote) serial console available?
<clever> i'm guessing that nixops will respect $NIX_PATH or -I, so you can just point it to a copy of the latest nixos-16.09
<clever> you could delete it from nix-channel and it wont make any difference
<clever> i believe nixops will entirely ignore the channel on the target server
<clever> nh2_: and you shouldnt be using the nixpkgs channels for nixos stuff, it can potentially brick the machine
<clever> it may also remove the ssh keys nixops uses to get in
<clever> nixops doesnt push the config file out, so the machine will probably revert itself back to a base template, and remove all of your services
<clever> nh2_: autoUpgrade breaks nixops stuff
<clever> nh2_: i think it just uses the channel from the host nixops was ran on, i havent looked into that side of nixops yet
<clever> nh2_: #letsencrypt might be able to help with that more
<clever> nh2_: there was an issue in one of the release channels, it worked for me on the latest nixos-16.09
<clever> nh2_: ive used it on my NAS and a website i recently setup
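The setup being described is roughly the nginx + ACME module combination; a sketch using current option names, with a placeholder domain (older releases may name things differently, and newer ones also want security.acme.defaults.email and acceptTerms set):

  {
    services.nginx = {
      enable = true;
      virtualHosts."example.com" = {
        enableACME = true;
        forceSSL = true;
        root = "/var/www/example.com";
      };
    };
  }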
<clever> chris|: it should also show memory usage stats, and those may be of use
<clever> chris|: are you able to get access to the console output after the crash?
<clever> Unode: i had to :l <nixpkgs> first
<clever> Unode: https://gist.github.com/cleverca22/3cdf37acca84f701b17e4d6918763265 yep, it was that simple, but i havent seen your error, not sure if it will apply or not
<clever> but nix sandboxes are stopping you
<clever> yeah, runInLinuxVM is more for when you need root to loopback mount filesystems and grub-install
<clever> testing this on my end...
<clever> nix-repl> :b vmTools.runInLinuxVM hello
<clever> Unode: i believe you can just run vmTools.runInLinuxVM on any derivation, and it will "just work", as long as the package is fine building as root, and qemu works
<clever> Unode: prevm gets run on the host, as nixbld1, the main body then gets run as root under qemu, and then postvm gets run on the host again as nixbld1
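A minimal sketch of wrapping an existing derivation this way, with hello purely as a stand-in:

  with import <nixpkgs> {};
  vmTools.runInLinuxVM hello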