#nixos-dev on 2019-11-29

2019-10-09 18:48 samueldr changed the topic of #nixos-dev to: #nixos-dev NixOS Development (#nixos for questions) | NixOS 19.09 is released! https://discourse.nixos.org/t/nixos-19-09-release/4306 | https://hydra.nixos.org/jobset/nixos/trunk-combined https://channels.nix.gsc.io/graph.html | https://r13y.com | 19.09 RMs: disasm, sphalerite | https://logs.nix.samueldr.com/nixos-dev

00:02 worldofpeace_ has joined #nixos-dev

00:07 drakonis has quit [Remote host closed the connection]

00:07 drakonis has joined #nixos-dev

00:15 <jtojnar> hmm, there is also SEML

00:20 <gchristensen> tomorrow is Office Hours again! it'll be a bit casual, if you have PRs you want to look at or questions about how to get involved, etc, come chat with worldofpeace_ and me =) https://discourse.nixos.org/t/nixos-office-hours-2019-11-29/4956

00:37 drakonis has quit [Read error: Connection reset by peer]

00:42 orivej has quit [Ping timeout: 276 seconds]

00:46 <jtojnar> I already have a prior engagement this evening, will definitely try to come to the next one

00:47 <gchristensen> no worries :)

00:57 <gchristensen> is it safe to assume /dev/disk/by-*/* will be symlinks to /dev/sdx[y]?

01:29 <worldofpeace_> jtojnar: please please for the next one :D we do soo much cool stuff together

01:30 ris has quit [Ping timeout: 265 seconds]

01:36 <gchristensen> +1!

01:37 <aanderse> speaking of cool stuff you do together... how is the pantheon desktop looking on nixos these days? :)

01:37 <aanderse> i've never used it but am interested in trying

01:40 niksnut has quit [Ping timeout: 240 seconds]

02:00 <gchristensen> unionfs is indeed making nixos-install take a long time on my images

02:02 <gchristensen> 400% CPU and 10 on iowait, and definitely not on the receiving end :P

02:23 das_j has quit [Remote host closed the connection]

02:23 Scriptkiddi has quit [Remote host closed the connection]

02:35 <worldofpeace_> aanderse: maintained very well by me.

02:36 <worldofpeace_> I even contribute upstream

02:37 <worldofpeace_> latest work is to support latest mutter api's, instead of using a compat package. https://github.com/elementary/greeter/commit/15c80e458856ba8640f1b52fc4a0adb7b72f44e2 #73906

02:37 <{^_^}> https://github.com/NixOS/nixpkgs/pull/73906 (by worldofpeace, 6 days ago, open): Pantheon use latest mutter

02:38 <aanderse> great! i look forward to trying it out then :-D

02:38 <worldofpeace_> but everything pretty much works, thanks to a lot of help from jtojnar.

02:57 <ashkitten> zzA

03:00 <gchristensen> hehe: init=/nix/store/p5hidl14l3s1z8rvigdjg37zrah9qx2i-nixos-system-install-environment-19.09pre-git/init initrd=initrd console=ttyS1,115200n8 console=ttyS1,115200n8 console=ttyS1,115200n8 console=ttyS1,115200n8 console=ttyS1,115200n8 console=ttyS1,115200n8 console=ttyS1,115200n8 console=ttyS1,115200n8 console=ttyS1,115200n8 console=ttyS1,115200n8 console=ttyS1,115200n8 initrd=initrd loglevel=4

03:09 <gchristensen> I wonder if this will result in problems

03:10 <worldofpeace_> looks in nixos wiki for what you could be doing https://nixos.wiki/wiki/NixOS_on_ARM/Raspberry_Pi#Serial_console

03:10 <gchristensen> hehe

03:15 <gchristensen> I'm building one big netboot image which contains all the hardware support for a bunch of hardware classes by including their hardware.nix, which each individually set that console option

03:19 <worldofpeace_> lol, so it's only duplicated a few times. no sweat 😅

03:21 Scriptkiddi has joined #nixos-dev

03:21 das_j has joined #nixos-dev

03:22 <gchristensen> :P

03:22 <gchristensen> doesn't seem to cause trouble, so I'll just go for it

03:28 <samueldr> gchristensen: the log will output only on the last (or first?) console

03:28 <gchristensen> right on

03:29 <gchristensen> perfect :)

03:29 <samueldr> https://github.com/NixOS/nixpkgs/pull/42255

03:29 <{^_^}> #42255 (by dezgeg, 1 year ago, open): WIP: Log stage-{1,2} output to secondary consoles

03:41 orivej has joined #nixos-dev

03:44 worldofpeace_ has quit [Quit: worldofpeace_]

04:56 orivej has quit [Ping timeout: 250 seconds]

04:56 justanotheruser has quit [Ping timeout: 245 seconds]

05:01 orivej has joined #nixos-dev

07:15 justanotheruser has joined #nixos-dev

07:59 tilpner has quit [Quit: tilpner]

08:11 Jackneill has joined #nixos-dev

08:17 tilpner has joined #nixos-dev

08:53 niksnut has joined #nixos-dev

09:01 justanotheruser has quit [Ping timeout: 276 seconds]

09:05 FRidh has quit [Ping timeout: 240 seconds]

09:47 marek has quit [Ping timeout: 240 seconds]

09:49 ckauhaus has joined #nixos-dev

09:55 psyanticy has joined #nixos-dev

10:03 FRidh has joined #nixos-dev

10:05 FRidh has quit [Remote host closed the connection]

10:06 FRidh has joined #nixos-dev

10:07 justanotheruser has joined #nixos-dev

10:44 __monty__ has joined #nixos-dev

11:05 <domenkozar[m]> I think I can no longer recommend nixops

11:05 <domenkozar[m]> it's just broken software

11:06 <domenkozar[m]> supporting a few people shows that it's too fragile

11:07 <domenkozar[m]> and since the split that's going to get just worse

11:14 <adisbladis> domenkozar[m]: I've been dreaming of a nixops replacement that leverages terraform

11:17 <domenkozar[m]> polyrepo was the nail in the coffin

11:18 <domenkozar[m]> you take undermaintained and make it polyrepo

11:18 <domenkozar[m]> it's a recipe for disaster

11:18 <domenkozar[m]> software*

11:19 <domenkozar[m]> @adisbladis: yup :)

11:20 <adisbladis> We simply don't have the amount of people to support all the services out there

11:45 <domenkozar[m]> I think that's fine, the problem is we pretend that's not the case

11:45 <domenkozar[m]> too much complexity

11:45 <domenkozar[m]> that's why terraform is successful regardless of their resources

11:46 <clever> domenkozar[m]: what parts of nixops have you found are fragile?

12:11 orivej has quit [Ping timeout: 268 seconds]

12:20 <domenkozar[m]> it took like 5 fixes before hetzner would boot

12:21 <domenkozar[m]> beforehand it was at least nixpkgs<->nixops that was hard to keep in sync, across all nixpkgs branches

12:21 <domenkozar[m]> now that's so much worse

12:21 <domenkozar[m]> the underlying issue is that neither python nor nix has a good story for breaking changes

12:23 <clever> domenkozar[m]: i think a lot of that, is to blame on the hetzner bootstrap process, that tries to run nixos-install on ubuntu

12:23 <clever> https://github.com/NixOS/nixops/issues/1189 aims to entirely skip that problem, by just booting nixos via kexec

12:23 <{^_^}> nixops#1189 (by cleverca22, 15 weeks ago, open): plan for supporting custom partition layouts and custom FS's on any backend

12:23 <clever> then you nixos-install like normal, from a proper nixos host

13:03 orivej has joined #nixos-dev

13:03 <domenkozar[m]> clever: that's one area of the issues

13:45 phreedom_ has quit [Ping timeout: 260 seconds]

13:47 phreedom has joined #nixos-dev

14:17 orivej has quit [Ping timeout: 240 seconds]

14:53 das_j has quit [Remote host closed the connection]

14:53 Scriptkiddi has quit [Remote host closed the connection]

14:55 das_j has joined #nixos-dev

15:14 Cale has quit [Remote host closed the connection]

15:15 Cale has joined #nixos-dev

15:30 __monty__ has quit [Quit: leaving]

15:32 <Profpatsch> Okay, before I finally snap, let’s just add runCommandLocal finally https://github.com/NixOS/nixpkgs/pull/74642

15:32 <{^_^}> #74642 (by Profpatsch, 49 seconds ago, open): pkgs/build-support/trivial-builders: add runCommandLocal

15:35 Scriptkiddi has joined #nixos-dev

15:41 <clever> Profpatsch: only problem i can foresee with that, is that it complicates compiling for another platform

15:42 <clever> Profpatsch: i cant just realize a given path that hydra built, i am forced to ssh out to a build machine of the right arch

15:42 <clever> but thats an edgecase, where i would already be paying that cost for every single drv

15:43 <clever> so its a choice between always paying the narinfo lookup cost (remove allowSubstitutes) or sometimes paying the remote-build cost

15:52 <Profpatsch> clever: Hm, you mean when allowSubstitutes is set you can’t use it if you don’t have a builder for the arch

15:53 <Profpatsch> clever: If it’s already in the store, there is no substitution anyway. If you *could* substitute it, you have to push on the builder, yes

15:54 <Profpatsch> But only if you either don’t build or it’s a different arch

16:08 orivej has joined #nixos-dev

16:15 <clever> Profpatsch: yeah

16:27 FRidh has quit [Quit: Konversation terminated!]

16:57 <clever> gchristensen: sure

16:58 <gchristensen> clever: so I can accept CPR instructions now and partition and mount disks based on the CPR data

16:58 <gchristensen> including ZFS on /

16:58 drakonis has joined #nixos-dev

16:59 <clever> gchristensen: do the machines have 2 or 3 disks?

17:00 <gchristensen> clever: one second, I'll get to that

17:02 <gchristensen> clever: https://gist.github.com/grahamc/b85725db4f90aa03004583f9bac56225 here are the files from /etc/nixos/packet/ does this look good to you? note: I plan on moving the users.users.root.openssh.authorizedKeys.keys definition to a different file (named credentials.nix or something) so it can be easily ignored

17:03 * clever looks

17:03 <clever> gchristensen: somewhat related, what is packet.net's policy on repairs when a hdd fails?

17:03 <gchristensen> they don't do them

17:04 <clever> just report it, destroy the instance, and re-spawn a new one?

17:04 <gchristensen> yep

17:04 <clever> no problems with by-id then

17:04 <clever> id normally prefer by-uuid, so when you clone the disk during repair, it can survive

17:04 <clever> but if you cant repair, it doesnt matter

17:04 <gchristensen> *SOMETIMES* they will do it, but they make no guarantees it'll work, and they only do it for customers who they know have very high technical competence and know what to do to make it work

17:04 <gchristensen> so in this case the by-id is sufficient

17:05 <clever> and with hdd swaps, there is always the risk of swapping the wrong device

17:05 <gchristensen> yeah, so they really just don't do them

17:06 <clever> gchristensen: will these machines always be efi, or are some legacy?

17:06 <gchristensen> almost all of them will be MBR

17:06 <gchristensen> c2.medium.x86 and their ARM machines are EFI, the rest are MBR

17:07 <clever> lrwxrwxrwx 1 root root 9 Nov 25 08:31 /dev/disk/by-id/wwn-0x5000cca33ad2b4c7 -> ../../sda

17:07 <clever> gchristensen: youll want a symlink like this in boot.loader.grub.devices, when on legacy

17:07 <gchristensen> yep, I'll get to that part :)

17:07 <clever> and if you have redundancy, i think you can use 2, but ive never tested that part of grub

17:07 <gchristensen> yep

17:07 <samueldr> (pedantic samueldr alert!) Legacy Boot*; MBR/GPT is the partitioning scheme, and both are likely to work on somewhat recent implementations of both EFI/Legacy boot firmwares :)

17:08 <gchristensen> with the exception of the authorizedKeys part, does this metadata layout for this system look good?

17:08 <clever> still reading over it

17:08 <clever> samueldr: yeah, ive seen efi on mbr work with the rpi&tianocore

17:08 <clever> and i often do legacy on gpt

17:09 <samueldr> the most important bit of that distinction is, rightly so, booting Legacy Boot GRUB on GPT; if you misuse "MBR boot" then it'll be assumed you don't need the BIOS Boot partition!

17:09 <clever> gchristensen: in slug.nix, i would rather you use mkMerge, that will give better error messages

17:09 <gchristensen> oh cool, can you show me an example of that?

17:10 <clever> gchristensen: if you want legacy on gpt, you must create a bios boot partition, no fs, not formatted, not mounted, 1mb in size

17:10 <clever> gchristensen: when you boot.loader.grub.devices = "/dev/sda", grub will search for the bios boot partition, and automatically use it

17:10 <samueldr> (or was it mkMerge that he needed an example of?)

17:10 <gchristensen> clever: please focus on just this metadata for this one instance :)

17:11 <gchristensen> we'll get to MBR later

17:11 <clever> ah

17:12 <clever> gchristensen: { config = mkMerge [ { services.foo = ...; } { services.bar = ...; } ]; }

17:12 <gchristensen> cool

17:13 <clever> it's pretty much identical to how you're doing it with imports

17:13 <gchristensen> perfect

17:14 <clever> i think the module system will translate the above, into { config.services = mkMerge [ { foo = ...; } { bar = ...; } ]; }

17:14 <clever> which lets laziness figure out what config.environment is, and figure out services later

17:14 <clever> and it will recursively push it down, while keeping track of which file to blame for errors
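
A minimal sketch of the mkMerge form clever describes, written as a standalone NixOS module. The two fragments and the options they set are placeholders chosen for illustration; they are not taken from slug.nix.

```nix
# Hypothetical module: each list element is an ordinary config fragment,
# and mkMerge combines them the same way separate imports would be merged.
{ lib, ... }:
{
  config = lib.mkMerge [
    { services.openssh.enable = true; }
    { networking.firewall.allowedTCPPorts = [ 22 ]; }
  ];
}
```

Note that this only works when the fragments are plain attribute sets; a fragment that is itself a function of the module arguments has to stay under imports.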

17:15 <gchristensen> sounds good

17:15 <gchristensen> other than the mkMerge and moving the users.users.root.openssh.authorizedKeys.keys bit out of this file, looks good?

17:15 <clever> looking over it more...

17:15 <clever> nixpkgs.config.allowUnfree = true?

17:16 <gchristensen> needed for hardware.enableAllFirmware = true;

17:19 <clever> gchristensen: and why does it need all firmware?

17:19 <gchristensen> they may not, I'll make a note to look in to that
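
For reference, a sketch of the pair of settings under discussion. The commented-out alternative, hardware.enableRedistributableFirmware, is an assumption about what might suffice for these machines; nothing in the log confirms it.

```nix
{
  # enableAllFirmware pulls in non-redistributable firmware,
  # which is why allowUnfree has to be set alongside it.
  nixpkgs.config.allowUnfree = true;
  hardware.enableAllFirmware = true;

  # Possible alternative (untested assumption): only redistributable
  # firmware, which does not require allowUnfree.
  # hardware.enableRedistributableFirmware = true;
}
```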

17:19 <clever> the rest of that gist all looks good

17:19 <gchristensen> cool, and matches your needs for automation?

17:21 <clever> gchristensen: i think so, the only thing missing is the choices for pool layout, which i think would be in the CPR data?

17:22 <gchristensen> right, we're not talking about that yet :)

17:26 <gchristensen> so for a MBR server I have this in the configuration: boot.loader.grub.devices = [ "/dev/disk/by-id/wwn-0x55cd2e414fbc4ee2-part1" ]; where part1 starts at sector 2048 and goes until sector 4096 (1M) and is the "MBR partition scheme" type. however, grub-install tells me: grub-install: error: unable to identify a filesystem in hostdisk//dev/sda; safety check can't be performed ... any thoughts?

17:26 <clever> gchristensen: nope, you must never point that to a partition

17:27 <gchristensen> ahh

17:27 <clever> it must be pointed to the root device, like sda or sdb

17:27 <clever> for MBR, grub will install a stub to sector 0, and then a stage 1.5 at sector 1024 i think (between sector 0 and partition 0)

17:27 <clever> for GPT, the tables get in the way, and they wanted to ban using "unused" space :P

17:28 <clever> so you must create a bios boot partition, and grub will lookup the offset on its own

17:28 <clever> in both cases, you point grub to sda, a stub goes into sector 0, and the stage 1.5 goes "somewhere else"

17:28 <clever> gpt just clearly defines that "somewhere else" as in use
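
A sketch of the corrected setting, assuming the same disk as in the failing configuration above: the by-id path names the whole disk, with no -partN suffix. On a GPT disk this additionally assumes a BIOS boot partition already exists for GRUB to find.

```nix
{
  boot.loader.grub.enable = true;
  # Whole-disk device, not a partition: GRUB writes its stub to sector 0
  # and locates the space for the rest of its image on its own.
  boot.loader.grub.devices = [
    "/dev/disk/by-id/wwn-0x55cd2e414fbc4ee2"
  ];
}
```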

17:29 <gchristensen> perfect

17:29 <gchristensen> this makes my code nicer too

17:29 <clever> the LBA offset of "somewhere else" is baked into the sector0 code, so it can be found without supporting the partition tables

17:30 <clever> and stage1.5 includes the grub kernel, and the grub modules required to mount /boot/

17:30 <clever> https://github.com/arvidjaar/bootinfoscript will parse all of those blobs, and explain the steps, as a debug tool

17:30 ckauhaus has quit [Quit: WeeChat 2.6]

17:31 <clever> https://gist.github.com/cleverca22/a4663462dfde1b7b48cb4ad3dd9aeb38#file-results-txt-L6-L21

17:31 <clever> sda here is an MBR disk, a stub is in sector 0, and stage 1.5 is directly after it at sector 1!! (my 1024 before was wrong)

17:32 <clever> stage 1.5 includes fshelp, ext2, part_msdos, and biosdisk, along with a string saying to mount (,msdos2)/grub (the 2nd partition on the same device as the current MBR)

17:32 <clever> by omiting the device, that drive can change to sdb or sdc, and still find the 2nd partition

17:33 <clever> if you're doing devices = [ "sda" "sdb" ];, then grub will include a device, so the MBR on both disks, will look for a /boot on one of them

17:33 <clever> which creates a weak link in the redundancy

17:34 <clever> i'm not sure how to setup fully raid'd /boot, but bootinfoscript will help confirm if you did it right

17:34 <clever> the sdb in my above gist, is a gpt disk

17:34 <clever> Device Start End Sectors Size Type

17:34 <clever> /dev/sdb1 2048 67583 65536 32M BIOS boot

17:34 <clever> so now its using the offset defined in the partition tables (and i made that way too big)

17:35 <clever> that grub is not in use anymore, and it expects /boot/grub to be a directory on a zfs dataset called /root

17:35 <clever> (i dont remember even doing that? lol

17:36 <clever> ahh, sdb is from my old ssd+ssd mirror

17:38 drakonis has quit [Quit: WeeChat 2.6]

17:38 drakonis has joined #nixos-dev

17:41 drakonis has quit [Client Quit]

17:46 <clever> gchristensen: as for the specifics of the CPR data, it depends on how many disks are present, and choices the user will want to make

17:47 <clever> which will also impact the legacy grub config

17:47 <clever> oh, you can also do legacy+efi on gpt, for extra redundancy

17:47 <gchristensen> yeah, I think I accounted for the number of disks and a large number of choices

17:47 <clever> ive not tested it, but i suspect raid'ing and efi might also work, with some extra effort

17:47 orivej has quit [Ping timeout: 240 seconds]

17:48 <clever> some specific choices i'm thinking of

17:48 <gchristensen> I don't support that

17:48 <clever> 1 disk, no choice, just bpp+boot+swap+zfs

17:48 <gchristensen> https://gist.github.com/grahamc/52e80664eb5774f928f36c625c7370bc here is the CPR data for the c2.medium.x86 which had the .nix files I showed you

17:48 <clever> 2 disks, you can choose between redundancy or capacity+speed

17:49 drakonis has joined #nixos-dev

17:49 <clever> for zfs, your choices are mirror or not (just zpool create /dev/sda /dev/sdb)

17:49 <clever> for swap, you can either mdadm 2 partitions together (the system should survive a failure while on)

17:50 <clever> or swap could be 2 partitions, with the same priority set (it will crash upon losing a disk, but can still boot)

17:50 <clever> #{ device = "/dev/disk/by-partlabel/swap1"; priority = 10; }

17:50 <clever> #{ device = "/dev/disk/by-partlabel/swap2"; priority = 10; }

17:50 <clever> this tells linux to stripe all swap activity between the 2 disks, so you get double the write rates
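
Uncommented and placed in context, the two entries clever pasted would sit in swapDevices roughly as follows. The partition labels are the ones from his paste and assume partitions with those labels exist.

```nix
{
  # Equal priorities make the kernel spread swap pages across both
  # devices; losing either disk while swap is in use can still crash
  # the running system.
  swapDevices = [
    { device = "/dev/disk/by-partlabel/swap1"; priority = 10; }
    { device = "/dev/disk/by-partlabel/swap2"; priority = 10; }
  ];
}
```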

17:50 <gchristensen> neat

17:51 <clever> gchristensen: line 11 of that CPR data, is that /boot or bios-boot?

17:51 <gchristensen> "BIOS" in this case doesn't mean anything specific

17:51 <gchristensen> it is just what upstream does for whatever reason

17:52 <gchristensen> but this one is EFI, not MBR

17:52 <gchristensen> and you can correlate that partition number to https://gist.github.com/grahamc/52e80664eb5774f928f36c625c7370bc#file-cpr-json-L29-L31 to see it is /boot

17:52 <clever> oh, those are labels, not types

17:53 <clever> for efi, you have 2 more choices

17:53 <clever> choice a: /boot is vfat, it must be tagged as the efi system partition in gpt

17:53 <gchristensen> right

17:54 <clever> choice b: /boot can be anything (for zfs, id recommend an ext4 partition), and then /boot/ESP is the ESP and vfat

17:54 <clever> oops, thats /boot/EFI

17:54 <clever> boot.loader.efi.efiSysMountPoint = "/boot/EFI"; is required, to make choice b work

17:54 <gchristensen> how about /boot/efi

17:54 <clever> that also works

17:54 <clever> the path doesnt really matter

17:55 <clever> choice b also allows /boot to be on the same partition as /, but zfs complicates that

17:55 <gchristensen> I support either /boot or /boot/efi being the ESP partition

17:55 <clever> the benefit of choice b, is that only the efi binaries live on the vfat

17:55 <clever> the kernels and initrds can then live on ext4

17:55 <clever> so vfat can be much smaller, and it rarely changes, so you dont have to worry about vfat lacking a journal
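
A sketch of "choice b" as NixOS configuration, using the /boot/efi path gchristensen suggests. The filesystem labels (boot, ESP) are hypothetical, and the use of GRUB as the EFI loader is an assumption made for the example.

```nix
{
  # Kernels and initrds live on a journaled filesystem.
  fileSystems."/boot" = {
    device = "/dev/disk/by-label/boot"; # hypothetical label
    fsType = "ext4";
  };

  # Only the EFI binaries live on the small vfat ESP.
  fileSystems."/boot/efi" = {
    device = "/dev/disk/by-label/ESP"; # hypothetical label
    fsType = "vfat";
  };

  boot.loader.efi.efiSysMountPoint = "/boot/efi";
  boot.loader.grub = {
    enable = true;
    efiSupport = true;
    device = "nodev"; # EFI-only install, no MBR stub
  };
}
```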

17:56 <clever> another option, that i think some nixos users stumble into by accident, is a combination efi+legacy install

17:57 <clever> if you have both a bios-boot partition, and an efi-system partition, and you set both boot.loader.grub.efiSupport and boot.loader.grub.devices = [ "/dev/sda" ];

17:57 <clever> then grub installs both efi and legacy binaries to /boot, and configures both .efi executables and the MBR

17:57 <clever> then if efi ever borks, you can just legacy boot the same disk
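
The combined install clever describes comes down to the two settings he names. A minimal sketch, assuming /dev/sda carries both a BIOS boot partition and an EFI system partition; any further EFI settings a given machine needs are left out.

```nix
{
  boot.loader.grub = {
    enable = true;
    efiSupport = true;        # install the .efi binaries
    devices = [ "/dev/sda" ]; # also write the legacy MBR stub
  };
}
```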

17:58 <clever> but i dont know if packet.net lets you force legacy on an efi machine?

17:58 <clever> legacy is also x86 specific

17:59 <clever> if you have 3 disks, then your choices for zfs open up more

17:59 <gchristensen> here is an MBR boot device with the .nix files and the CPR data: https://gist.github.com/grahamc/ae368d5d159c0dda05ee065c01288fd5

17:59 <clever> you can either go with no redundancy, mirror, or raidzN

18:02 <clever> gchristensen: ext4, gpt? all looks good

18:02 <clever> i noticed a minor issue with the vdevs on the other one...

18:03 <clever> https://gist.github.com/grahamc/52e80664eb5774f928f36c625c7370bc#file-cpr-json-L62-L71

18:03 <clever> gchristensen: is this raid? mirror? jbod?

18:03 <gchristensen> "disk" means "stripe"

18:03 <clever> ah

18:03 <clever> and zfs doesnt really let you do concat, it will just stripe if you dont specify a type

18:03 <gchristensen> you can add any of the vdev types described in `zpool` under "Virtual Devices (vdevs)"

18:04 <gchristensen> in man zpool*

18:04 <clever> i also notice, you're using sda for some, and sdd3 for others

18:04 <clever> that has a minor impact on performance

18:05 <gchristensen> I don't care, that is just a demo

18:05 <gchristensen> it is up to you to put in sensible data

18:05 <clever> ive yet to measure how much though

18:05 <clever> yeah, this allows the user to choose those things

18:05 <clever> only other thing i can think of thats missing, is device and pool attributes

18:05 <gchristensen> it isn't :)

18:05 <gchristensen> https://gist.github.com/grahamc/52e80664eb5774f928f36c625c7370bc#file-cpr-json-L60-L61

18:06 <clever> ah, that gets pool properties, but what about dataset properties?

18:06 <clever> oh there

18:06 <clever> duh!

18:06 <gchristensen> https://gist.github.com/grahamc/52e80664eb5774f928f36c625c7370bc#file-cpr-json-L75-L76

18:06 <clever> that just leaves one last thing then, lol

18:06 <gchristensen> ok

18:06 <clever> sometimes, i will install with dedup=on, for the entire pool

18:06 <clever> but once installed, i switch to dedup=off

18:07 <gchristensen> hehe

18:07 <clever> that will reduce the disk usage of the initial install, but eliminate the performance cost of running with dedup

18:07 <clever> the only cost remaining, is when deleting blocks created during the install

18:07 <gchristensen> I'm not going to support changing options after install for now

18:07 <gchristensen> you can do that when you first SSH in :)

18:08 <clever> yeah, nixops could always do that step

18:08 <gchristensen> ok, here is what I need from you now

18:08 <clever> the property fields let nixops specify initial values, and once created, nixops can change to secondary ones

18:08 <gchristensen> I need a CPR document which describes a system you'd actually use

18:08 <gchristensen> to be certain it actually works

18:08 * clever edits

18:09 <clever> i'll write one for some dual-disk systems

18:10 <gchristensen> https://gist.github.com/grahamc/52e80664eb5774f928f36c625c7370bc#file-cpr-json-L57 this one is from a c2.medium.x86 which I believe is the system you're wanting to use?

18:11 <clever> already using that one as the example

18:11 <gchristensen> cool

18:11 <clever> oh, what about discard stuff

18:11 <gchristensen> what about it?

18:12 <clever> i'm thinking, an option in the json, to make blkdiscard get ran on every disk, before it partitions

18:12 <gchristensen> it does

18:12 <gchristensen> every disk is completely discarded by the time my stuff runs

18:12 <clever> perfect

18:15 <clever> gchristensen: any particular reason you used sdd for the boot disk?

18:15 <gchristensen> that is what Packet does out of the box

18:15 <clever> ah

18:15 <gchristensen> clever: the rules around boot devices are: if EFI: there must be a single vfat filesystem mounted at /boot or /boot/efi and it must not be RAID-backed, ... if MBR: at least one disk must have a partition with the label "BIOS", and every disk which has a partition labeled "BIOS" is added as a grub device

18:16 <gchristensen> that is, those are the rules my tooling checks for

18:16 <clever> ahh

18:31 <clever> still writing...

18:31 <gchristensen> cool

18:37 <clever> its going to be over-engineered as heck :P

18:39 <gchristensen> oh dear

18:46 <clever> gchristensen: disks array is done, on to filesystems...

18:46 <gchristensen> :)

18:50 <edef> do we have a policy for external completions addon things?

18:50 <edef> like, i'm packaging the gcloud completions for fish

18:52 <clever> gchristensen: do we know the block size for these drives?

18:53 <gchristensen> they're probably enterprise SSDs

18:55 ris has joined #nixos-dev

18:55 <gchristensen> that is all I know though :)

18:55 justanotheruser has quit [Ping timeout: 245 seconds]

18:55 <clever> gchristensen: `fdisk -l /dev/sda` can reveal that

18:55 <clever> Sector size (logical/physical): 512 bytes / 512 bytes

18:55 <clever> I/O size (minimum/optimal): 512 bytes / 512 bytes

18:55 <edef> that's always 512 bytes

18:55 <edef> or 4k

18:56 <gchristensen> I think fdisk will always report 512

18:56 <edef> my NVMe SSD shows 512 bytes

18:56 <gchristensen> exactly, disks usually lie

18:56 <clever> only because crappy OS's fail hard when the disk tells the truth

18:56 <edef> i'm pretty sure that my other samsung SSDs show the same value

18:56 <clever> Disk /dev/sda: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors

18:57 <edef> even though the truth is a multiple of 3

18:57 <clever> Sector size (logical/physical): 512 bytes / 4096 bytes

18:57 <clever> I/O size (minimum/optimal): 4096 bytes / 4096 bytes

18:57 <clever> one of my drives tells the truth!

18:57 <edef> yes, that's 4k because it's AF

18:57 <clever> AF?

18:57 <edef> but the only options are 512 and 4k

18:57 <edef> Advanced Format

18:57 <clever> ah

18:57 <clever> nothing is ever bigger than 4k?

18:57 <edef> not unless you have very specialised drives

19:00 <clever> Profpatsch: and now i need runCommandLocal, lol!

19:00 <gchristensen> clever: I am concerned about what you're creating :P

19:00 justanotheruser has joined #nixos-dev

19:01 <gchristensen> btw CPR is available when you have reserved the instance, so you would know if your drive was 4k or not because you can craft the CPR data for that host

19:01 <clever> gchristensen: datasets are now defined

19:01 <clever> gchristensen: but i might need to boot something on the instance to inspect it? and then re-run CPR against it afterwards

19:02 <clever> gchristensen: though, your tool could support this specific flag on its own

19:02 <gchristensen> you would need to inspect the host to learn about it yes

19:02 <gchristensen> that is why CPR is only available on reserved hw :)

19:02 <clever> ashift, must be set to N, where the block size is 2^N
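
The ashift rule can be sketched as a small Nix expression. The helper name ashiftFor is made up for this example, and the result is rendered as a string on the assumption that it will be used as a ZFS property value.

```nix
let
  # log2 of blockSize, assuming blockSize is a power of two.
  ashiftFor = blockSize:
    if blockSize <= 1 then 0 else 1 + ashiftFor (blockSize / 2);
in
{
  sector512 = toString (ashiftFor 512);  # "9"
  sector4k = toString (ashiftFor 4096);  # "12"
}
```

Since disks often report 512 bytes regardless of their real geometry, the input to such a helper would still need to come from knowledge of the hardware rather than from fdisk alone.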

19:02 <clever> ahh

19:02 <clever> i was thinking you could just restrict cpr to fit within a template

19:03 <clever> i want a machine with 2 disks and _ cpu, here is a CPR template that would work on any 2 disk machine

19:03 <gchristensen> yea

19:03 <gchristensen> they just don't do that right now

19:07 <clever> cpr_zfs.pools and .datasets are done

19:08 <clever> i think that just leaves mounts

19:10 <clever> gchristensen: i think its done?, giving it a read-only

19:10 <clever> over*

19:11 <gchristensen> cool

19:11 <clever> currently, its efi only

19:12 <clever> and swap could be handled differently, but thats an experiment for later

19:13 <clever> gchristensen: https://gist.github.com/43f098096007c332a3f1409a8b38bafa

19:13 <clever> hows that look?

19:15 <gchristensen> clever: property values must all be strings

19:16 <ma27[m]> is the `-unstable-` infix supposed to be part of `pname` or `version`? (IIRC this has been mentioned in nixos office hours some time ago and it was suggested to do `pname = "pkg-unstable";`, but I'm not 100% sure anymore)

19:16 <clever> gchristensen: gist updated

19:16 <ma27[m]> (context: https://github.com/NixOS/nixpkgs/pull/74646#discussion_r352218759)

19:17 <gchristensen> clever: the very top of the file should be { "customdata": { .... and then this line should be deleted https://gist.github.com/cleverca22/43f098096007c332a3f1409a8b38bafa#file-build1-json-L62

19:18 <clever> $ jq < build1.json '.customdata.cpr_storage.disks[0].device'

19:18 <clever> "/dev/sdd"

19:18 <clever> gchristensen: thats just the sorting of the fields

19:18 <clever> let me double-check against yours...

19:19 <clever> $ jq < 52e80664eb5774f928f36c625c7370bc/cpr.json '.customdata.cpr_storage.disks[0].device'

19:19 <clever> "/dev/sdd"

19:19 <clever> gchristensen: yep, the customdata is at the right point in mine

19:19 <gchristensen> no it isn't :P

19:19 <clever> it is at the root of the json object

19:19 <gchristensen> { customdata = { cpr_zfs = ...; cpr_storage = ...; } }

19:20 <clever> oh, cpr_zfs is at the wrong point?

19:20 * clever compares

19:20 <gchristensen> https://gist.github.com/cleverca22/43f098096007c332a3f1409a8b38bafa#file-build1-json-L2

19:20 <clever> i see

19:20 <clever> cpr_zfs is a child of customdata, not a sibling

19:20 <gchristensen> yep

19:20 <clever> the indenting was hard to follow in your json

19:20 <gchristensen> aye

19:21 <gchristensen> let me know when it is updated and I'll try this

19:21 <clever> min-free strikes again

19:22 <clever> gchristensen: gist updated

19:23 <clever> gchristensen: this also defines 2 json files, for raidz1 and disk arrangements

19:24 <gchristensen> yeah you could do raidz1 in there

19:24 <gchristensen> the format is { "vdevtype": [ "disk", "disk", "disk" ] }

19:24 <clever> hydra is using raidz1 so the database isnt lost upon failure

19:25 <clever> while the build machines, who cares

19:25 <gchristensen> and it accepts a list of these vdevtype objects

19:34 <gchristensen> ah dang clever you can't mkMerge a function

19:35 <clever> the function args should be one level up

19:35 drakonis has quit [Ping timeout: 265 seconds]

19:35 <gchristensen> can't

19:35 <clever> and passed down as normal vars

19:35 <gchristensen> that slug.nix file is generated from other files

19:36 <clever> just leave it as imports then for now, and it can be dealt with better at a later time

19:37 <gchristensen> yep

19:38 <edef> clever: note that there is ashift autodetection

19:39 <edef> clever: it hardcodes some device IDs

19:39 <clever> edef: ive not heard of that, ahh, so its not as smart as querying the disk directly

19:39 <edef> clever: disks lie

19:39 <clever> yeah

19:40 <edef> clever: https://github.com/zfsonlinux/zfs/blob/a7c358845b1fdfc60b5f1f70d9d6a4ab87f95fa4/cmd/zpool/os/linux/zpool_vdev_os.c#L98-L103

19:40 <clever> i would both query, and hard-code a list of overrides

19:40 <clever> for the liars

19:41 <clever> i also see some 8k's in that list!

19:44 <edef> i don't think they report that

19:44 <edef> the 860s definitely have page sizes that are multiples of 3

19:50 drakonis has joined #nixos-dev

20:17 psyanticy has quit [Quit: Connection closed for inactivity]

20:28 drakonis has quit [Ping timeout: 265 seconds]

20:43 <niksnut> gchristensen: hm, looks like some build machines have the wrong time: https://nix-cache.s3.amazonaws.com/log/f45iziab6xhwfq9c7qphmx422y6hlkkr-nix-2.4pre7116_f102d793.drv

20:44 <gchristensen> how do I view that?

21:01 justanotheruser has quit [Ping timeout: 246 seconds]

21:01 <gchristensen> niksnut: what makes you think so?

21:01 <gchristensen> I can't tell

21:04 MichaelRaskin has joined #nixos-dev

21:09 <niksnut> sorry, I restarted the build so the log is gone

21:09 <gchristensen> I did manage to see the log

21:09 <gchristensen> but I didn't see anything suspicious

21:13 <gchristensen> clever: ping

21:14 <clever> gchristensen: pong

21:14 <gchristensen> clever: ssh -o UserKnownHostsFile=/dev/null root@147.75.75.222

21:14 <gchristensen> please validate the config and disks are right

21:19 <clever> gchristensen: almost done...

21:20 <clever> efi vars are a little odd, but it boots so it doesnt matter much

21:20 <gchristensen> uh oh

21:20 <gchristensen> the aarch64 build box is a bit busted

21:21 <gchristensen> it is asking for a password at boot time :o

21:23 <samueldr> uh oh!

21:23 <clever> gchristensen: minor nit-pick, boot-phone-home persists in the config after installation

21:23 <gchristensen> yep

21:23 <gchristensen> I know

21:23 <clever> gchristensen: you could delete that file (and remove it from packet.nix) after you nixos-install

21:24 <clever> so it will apply on bootup, but not persist after a nixos-rebuild

21:24 <gchristensen> it does that by default on the installation

21:24 <gchristensen> but this one I had to hand fix because of the lib thing

21:24 <clever> ah

21:24 <gchristensen> so don't worry about this one :P

21:25 <clever> slug.nix also doesnt contain any functions, this time

21:25 <gchristensen> I know

21:25 <gchristensen> I fixed it by hand

21:25 <clever> ahh

21:26 <clever> gchristensen: everything looks perfect for a build-machine (builder1.json)

21:26 <gchristensen> perfect

21:26 <clever> gchristensen: oh, i just noticed, the disks are mis-matched in sizes

21:27 <clever> 447 + 447 + 223 + 223

21:27 <gchristensen> yep

21:27 <clever> that will cause problems with the raidz1 file

21:27 <clever> mostly wasted space

21:27 <gchristensen> you'll need to account for that, yeah

21:27 <gchristensen> could do 2 mirrors

21:27 <clever> raidz1 is pointless if you dont have 3 disks of equal size

21:27 <clever> so 2 mirrors would be better, for that hardware
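
A rough back-of-the-envelope comparison of the two layouts for the 447 + 447 + 223 + 223 disks. It is a simplification: it assumes raidz1 sizes every member to the smallest disk and loses one disk's worth to parity, that each mirror is as large as its smaller member, and it ignores metadata and padding overhead. The function names are made up for the example, and the sizes are in whatever unit the figures above use.

```python
def raidz1_usable(disks):
    """Single raidz1 vdev: every member counts as the smallest disk, minus one for parity."""
    return (len(disks) - 1) * min(disks)


def raidz1_unused_raw(disks):
    """Raw capacity on the larger disks that a single raidz1 vdev cannot address."""
    return sum(disks) - len(disks) * min(disks)


def striped_mirrors_usable(pairs):
    """Pool of mirror vdevs: each mirror contributes the size of its smaller member."""
    return sum(min(pair) for pair in pairs)


if __name__ == "__main__":
    disks = [447, 447, 223, 223]
    print("raidz1 usable:     ", raidz1_usable(disks))
    print("raidz1 unused raw: ", raidz1_unused_raw(disks))
    print("2 mirrors usable:  ", striped_mirrors_usable([(447, 447), (223, 223)]))
```

Under these assumptions the usable space comes out almost identical (669 versus 670), but the raidz1 layout leaves 448 of raw capacity on the two larger disks unaddressed, which is the wasted space mentioned above.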

21:28 <clever> how predictable are factors like disk size and number of disks?

21:29 <gchristensen> since it is on reserved hardware, 100% ;)

21:30 <gchristensen> I can't say any more details than that, using CPr on unreserved hardware is 100% unsupported

21:30 <clever> more about what i'll get when asking for a reserved box for example?

21:30 <clever> or would i just say what i want on the reserved box?

21:31 <gchristensen> you'll get at least what is here: https://www.packet.com/cloud/servers/c2-medium-epyc/

21:31 <clever> ah, they are calling it "boot" and "storage" drives

21:31 <gchristensen> yeah

21:32 <clever> i tend to just lump everything into a single pool

21:32 <clever> so mis-matched sizes like that mess up my plans

21:33 <clever> checking the disks on this box closer..

21:36 justanotheruser has joined #nixos-dev

21:49 <clever> gchristensen: feel free to terminate that box whenever you want

21:50 thonkpod has joined #nixos-dev

21:51 <gchristensen> thanks

22:22 <clever> gchristensen: one solution i can think of for slug.nix, is to bring hnix into the mix

22:22 <clever> it may be a simple matter to parse a list of nixos modules, merge them together into one, and spit it back out

22:22 <clever> then it doesnt even need mkMerge

22:23 <clever> it would primarily target simple { ... }: { <stuff> } things, and not allow let blocks
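
The merge step clever has in mind might look something like this, shown in Python rather than hnix purely for illustration. Nested dicts stand in for the attribute sets of simple `{ ... }: { <stuff> }` modules, the option names are made up, and a conflict raises an error instead of imitating the type-aware merging the NixOS module system performs.

```python
import copy


def merge_attrsets(modules):
    """Merge a list of nested dicts into one, failing on conflicting leaves."""
    merged = {}
    for module in modules:
        _merge_into(merged, module, path=())
    return merged


def _merge_into(target, source, path):
    for key, value in source.items():
        here = path + (key,)
        if key not in target:
            # Copy so the inputs are never mutated by later merges.
            target[key] = copy.deepcopy(value)
        elif isinstance(target[key], dict) and isinstance(value, dict):
            _merge_into(target[key], value, here)
        elif target[key] != value:
            raise ValueError("conflicting definitions for " + ".".join(here))


if __name__ == "__main__":
    # Illustrative option names only.
    a = {"boot": {"loader": {"efi": True}}}
    b = {"boot": {"kernelParams": ["console=ttyS1"]}, "networking": {"hostName": "builder"}}
    print(merge_attrsets([a, b]))
```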

22:26 <gchristensen> that would be fine

22:26 <gchristensen> these are all calculated at build time, not during the installation -- so adding hnix wouldn't impact the install environment at all

22:27 <clever> the tool could also be a fairly small binary, and kept in a binary cache

22:29 <gchristensen> adisbladis and I were wondering why mksquashfs is *so slow* earlier

22:32 <gchristensen> somebody make a mksquashfs but fast :P

22:32 <adisbladis> If I was the maintainer I'd just make it fast

22:33 <gchristensen> good call

22:36 <MichaelRaskin> Just make it fast means that generated images are allowed to be incorrect, right?

22:39 <gchristensen> I'm pretty sure mksquashfs is slower than it needs to be

22:43 <MichaelRaskin> Well, you said _but_ fast, I was replying to _just_ fast

22:43 <MichaelRaskin> (not sure what compression level squashfs achieves, maybe compression level it uses is actually expensive)

22:47 <clever> gchristensen: mksquashfs is very good at using many cores, when compressing

22:48 <clever> comp ? "xz -Xdict-size 100%"

22:48 <clever> mksquashfs nix-path-registration $(cat $closureInfo/store-paths) $out \

22:48 <clever> -keep-as-directory -all-root -b 1048576 -comp ${comp}

22:48 <gchristensen> yeah Filesystem size 472322.35 Kbytes (461.25 Mbytes)

22:48 <gchristensen> 31.60% of uncompressed filesystem size (1494488.69 Kbytes)

22:48 <gchristensen> but even with 96 cores it takes several minutes
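
For reference, the two figures in the mksquashfs output above are consistent with each other. A quick check, using only the numbers quoted (the variable and function names are made up for the example):

```python
COMPRESSED_KBYTES = 472322.35
UNCOMPRESSED_KBYTES = 1494488.69


def percent_of_uncompressed(compressed, uncompressed):
    """Compressed size as a percentage of the uncompressed size."""
    return compressed / uncompressed * 100


def kbytes_to_mbytes(kbytes):
    """mksquashfs reports Mbytes as Kbytes / 1024."""
    return kbytes / 1024


if __name__ == "__main__":
    print(round(percent_of_uncompressed(COMPRESSED_KBYTES, UNCOMPRESSED_KBYTES), 2))
    print(round(kbytes_to_mbytes(COMPRESSED_KBYTES), 2))
```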

22:48 <clever> # For zstd compression you can use "zstd -Xcompression-level 6".

22:49 <clever> select COMPRESSION compression. Compressors available: gzip (default), lzma (no kernel support), lzo, lz4 and xz.

22:50 <clever> -b BLOCK_SIZE

22:50 <clever> Use DICT_SIZE as the XZ dictionary size. The dictionary size can be specified as a percentage of the block size, or as an absolute value. The dictionary size must be less than or equal to the block size and 8192 bytes or larger. It must also be storable in the xz header as either 2^n or as 2^n+2^(n+1). Example dict-sizes are 75%, 50%, 37.5%, 25%, or 32K, 16K, 8K etc.
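
The dictionary-size rules quoted from the man page can be written down as a small check. This is a sketch of the stated constraints only (at least 8192 bytes, no larger than the block size, and of the form 2^n or 2^n + 2^(n+1), which is the same as 3 * 2^n); it is not how mksquashfs itself validates the option, and the function names are made up.

```python
MIN_DICT_SIZE = 8192  # lower bound quoted in the man page text


def is_power_of_two(n: int) -> bool:
    return n > 0 and n & (n - 1) == 0


def is_valid_xz_dict_size(dict_size: int, block_size: int) -> bool:
    """Check the three constraints from the quoted -Xdict-size description."""
    if dict_size < MIN_DICT_SIZE or dict_size > block_size:
        return False
    # 2^n + 2^(n+1) equals 3 * 2^n.
    return is_power_of_two(dict_size) or (
        dict_size % 3 == 0 and is_power_of_two(dict_size // 3)
    )


def dict_size_from_percent(percent: float, block_size: int) -> int:
    """Convert a percentage of the block size to bytes, rejecting fractional results."""
    size = block_size * percent / 100
    if size != int(size):
        raise ValueError(f"{percent}% of {block_size} is not a whole number of bytes")
    return int(size)


if __name__ == "__main__":
    block = 1048576  # the -b value used in the command above
    for pct in (100, 75, 50, 37.5, 25):
        size = dict_size_from_percent(pct, block)
        print(pct, size, is_valid_xz_dict_size(size, block))
```

With the 1048576-byte block size used above, `100%` gives a 1 MiB dictionary, which is the largest the rules allow for that block size.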

22:52 <clever> gchristensen: interesting, there is an `-Xbcj arm` "filter" that does .... something

22:52 <clever> https://www.mankier.com/1/mksquashfs

22:53 <clever> https://www.slax.org/blog/18663-XZ-compression-filters

22:54 <clever> gchristensen: oh god no, dont use bcj on anything nix is involved in! lol

22:54 <clever> gchristensen: it mutates assembly code to make it more compression friendly!

22:54 <gchristensen> nonono

22:54 <clever> say goodbye to all your hashes and validation :P

22:55 janneke_ has joined #nixos-dev

22:55 janneke has quit [Ping timeout: 240 seconds]

22:56 <clever> gchristensen: so, you can choose between xz, lz4, zstd, lzo, lzma, and gzip...

22:56 <clever> https://www.mankier.com/1/mksquashfs#Options-Compressors_available_and_compressor_specific_options

22:59 <clever> gchristensen: ive also only used mksquashfs on machines of up to 8 cores

23:10 xvapx has joined #nixos-dev

23:11 xvapx has quit [Client Quit]