#nixos-dev on 2017-11-25

00:03 <orivej> ekleog: you may try e.g. "sysdig evt.type=write" (with "programs.sysdig.enable = true;") to see what's going on.

00:15 <ekleog> ok, so actually it looks like journalctl is actually the source of all this spam (with 80% certitude): I just ran a strace -p $(pidof systemd-journald), and a *lot* of information strolls by, much more than 33KB in 10s -- after thinking about it, as these 2KB were computed by du -hs /var/log/journald, journald's log rotation must have limited the increase in folder size... (also for sysdig, seeing it

00:15 <ekleog> requires a kernel module doesn't make me want to try it without much further investigation, which won't come before long)

00:21 <ekleog> hmm, are you actually sure about nix not storing substituted dependencies? I've got as many .drv's for firefox-unwrapped as I have firefox-unwrapped's in my nix-store, and think I'd have noticed if I actually did rebuilt firefox every single time

00:21 <ekleog> s/dependencies/.drv's/

00:37 <aszlig> orivej: https://github.com/NixOS/nix/pull/1699

00:37 <aszlig> also, niksnut: ^

02:00 ris has quit [(Ping timeout: 248 seconds)]

02:07 _ts_ has quit [(Ping timeout: 248 seconds)]

02:21 orivej has quit [(Ping timeout: 260 seconds)]

02:28 orivej has joined joined #nixos-dev

02:57 mbrgm has quit [(Ping timeout: 248 seconds)]

02:59 mbrgm has joined joined #nixos-dev

04:41 vcunat has joined joined #nixos-dev

04:43 <vcunat> apparently some closure size increased in staging, and that broke one installer test (the system no longer fits) https://hydra.nixos.org/build/64709761

05:06 taktoa has joined joined #nixos-dev

07:50 ma27 has joined joined #nixos-dev

07:50 ma27 has quit [(Client Quit)]

08:04 FRidh has joined joined #nixos-dev

08:07 <vcunat> well, pushed 474c1ce79 for that, at least

08:17 phreedom has quit [(Quit: No Ping reply in 180 seconds.)]

08:20 phreedom has joined joined #nixos-dev

08:58 zraexy has quit [(Ping timeout: 255 seconds)]

08:58 zraexy has joined joined #nixos-dev

09:37 <orivej> vcunat: looking at this closure size graph it seems almost impossible for the closure size spike to be caused by anything other than fwupd being enabled by default:

09:37 <orivej> https://hydra.nixos.org/job/nixos/trunk-combined/nixos.closures.smallContainer.x86_64-linux#tabs-charts (click the rightmost dots, than "changes" to see git diff)

09:40 <vcunat> orivej: right, the spike went down now

09:41 <vcunat> it has a nice closure: two pythons, two perls, two gtk+...

09:42 <orivej> if the nixos installer test is the most obvious way to notice such things, and it is good to notice them early, maybe you should revert 474c1ce79?

09:42 <vcunat> (meaning fwupd)

09:43 <orivej> :)

09:44 <vcunat> I'm considering that

09:44 <vcunat> I think I would prefer to separate such tests into different jobs

09:45 <vcunat> when you look at the red/green/... icons at Hydra, it's only confusing

09:46 <vcunat> that installation with SW RAID breaks

09:46 <vcunat> when it's "only" closure size increase

09:49 <vcunat> ping niksnut for that, as he's sensitive to closure blowups

09:50 <vcunat> It might be just a single job that measures closures of various systems and packages (individually) and compares them to some hardcoded thresholds to either succeed or print some informative warning.

09:52 <vcunat> And add some changes to make it easy to diff two closures in terms of sizes, e.g. name-sorted list from nix-store -qR. (And then you can e.g. use `nix why-depends` if it's something new.)

10:02 taktoa has quit [(Remote host closed the connection)]

10:13 _ts_ has joined joined #nixos-dev

10:33 ris has joined joined #nixos-dev

10:40 <orivej> are there any Hydra branches for "staging-17.09", or should I commit a mass rebuild straight to "release-17.09"?

10:40 <orivej> err, Hydra jobsets

11:08 <FRidh> orivej: there is no job for staging-17.09. That one was primarily used before the release. You can indeed push directly to release.

11:50 <vcunat> +1

11:59 <vcunat> :-) the four-headed merge

12:00 <vcunat> Thanks for keeping the first-parent line on master.

12:02 <gchristensen> can IFD be turned off in nix 1.11?

12:03 <vcunat> gchristensen: what is IFD?

12:04 <gchristensen> ack Import From Derivation but what I meant was building during evaluation

12:04 <vcunat> oh, yes, I believe so

12:05 <gchristensen> so I can error on https://github.com/NixOS/nixpkgs/pull/30252/files#diff-397a880dff218df0de9c5e8142973d0fR12

12:05 <vcunat> but Eelco claimed that is (supposed to be) done on Hydra, so there's certainly something wrong

12:06 <vcunat> and for what should change on Hydra, I'd start with --cores != 1 on aarch64

12:06 <gchristensen> that is on me :(

12:06 <vcunat> if it's so the whole time, I'm amazed noone has noticed

12:06 <vcunat> (until now)

12:06 <gchristensen> yeah, the whole time :)

12:08 <gchristensen> hmm it might be challenging to fix this

12:08 <FRidh> vcunat: yea I accidentally pushed a merge to master instead of to staging

12:08 <FRidh> do indeed try to keep aware of using --no-ff ;)

12:08 <Dezgeg> wow

12:08 <aszlig> gchristensen: maybe go with --readonly-mode?

12:09 <Dezgeg> no wonder why there are so many > 10h timeouts on aarch64 then :)

12:09 <vcunat> I thought we were just overloading it too much

12:09 <vcunat> it never occurred to me, and it's written in every log

12:09 <aszlig> i mean, it doesn't turn off IFD entirely but if you don't have the path to be imported from in your store the build will fail

12:11 <gchristensen> Dezgeg: we may need to make a new t2a host

12:12 <vcunat> aszlig: oh, that's... nondeterminstic :'(

12:12 <Dezgeg> why a new one?

12:12 <vcunat> some that doesn't have --cores 1

12:13 <Dezgeg> isn't that tweakable from hydra?

12:13 <gchristensen> a bit ago I rm -rf 'd my ~ and maay have lost access to the arm one

12:16 <aszlig> orivej: your restore commit still doesn't really restore that much :-/

12:16 <orivej> hm?

12:18 <aszlig> git diff 88eea6947fd4e7e2b92d4e778a6ab3ed66761ae6^ fbbda41e05d89325baaefef6fd0a420a1c365d7e

12:20 <vcunat> FRidh, orivej: so we leave the mass rebuild on master?

12:20 <vcunat> Estimating rebuild amount by counting changed Hydra jobs.

12:20 <vcunat> 12337 x86_64-darwin

12:20 <vcunat> 18846 x86_64-linux

12:21 <vcunat> I think it would be more practical to revert it for now, for anyone developing against master.

12:22 <FRidh> vcunat: isnt that mostly desktop environments, long builds, and the long tail of "insignificant" packages?

12:23 <vcunat> It's almost everything.

12:23 <orivej> but some of this stuff has already been built in the staging jobset. could you start nixpkgs/trunk evaluation to see what is left and how much is broken?

12:23 <FRidh> some of it has been build in python-unstable branch, which was build on staging

12:23 <FRidh> *based on

12:23 <vcunat> ah, right, bad diff

12:25 <aszlig> i have the revert commits ready, should i push?

12:25 <aszlig> https://gist.github.com/aszlig/20e76631913dca96d7d76b6134f5cfbd

12:25 <vcunat> Looking at staging Hydra, it's ~22k builds, summed over all platforms.

12:26 <vcunat> Mostly darwin, I think.

12:26 <vcunat> I don't have a strong opinion on this.

12:29 <vcunat> (We updated all the channels today.)

12:30 <gchristensen> nice :)

12:30 <gchristensen> it'll be a bit annoying for grahamcofborg but that is probably finee

12:32 <orivej> I've queued the evalution of the nixpkgs/trunk jobset

12:34 <gchristensen> oh cool, I got in Dezgeg :)

12:35 <vcunat> gchristensen: have you assimilated some phones?

12:35 <vcunat> (or other ARMs)

12:35 <gchristensen> I have no

12:36 <gchristensen> not. but I'd be happy to add some to the ... cube? :)

12:39 <gchristensen> ok, what should we set nix.buildCores to ? keep in mind hydra is set to use 48 cores, it has 96

12:40 <vcunat> I would actually not go so high.

12:40 <vcunat> If we don't want to keep discovering bugs in everyone's makefiles.

12:41 <vcunat> I don't think it's common to build with -j 100 on ARMs

12:41 <gchristensen> sorry

12:41 <gchristensen> ok, what should we set nix.buildCores to ? keep in mind hydra is set to 48 maxJobs, it has 96 cores.

12:41 <orivej> without https://github.com/NixOS/nixpkgs/pull/31965 I'd start with cores = 0

12:42 <orivej> and only lower it if we run out of RAM

12:42 <gchristensen> oh. I was thinking I'd set it to like, 2, or 3 :P

12:42 <vcunat> 128 GiB RAM is there, right?

12:42 <vcunat> (that's what they have on web site)

12:42 <gchristensen> yeah

12:42 <vcunat> That should be OK for builds.

12:43 <gchristensen> but we definitely don't want unbounded parallelism I think

12:43 <vcunat> it might not be that bad

12:43 <gchristensen> how about we start with 3 and go from there?

12:43 <vcunat> that won't have any effect

12:43 <gchristensen> no?

12:43 <vcunat> it would pass -l 3 everywhere

12:44 <vcunat> See the PR orivej referenced.

12:44 <gchristensen> oh

12:44 <gchristensen> you sure? load average: 37.60, 38.06, 36.44

12:44 <vcunat> What's the --jobs setting?

12:45 <vcunat> Oh, 48 :-)

12:45 <vcunat> I can see nothing better than cores = 0 ATM.

12:46 <vcunat> And with that I think we can have smaller maxJobs.

12:46 <vcunat> e.g. 24

12:46 <orivej> gchristensen: load average = max-jobs because one make always runs at least one process, even if that exceeds the target load average

12:46 <vcunat> With so many jobs it will likely utilize all the cores anyway.

12:47 <vcunat> Doing less and faster feels better to me.

12:48 <vcunat> HDD space < RAM * 2 :-)

12:49 <aszlig> vcunat: hmm... i can't really reproduce the azerty test failure :-/

12:49 <gchristensen> 312G /

12:49 <vcunat> Oh, I assumed it's 250G, like on web.

12:49 <vcunat> well, my phone has less flash than RAM, so I suppose it's common for ARMs

12:49 <gchristensen> maybe I got a special one

12:50 <vcunat> aszlig: I can't reproduce it locally either

12:50 <gchristensen> ok here goes :)

12:50 <vcunat> maybe it's connected to high load

12:50 * gchristensen looks at build-cores set to 0

12:50 <gchristensen> _hem_

12:50 <aszlig> vcunat: certainly, doing some tests...

12:51 <gchristensen> ok let's try restarting a big build

12:51 <orivej> aszlig: could you push your revert to a branch in *your* fork of nixpkgs (for review)?

12:51 <vcunat> aszlig: Hydra succeeded on the seventh attempt. Lucky me!

12:52 <vcunat> aszlig: if you think my PR is unlikely to worsen anything, I'm willing to try it "blindly" and see if the error rate goes down on Hydra.

12:52 mbrgm_ has joined joined #nixos-dev

12:53 mbrgm has quit [(Ping timeout: 248 seconds)]

12:53 mbrgm_ is now known as mbrgm

12:53 <gchristensen> "waiting for exclusive access to the Nix store..." I get this a lot using nix interactively on the type-2a

13:18 <aszlig> orivej: https://github.com/aszlig/nixpkgs/tree/revertey

13:18 <adisbladis> vcunat: What ARMs would you like to have accumulated?

13:19 <aszlig> vcunat: sure

13:19 <aszlig> but i still think this is a race condition

13:19 <vcunat> I don't have any aarch64, so borg would be a nice way to test stuff in there as well.

13:19 <vcunat> aszlig: what kind of race?

13:20 <aszlig> vcunat: the terminal not getting the sendKeys()

13:20 <aszlig> so i think we should make that more robust instead

13:21 <vcunat> aszlig: so something else than not getting enough CPU for 60s

13:21 <vcunat> (and thus not getting them)

13:22 <vcunat> I wonder if there's a performance penalty when you virtualize multiple machines at once on a single CPU...

13:22 <vcunat> (or perhaps lack of some other resources)

13:24 <aszlig> vcunat: ah, sorry... that timeout is for reader.ready

13:24 <aszlig> vcunat: so that's not the kind of race condition i was having in mind

13:26 <aszlig> vcunat: so does it always happen with the xterm tests or the VT ones too?

13:30 <gchristensen> vcunat: I'm working on getting a type 2a for people in the community to help with the aarch64 effort, we can run a borg there

13:32 <vcunat> aszlig: I don't know

13:33 <vcunat> gchristensen: a separate one?

13:33 <vcunat> wouldn't it be better shared

13:33 <vcunat> if it's for the PRs, it would be best if the binaries from testing were then used for binary cache contents directly...

13:34 <aszlig> vcunat: okay, i've got an idea, one second...

13:40 <gchristensen> yes, a separate one

13:42 <gchristensen> I don't want to mix that infrastructure, and ideally would give out more liberal access to borg than committers. we don't want to be too liberal in having things access the hydra build machines

13:42 <gchristensen> since they're a fairly critical component of the chain of trust of the nix cache

13:42 <aszlig> vcunat: don't merge #32020 yet, please

13:43 <vcunat> OK. I wasn't going to.

13:44 <vcunat> gchristensen: but 2A seems really overkill just to test PRs

13:44 <gchristensen> it is also for people in the community to help with the aarch64 effort

13:45 <vcunat> even so, 96 cores... and 20 Gbit network

13:46 <vcunat> We don't need to worry until we get it, I guess.

13:46 <gchristensen> why worry? they want us to have the tools we need to get support

13:46 <vcunat> But I would expect there's some relatively secure way to "split" it into two machines.

13:47 <vcunat> That would be ideal, as parts unused by community would get utilized by Hydra directly.

13:47 <vcunat> You don't need to hand out root access, most likely, too.

13:48 <gchristensen> I'd rather not go down this complicated route, it doesn't seem like a prudent use of time

13:49 <vcunat> Right, probably not.

13:49 <vcunat> I mean, considering relative priorities.

13:50 <gchristensen> right

13:51 <gchristensen> I'm confused about that failure to build, I wish I had more log output.

13:51 <gchristensen> maybe something went over the 30min build timeout

13:51 <vcunat> there's a mass rebuild on master

13:52 <vcunat> gchristensen: what missing tools do you have in mind? Something I missed on https://github.com/NixOS/nixpkgs/issues/31606 ?

13:52 <vcunat> Or is this general support (not NixOS-focused) to be put into as separate thread?

13:53 <gchristensen> oh by tools I mean, access to a big ol' aarch64 box to play with and get stuff compiling

13:53 <vcunat> :-)

13:54 <vcunat> I completely misunderstood your sentence.

13:54 <vcunat> I thought they were waiting for us to provide better SW support before contributing another machine.

13:56 <gchristensen> I was talking to them about the security model of hydra and why we don't want to let contributors have general access to it

13:56 <gchristensen> they understood

14:02 <gchristensen> aszlig, vcunat both of you can't call the bot... would you like to?

14:03 <vcunat> I thought you've told me I can

14:03 <vcunat> I would like to, because of darwin

14:03 <gchristensen> you're not in the ACL :/

14:03 <vcunat> I'm right at 16 threads, so Linux builds aren't a problem, but I guess you can't instruct it per-platform yet.

14:04 <vcunat> (Maybe you just offered it, I don't remember.)

14:05 <gchristensen> vcunat, aszlig: https://github.com/grahamc/ofborg#guidelines

14:05 <vcunat> thanks

14:07 phreedom has quit [(Ping timeout: 248 seconds)]

14:08 <aszlig> vcunat: Can you make a jobset for this branch? https://github.com/aszlig/nixpkgs/tree/keymap-test-debug

14:09 <aszlig> it's essentially just https://github.com/aszlig/nixpkgs/commit/e231ed3f1b177d744ea356f8d60082e5eaced554 applied to master

14:09 <aszlig> that way we should get a screenshot whenever a timeout occurs

14:09 <aszlig> and the test build will always succeed with an output path

14:11 <aszlig> vcunat: so in order to debug this i'd suggest making a hydra jobset and i'll try to regularly push random whitespace changes to the keymap test to that branch so we have multiple evaluations

14:12 <vcunat> aszlig: I don't want to complicate this with that mass rebuild

14:12 <vcunat> I can point it to your branch, but better pick the commit atop 2f1a818d00f957f3102c0b412864c63b6e3e7447

14:12 <vcunat> (for example)

14:12 <vcunat> (the last finished Hydra evaluation)

14:13 <aszlig> vcunat: in order to keep eval-time low: https://gist.github.com/aszlig/3bf82d4d5c20d682e3c2e610cf526a65

14:13 <aszlig> okay, will rebase

14:14 <aszlig> done

14:14 phreedom has joined joined #nixos-dev

14:15 <aszlig> vcunat: for that gist it should be something like input "recipe" -> git checkout -> https://gist.github.com/3bf82d4d5c20d682e3c2e610cf526a65.git

14:15 <vcunat> let me try

14:16 <aszlig> vcunat: and of course for the main expression release.nix in recipe

14:16 <aszlig> that way only the keymap tests are evaluated

14:20 <vcunat> hydra is really flexible... but since around the mass-rebuild merge all evals I see are only pending, including this one https://hydra.nixos.org/jobset/nixos/keymap-test-debug

14:21 <aszlig> mhm...

14:21 <aszlig> maybe the evaluator has OOMed or something?

14:22 <aszlig> otoh... it's Restart=always

14:22 <vcunat> considering that Hydra gets tricked into building during evaluation...

14:23 <vcunat> (maybe that's why some evals can take over half an hour lately)

14:23 <aszlig> sure, but it should already use the parallel evaluator

14:23 <aszlig> hm, well...

14:23 <vcunat> yes, two evals at once IIRC

14:24 <aszlig> the parallel eval only does 4 at once

14:30 <vcunat> Well, right now it continues to build staging, at least, but by tomorrow it will probably be mostly idle (all but Darwin).

14:34 <gchristensen> the t2a is down to a load of 3. bummer that I turned up cores right at the end of a set of jobs

14:37 <vcunat> gchristensen: that was just temporary?

14:37 <vcunat> Hydra shows it as fully loaded by jobs.

14:38 <vcunat> (and five of those are kernel builds)

14:40 <gchristensen> no it wasn't temporary

14:42 <vcunat> Hydra reports no builds finished on aarch64 for the past several minutes

14:42 <vcunat> (I didn't look further back)

14:42 <vcunat> s/builds/build steps/

14:43 <gchristensen> hmm

14:43 <gchristensen> there are 4 cc1's running

14:45 <vcunat> gchristensen: you restarted it somehow?

14:45 <vcunat> (the nix daemon or something)

14:45 <gchristensen> yeah, nix-daemon restarted when I changed the jobs

14:45 <vcunat> Hydra's probably showing just ghost jobs, as almost all "are there" for over an hour.

14:46 <vcunat> Maybe it thinks that 2A is loaded even though it isn't.

14:46 <gchristensen> yeah those jobs may need to be canceled and restarted

14:47 <vcunat> gchristensen: how long ago was that approximately?

14:48 <gchristensen> 1h 55min ago

14:48 <vcunat> OK

14:48 <vcunat> I'm cancelling the ghosts, and it seems to start filling up with new jobs.

14:49 <vcunat> ah, no, I was looking at t2-4 jobs

14:55 <vcunat> gchristensen: even newly started gcc built is stuck at waiting for exclusive access to the Nix store...

14:55 <vcunat> for several minutes now

14:55 <gchristensen> hmm

14:55 <vcunat> just as all the other builds

14:56 <gchristensen> ok what is happening :)

14:56 <vcunat> Those that have been running for two hours have nothing else but this line in the log.

14:56 <LnL> I've seen that before

14:56 <vcunat> All of them.

14:56 <vcunat> A stale lock, possibly, but I haven't seen that with nix yet.

14:56 <vcunat> (But I don't have a 100-core either.)

14:57 <gchristensen> right

14:57 <gchristensen> how about I stop nix-daemon and kill the leftover nix-store processes

14:57 <LnL> yeah, I probably restarted the daemon

14:57 <LnL> and maybe killall -u nixbld1 etc.

14:57 <vcunat> yes, I don't think it can get worse

14:58 <vcunat> or reboot the whole machine

14:58 <gchristensen> ah ha

14:58 <gchristensen> lots of nix-store --serve --write

14:59 <LnL> that rings a bell

15:01 <gchristensen> there we go

15:02 <gchristensen> ok, it should be good to go now ...

15:03 <aszlig> gchristensen: have you looked into the journal?

15:03 <aszlig> gchristensen: because that might be because of builds during eval, they don't show up in the queue of the UI

15:03 <gchristensen> it was in a really bad state

15:04 <gchristensen> it should be good now, but nothing is building on it

15:04 <gchristensen> vcunat: can you cancel / restart some jobs?

15:04 <vcunat> I'm not sure if he has access to the evaluator

15:04 <aszlig> restarting the evaluator and/or the queue-runner should have stopped it as well

15:04 <vcunat> gchristensen: there are ~7k in the queue

15:04 <vcunat> for aarch64

15:05 <aszlig> ah, on the build slave

15:05 <gchristensen> yeah but are there jobs that think they're building?

15:05 <aszlig> might have been a good idea to strace one of those nix-store --serve --write processes

15:06 <vcunat> https://hydra.nixos.org/machines shows empty list

15:06 <aszlig> my guess would be that they hung because of network issues

15:06 <gchristensen> hmm I bet hydra just gave up on the aarch box for a bit

15:07 <vcunat> :'(

15:07 <vcunat> hopefully not for good

15:08 <gchristensen> iirc if it fails to connect too many times in a row it'll give up for a few minutes

15:08 <aszlig> nah, IIRC it has a timeout for retry

15:08 <gchristensen> two IIRCs, I'll take that as fact :P

15:09 <gchristensen> there it goes

15:09 <gchristensen> spooky looking at a 96 core machine at 0 load... fixed now :)

15:09 <aszlig> int delta = retryInterval * std::pow(retryBackoff, info->consecutiveFailures - 1) + (rand() % 30);

15:10 <aszlig> delta is the time in seconds until it retries

15:10 <aszlig> retryBackoff is 3 and retryInterval is 60

15:11 <gchristensen> pretty much all the cores are at 100% usage

15:12 <aszlig> ah, now it has builds running

15:12 <aszlig> oh

15:12 <aszlig> missed that: 16:09 < gchristensen> there it goes

15:12 <gchristensen> ah :)

15:12 <vcunat> great, so from now on we can expect roughly doubled throughput, compared to all time before :-D

15:12 <gchristensen> :$

15:12 <gchristensen> with unbounded parallelism we should expect unbounded throughput! :)

15:13 <aszlig> more or less...

15:14 <aszlig> i'm only having nproc 48 build nodes, but when building the world the main bottleneck are autoconf scripts

15:15 <aszlig> maybe it would make sense to run them with something like dash

15:15 <vcunat> niksnut: I hope you/someone can revive the evaluator. x86_64-linux has nothing to build already...

15:16 <gchristensen> I think it is him and ikwildrpepper wil access to the evaluator

15:16 <vcunat> yes

15:17 <vcunat> at least years ago it was so IIRC

15:17 <vcunat> aszlig: on Hydra we can utilize the cores even with --cores 1, so I don't think it's worthwhile

15:17 <vcunat> better spend time migrating those to meson or something ;-)

15:18 yorick has quit [(Ping timeout: 240 seconds)]

15:21 JosW has joined joined #nixos-dev

15:28 <aszlig> vcunat: hm, ran a small benchmark with gnu hello and it really doesn't make such a big difference:

15:28 <aszlig> dash: real 0m18.208s user 0m11.568s sys 0m6.139s

15:28 <aszlig> bash: real 0m18.257s user 0m11.667s sys 0m6.085s

15:55 <gchristensen> is there anything we can do on nodes with gobs of RAM to make builds touch disk less?

15:56 <aszlig> gchristensen: make /tmp a tmpfs maybe?

15:56 <Dezgeg> then you get to the fun part of sizing the tmpfs correctly x)

15:57 <aszlig> yeah :-D

15:57 <gchristensen> no kernel tunables that would be obvious?

15:57 <aszlig> ah, you mean more agressive caching and no fsync?

15:57 <Dezgeg> sure, dirty_ratio and dirty_background_ratio

15:59 <aszlig> or:

16:00 <aszlig> systemd.services.nix-daemon.environment.LD_PRELOAD = "${pkgs.libeatmydata}/lib/libeatmydata.so";

16:00 <gchristensen> I would 100% do that if this machine was easy to replace :P

16:01 <gchristensen> (read: netboot)

16:01 <aszlig> hm?

16:01 <aszlig> what exactly?

16:01 <gchristensen> libeatmydata

16:01 <Dezgeg> well the nix store isn't power-fail safe anyway so almost no harm x)

16:01 <Dezgeg> though with that even nix-store --repair won't work

16:02 <aszlig> gchristensen: why? the builds are going to be transferred off the build machine anyway, so no harm

16:02 <gchristensen> I need to be able to safely update the machine

16:03 <aszlig> hm...

16:03 <aszlig> run a second nix-daemon?

16:05 <aszlig> along with a separate store from /nix/store

16:05 <aszlig> nix 1.12 should make that quite easy

16:06 <aszlig> but even when not using nix 1.12, you can still use containers

16:06 <aszlig> ... which of course shouldn't share the store... hmmm...

16:06 <gchristensen> ehh once we get a bit better aarch64 support it'll be easy too make this netboot I think then we can just go libeatmydata

16:07 <aszlig> or that :-)

16:22 <clever> Dezgeg: i think that as long as the order of data writes, and flushes to db.sqlite are preserved, it is powerfail safe

16:22 <clever> but the defaults for ext4 dont preserve that

16:22 <clever> ive seen files truncate after an improper shutdown

16:22 <clever> zfs seems like it will obey those rules much much better

16:23 <clever> ive also noticed, /nix/var/nix/binary-cache-v3.sqlite has the sync mode disabled for speed, which can lead to corruption at improper shutdown

16:23 <clever> nix can safely regenerate it if deleted, but it doesnt actualy handle the corruption

16:23 <clever> so i had to spend a few hours debugging that, and then delete the db

16:26 <Dezgeg> well it's nix who is not calling fsync() on store paths

16:27 <clever> yeah, but i would still expect zfs to keep non-sync'ed writes that happen before a synced db.sqlite, to all be preserved in-order

16:27 <Dezgeg> why?

16:27 <clever> thats just the design feel i get from zfs, but i see your point, the fsync() would lag more because of other data...

16:27 <Dezgeg> that would be a massive performance killer

16:28 <clever> but what about when you close() a handle, doesnt that imply sync?

16:28 <Dezgeg> no

16:29 <clever> ah, the man page agrees with you

16:36 <clever> 2017-11-25 12:36:31 < DeHackEd> the file written normally will be in an unkonwn state, but ZFS does guarantee that the sequence of filesystem syscalls will have been performed in order

16:37 <clever> from #zfs

16:37 <clever> and a normal sync() would ensure everything goes to disk

16:37 <clever> so in theory, if nix did a sync() against the /nix/store filesystem, all storepaths would persist, and it would be safe to mark them as good in db.sqlite

16:54 <Dezgeg> well yes, that would work on any filesystem

17:08 <vcunat> Hydra is evaluating again!

17:18 <jtojnar> vcunat: is https://github.com/NixOS/nixpkgs/pull/31991 and https://github.com/NixOS/nixpkgs/pull/31891 okay to merge to staging?

17:26 FRidh has quit [(Quit: Konversation terminated!)]

17:38 JosW has quit [(Quit: Konversation terminated!)]

17:50 mbrgm_ has joined joined #nixos-dev

17:52 mbrgm has quit [(Ping timeout: 240 seconds)]

17:52 mbrgm_ is now known as mbrgm

17:54 vcunat has quit [(Quit: Leaving.)]

18:02 <simpson> Hm, with fetchgit, is there a way to ask for the --recursive flag to git? Looking at packaging libfirm using these instructions: https://pp.ipd.kit.edu/firm/Download.html

18:08 <clever> simpson: i think you want fetchSubmodules = true;

18:09 * simpson tries

18:29 Sonarpulse has joined joined #nixos-dev

19:27 <Sonarpulse> bgamari: https://sourceware.org/binutils/docs/binutils/Selecting-the-Target-System.html is btw a partial list of ways to specify all the target platforms in binutils

19:27 <Sonarpulse> starting to fix that https://github.com/obsidiansystems/nixpkgs/tree/binutils-multitarget

20:50 mbrgm has quit [(Ping timeout: 240 seconds)]

20:51 mbrgm has joined joined #nixos-dev

21:29 ma27 has joined joined #nixos-dev

21:45 <gchristensen> "Parallel mksquashfs: Using 96 processors"

21:49 zraexy has quit [(Ping timeout: 260 seconds)]

21:58 <ma27> gchristensen: do we actually receive sponsoring for these machines (because we're such an awesome distro :p) or is this completely paid by the NixOS Foundation?

21:59 <gchristensen> Packet.net provides one Type 2 for Hydra, one Type 2A for hydra, and soon one Type 2A to help the community have an aarch64 system to help improve our aarch64 support

21:59 <gchristensen> see also https://www.packet.net/bare-metal/

22:01 <ma27> gchristensen: awesome!

22:01 <gchristensen> Packet.net is extremely awesome all around :)

22:01 <gchristensen> I love that my personal server has bonded gigabit NICs, and their entire company is incredible

22:34 <clever> :O

22:34 <ma27> gchristensen: sounds awesome! I should really have a more deatailed look at them %)

22:35 <clever> gchristensen: are the x86 offerings anywhere near as amazing?

22:35 <gchristensen> they all are

22:35 <gchristensen> take a look at that URL ^

22:36 <clever> nice

22:37 <clever> and i think i heard something about a nixos image?

22:37 <gchristensen> they officially support nixos on some of the types, but I owe them an updated installer

23:43 zraexy has joined joined #nixos-dev

23:53 _ts_ has quit [(Ping timeout: 252 seconds)]