#nixos-dev on 2018-12-22

2018-08-16 20:49 gchristensen changed the topic of #nixos-dev to: NixOS Development (#nixos for questions) | https://hydra.nixos.org/jobset/nixos/trunk-combined https://channels.nix.gsc.io/graph.html | 18.09 release managers: vcunat and samueldr | https://logs.nix.samueldr.com/nixos-dev

00:01 <samueldr> hmm, doubling evaluator_max_heap_size to 22000000000 still gives Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS :/

00:02 <samueldr> let's double it again

00:56 init_6 has joined #nixos-dev

01:01 jtojnar has quit [Read error: Connection reset by peer]

01:18 <gchristensen> should we ditch the nixpkgs-unstable channel, and just have nixos-unstable (can still update nixpkgs-unstable, but have it always equal nixos-unstable)

01:23 <samueldr> I wouldn't care, but I'm not one using it

01:23 <samueldr> are there recent cases where nixpkgs-unstable really lagged behind?

01:24 <samueldr> btw, tried a couple other values for evaluator_max_heap_size, at 8x its initial value (88000000000) it still doesn't evaluate successfully

01:24 <gchristensen> don't know

01:24 <LnL> so nixos would block for darwin?

01:24 <gchristensen> yeah, LnL

01:24 <samueldr> sure doesn't help that I'm not confident with how it's used

01:25 <LnL> I wouldn't complain about having people pay better attention to stuff there, but I'm not sure it's the best idea

01:26 <gchristensen> aye

01:26 <gchristensen> I don't like that people describe a channel called "unstable"

01:26 <gchristensen> is why I ask :)

01:28 <LnL> samueldr: if I understand it correctly it will also try to allocate blocks that are too large if you use a high value

01:28 <LnL> so you kind of have to find a good balance

01:29 <LnL> but I could be totally wrong about that

01:30 <samueldr> no idea, but the values used are the ones configured in nixos' hydra, and then I gradually increased the max heap size

01:31 <samueldr> I have 64GiB ram to play with, 56GiB available ram here :/

01:31 <gchristensen> how much ram do you have?...oh

01:31 <samueldr> enough!
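
The heap sizes quoted above are raw byte counts, which are hard to compare with the RAM figures given in GiB. A minimal sketch of the conversion, using only the numbers from the log (the 11000000000 starting value is inferred from "8x its initial value (88000000000)"); the helper name `to_gib` is made up for illustration:

```python
GIB = 1 << 30  # bytes in one GiB


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / GIB


# Byte values mentioned in the log (the first one is inferred as 88e9 / 8).
values = {
    "inferred initial max heap": 11_000_000_000,
    "doubled max heap": 22_000_000_000,
    "8x max heap": 88_000_000_000,
}

for label, n in values.items():
    print(f"{label}: {to_gib(n):.2f} GiB")
```

Under this conversion the 8x value is larger than the 64GiB of RAM mentioned; whether that matters for a limit that is never actually reached is not something the log settles.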

01:31 <gchristensen> there are other boehm environment variables you can play with

01:31 <samueldr> considering it uses (according to ram) 8.16GiB

01:32 <samueldr> yeah, but is it something that should be done blindly, while this is something that would be required to be replicated "in prod"?

01:32 <samueldr> I'd much rather let someone that has a better grasp of the situation look at it :/

01:32 <LnL> try GC_INITIAL_HEAP_SIZE=8G

01:33 <gchristensen> oh yeah, that's a good one

01:33 <samueldr> LnL: it's by using the hydra evaluator

01:33 <samueldr> ~/Projects/nixos/hydra/result/bin/hydra-eval-jobs --option evaluator_initial_heap_size 10000000000 --option evaluator_max_heap_size 88000000000

01:33 <gchristensen> hydra-eval-jobs takes those as options?

01:33 <samueldr> I hope the initial heap size there is what's used

01:33 <samueldr> according to clever it might

01:33 <gchristensen> let's find out

01:33 <samueldr> at least, I saw different results without

01:33 <samueldr> (I haven't verified yet, but now think I should)

01:34 <LnL> yeah looks like it

01:34 <gchristensen> auto initialHeapSize = config->getStrOption("evaluator_initial_heap_size", "");

01:34 <gchristensen> setenv("GC_INITIAL_HEAP_SIZE", initialHeapSize.c_str(), 1);

01:34 <samueldr> https://github.com/NixOS/nixos-org-configurations/blob/master/delft/hydra.nix#L64

01:34 <samueldr> this is what clever linked to earlier for the configuration

01:34 <LnL> I'm not sure what the value is you used tho, it understands suffixes

01:34 <samueldr> same as there

01:35 <samueldr> after all, the initial goal was to replicate hydra's eval failure :)

01:35 <gchristensen> ehh yeah it doesn't use those options

01:35 <samueldr> and I can say that it does replicate hydra's failures!

01:35 <samueldr> ?

01:35 <gchristensen> those options come from the config file

01:37 <LnL> I thought nixos.org used the environment variable in the unit file

01:38 <LnL> https://github.com/NixOS/nixos-org-configurations/commit/741bfa39b209d8c0b32e5662297895ede0457a68

01:39 <LnL> guess it changed a while back

01:39 <gchristensen> according to this, heh, the env var is overridden if the argument is passed

01:40 <gchristensen> erm, no, not argument

01:40 <gchristensen> if the config file contains the option

01:40 hedning has quit [Quit: hedning]

01:43 <samueldr> welp ! the defaults for hydra-eval-jobs are also exhibiting the same issue it seems :/

01:43 <gchristensen> what happens if you set GC_INITIAL_HEAP_SIZE=10G

01:44 <samueldr> let's see in maybe 40 minutes?

01:44 <gchristensen> or go wild

01:44 <gchristensen> GC_INITIAL_HEAP_SECTIONS=20G

01:46 <samueldr> yeah, hydra-config.hh, the constructor, only checks for whatever is in the file HYDRA_CONFIG points to, good to know

01:48 <samueldr> evaluator_max_heap_size defaults to 1UL << 30, which is 1GiB; evaluator_initial_heap_size defaults to nothing

01:48 <samueldr> (and thus would default to whatever the environment sets it at)
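
The precedence described here (config file option first, environment variable otherwise, and a 1GiB default for the max heap) can be sketched as follows. This is an illustrative model only, not Hydra's code: the function names and the plain `dict` standing in for the parsed HYDRA_CONFIG file are hypothetical, and the "only override when the option is non-empty" guard is an assumption based on the discussion above.

```python
from typing import Optional

DEFAULT_MAX_HEAP = 1 << 30  # 1 GiB, the default quoted in the log


def effective_initial_heap_size(config: dict, env: dict) -> Optional[str]:
    """Value GC_INITIAL_HEAP_SIZE ends up with in this model.

    Assumption: a non-empty config option replaces the environment value;
    an unset or empty option leaves the environment untouched.
    """
    configured = config.get("evaluator_initial_heap_size", "")
    if configured:
        return configured
    return env.get("GC_INITIAL_HEAP_SIZE")


def effective_max_heap_size(config: dict) -> int:
    """Max heap size from the config, falling back to the 1 GiB default."""
    return int(config.get("evaluator_max_heap_size", DEFAULT_MAX_HEAP))


# Only the environment sets a value: it is used as-is.
print(effective_initial_heap_size({}, {"GC_INITIAL_HEAP_SIZE": "10G"}))
# Nothing configured: the max heap falls back to 1 GiB.
print(effective_max_heap_size({}))
```

In this model, values passed with `--option` on the command line never reach `config`, which matches the observation that only the file HYDRA_CONFIG points to is read.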

01:51 <samueldr> I'm a bit confused here, https://github.com/NixOS/hydra/blob/master/src/hydra-eval-jobs/hydra-eval-jobs.cc#L323

01:51 <samueldr> evaluator_max_heap_size is not the max heap size, but its initial max heap size?

01:54 <gchristensen> the evaluator tries again and again

01:58 <samueldr> yes, that I know, but what confuses me is how it seemingly increases maxHeapSize there

01:58 <samueldr> 64MiB at a time

01:58 <gchristensen> yeah, not sure

01:59 <samueldr> and what confuses me more is how the following line prints the same value in succession for failures https://github.com/NixOS/hydra/blob/423c0440eaee8b66706ca1f00f90e2ece41b36b1/src/hydra-eval-jobs/hydra-eval-jobs.cc#L164

01:59 Lisanna has quit [Ping timeout: 246 seconds]

01:59 <gchristensen> I'd bet you've found bugs

02:00 <samueldr> plausible, annoying though that other than "it looks funny" I don't really know how to report that

02:00 <samueldr> or even if my interpretation is right

02:00 <gchristensen> maybe git log will shed some light

02:01 <gchristensen> not a lot

02:02 <samueldr> I'm checking what static means in `static size_t maxHeaps` just to be sure

02:12 pie__ has quit [Remote host closed the connection]

02:12 pie___ has joined #nixos-dev

02:17 init_6 has quit []

02:21 worldofpeace has joined #nixos-dev

02:32 jtojnar has joined #nixos-dev

02:34 init_6 has joined #nixos-dev

02:35 <samueldr> eek, with GC_INITIAL_HEAP_SIZE=10G it got to 17GiB of use (according to time) but still failed with Too many heap sections (but I did miss your GC_INITIAL_HEAP_SECTIONS recommendation when I started it)

02:35 <gchristensen> oh I meant SIZE... I forget what SECTIONS does

02:35 <samueldr> it might not be one?

02:35 <gchristensen> maybe :)

02:37 <samueldr> https://github.com/ivmai/bdwgc/blob/release-7_6/doc/README.environment

02:40 <gchristensen> I wonder about GC_USE_ENTIRE_HEAP and GC_DONT_GC and

02:40 <gchristensen> s/and$//

02:45 <samueldr> >> restarting hydra-eval-jobs after job 'nixos.tests.containers-ipv6.aarch64-linux' because heap size is at 42949672960 bytes

02:45 * samueldr wonders

02:46 <samueldr> doesn't feel (but what do I know?) right how the heap size is at 40GiB

02:46 <samueldr> oh

02:46 <samueldr> OH

02:46 <samueldr> sure, if maxHeap isn't set, it will default to 1GiB, but I set the initial max heap to more than 1GiB !

02:47 <gchristensen> hmm

02:48 <samueldr> so the other time I probably was seeing a number which was smaller than maxHeap, which maxHeap tried 64MiB each time to get to

02:48 <samueldr> uh, bigger than maxHeap*

02:49 <samueldr> so yeah, understandably, it would have issues when checking whether it's over its limit, if the limit is lower than the initial value :/

02:50 <gchristensen> ...huh

02:50 <samueldr> (and to be clear, this is my mistake here)

02:50 <gchristensen> ah

02:50 <samueldr> maxHeap, the hydra concept, defaults to 1GiB, but I set the initial heap size to 40!

02:50 lopsided98 has quit [Quit: Disconnected]

02:52 lopsided98 has joined #nixos-dev

02:52 delroth has quit [Quit: WeeChat 2.3]

02:54 delroth has joined #nixos-dev

02:54 lopsided98 has quit [Client Quit]

02:56 lopsided98 has joined #nixos-dev

02:56 <simpson> 40GiB? Or 40B?

02:56 <simpson> Ah, I see now.

03:01 init_6 has quit [Ping timeout: 268 seconds]

03:19 <samueldr> yeah, 40GiB is bigger than 1GiB, and bigger than 1GiB+64MiB :)
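
The mismatch can be checked with a little arithmetic. The sketch below is a toy model of samueldr's reading of the restart logic, not Hydra's implementation: it assumes the evaluator restarts whenever the heap size exceeds the limit and that the limit then grows by 64MiB, as he describes. The function name is hypothetical.

```python
GIB = 1 << 30
MIB = 1 << 20


def restarts_until_limit_catches_up(heap_size: int, max_heap: int, step: int) -> int:
    """Count how many times the limit must grow by `step` before
    `heap_size > max_heap` stops being true (toy model, see assumptions)."""
    restarts = 0
    while heap_size > max_heap:
        max_heap += step
        restarts += 1
    return restarts


# The heap size printed in the restart message is exactly 40 GiB.
reported = 42949672960
print(reported == 40 * GIB)

# With a 1 GiB default limit and a 64 MiB step, a 40 GiB heap stays over
# the limit for a long time.
print(restarts_until_limit_catches_up(40 * GIB, 1 * GIB, 64 * MIB))
```

Under these assumptions a heap that starts at 40GiB is over the limit after every single job, which would explain the repeated restart messages.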

03:32 orivej has quit [Ping timeout: 246 seconds]

03:45 jtojnar has quit [Quit: jtojnar]

03:49 fadenb has quit [Ping timeout: 268 seconds]

04:25 lassulus_ has joined #nixos-dev

04:28 lassulus has quit [Ping timeout: 250 seconds]

04:28 lassulus_ is now known as lassulus

04:33 worldofpeace has quit [Remote host closed the connection]

06:01 drakonis1 has quit [Remote host closed the connection]

06:22 init_6 has joined #nixos-dev

07:46 phreedom_ is now known as phreedom

09:43 fadenb has joined #nixos-dev

09:43 fadenb has quit [Client Quit]

10:01 fadenb has joined #nixos-dev

10:01 fadenb has quit [Client Quit]

10:15 pie___ has quit [Remote host closed the connection]

10:15 pie___ has joined #nixos-dev

10:18 fadenb has joined #nixos-dev

10:24 orivej has joined #nixos-dev

11:04 <Profpatsch> ekleog: Also yay for using generators.toINI \o/

11:20 jtojnar has joined #nixos-dev

11:38 <Profpatsch> ekleog: Though the r2e CLI is pretty obnoxious :(

11:41 <Profpatsch> And the manpage doesn’t really describe any of the fields that the config supports?

11:52 <ekleog> Profpatsch: AFAICT you can just set to= under each feed to set the address to send to

11:53 <ekleog> (hence https://github.com/NixOS/nixpkgs/pull/49228/files#diff-cb4d7163e3003f1c0e4cb0af78c3a955R61 )

13:00 jtojnar has quit [Ping timeout: 246 seconds]

13:06 jtojnar has joined #nixos-dev

13:20 aristid1 has joined #nixos-dev

13:20 aristid1 is now known as aristid

14:57 pie__ has joined #nixos-dev

14:58 pie___ has quit [Remote host closed the connection]

15:46 init_6 has quit [Ping timeout: 250 seconds]

16:02 phreedom has quit [Ping timeout: 256 seconds]

16:16 hedning has joined #nixos-dev

16:18 phreedom has joined #nixos-dev

16:28 hedning has quit [Quit: hedning]

16:45 drakonis has joined #nixos-dev

16:55 drakonis_ has joined #nixos-dev

16:59 drakonis has quit [Ping timeout: 252 seconds]

17:00 drakonis has joined #nixos-dev

17:02 drakonis_ has quit [Ping timeout: 252 seconds]

17:03 pie___ has joined #nixos-dev

17:04 pie__ has quit [Remote host closed the connection]

17:04 clever has quit [Ping timeout: 246 seconds]

17:08 drakonis_ has joined #nixos-dev

17:11 drakonis has quit [Ping timeout: 252 seconds]

17:31 orivej has quit [Ping timeout: 272 seconds]

17:39 orivej has joined #nixos-dev

17:43 orivej has quit [Ping timeout: 240 seconds]

17:50 orivej has joined #nixos-dev

18:36 worldofpeace has joined #nixos-dev

18:40 orivej has quit [Ping timeout: 272 seconds]

19:13 jtojnar has quit [Read error: Connection reset by peer]

19:13 jtojnar has joined #nixos-dev

19:30 jtojnar has quit [Quit: jtojnar]

19:30 jtojnar has joined #nixos-dev

20:19 jtojnar has quit [Read error: No route to host]

20:20 jtojnar has joined #nixos-dev

21:34 orivej has joined #nixos-dev

21:39 orivej has quit [Ping timeout: 250 seconds]

22:35 orivej has joined #nixos-dev

22:43 jtojnar has quit [Read error: Connection reset by peer]

22:53 jtojnar has joined #nixos-dev

23:12 <timokau[m]> Can somebody restart suitesparse (https://hydra.nixos.org/job/nixos/trunk-combined/nixpkgs.suitesparse.x86_64-linux)?

23:13 <timokau[m]> A not-reproducible segfault, tracked in #52709

23:13 <{^_^}> https://github.com/NixOS/nixpkgs/issues/52709 (by timokau, 5 minutes ago, open): suitesparse build failed on hydra (transiently)

23:13 <samueldr> timokau[m]: done

23:13 <timokau[m]> Thanks!

23:31 <timokau[m]> IIRC the restart won't propagate automatically, right? If that's the case, please also restart sage

23:34 drakonis has joined #nixos-dev

23:34 Drakonis__ has joined #nixos-dev

23:35 drakonis1 has joined #nixos-dev

23:38 drakonis_ has quit [Ping timeout: 240 seconds]

23:38 drakonis has quit [Ping timeout: 252 seconds]

23:39 Drakonis__ has quit [Ping timeout: 252 seconds]

23:40 phreedom has quit [Remote host closed the connection]

23:40 drakonis1 has quit [Read error: Connection reset by peer]

23:41 phreedom has joined #nixos-dev

23:42 <samueldr> I think I restarted the right sage build

23:45 <timokau[m]> It apparently did restart, but immediately failed again. Not sure if that's hydra goofiness because suitesparse wasn't done yet or a real issue.

23:45 <timokau[m]> This one btw https://hydra.nixos.org/job/nixos/trunk-combined/nixpkgs.sage.x86_64-linux

23:45 <timokau[m]> Anyway, problem for another day. Getting late

23:45 <timokau[m]> Thanks again!

23:50 <samueldr> it had another dependency failing

23:50 <samueldr> restarted *that* build too

23:51 <timokau[m]> So much for reproducibility :D