gchristensen changed the topic of #nixos-dev to: NixOS Development (#nixos for questions) | https://hydra.nixos.org/jobset/nixos/trunk-combined https://channels.nix.gsc.io/graph.html | 18.09 release managers: vcunat and samueldr | https://logs.nix.samueldr.com/nixos-dev
<samueldr> hmm, doubling evaluator_max_heap_size to 22000000000 still gives Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS :/
<samueldr> let's double it again
init_6 has joined #nixos-dev
jtojnar has quit [Read error: Connection reset by peer]
<gchristensen> should we ditch the nixpkgs-unstable channel, and just have nixos-unstable (can still update nixpkgs-unstable, but have it always equal nixos-unstable)
<samueldr> I wouldn't care, but I'm not one using it
<samueldr> are there recent cases where nixpkgs-unstable really lagged behind?
<samueldr> btw, tried a couple other values for evaluator_max_heap_size, at 8x its initial value (88000000000) it still doesn't evaluate successfully
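For scale, the byte values quoted so far can be converted to GiB. This is only a small arithmetic sketch: the 11000000000 starting value is inferred from "doubling ... to 22000000000" and "8x its initial value (88000000000)", and the helper name `bytes_to_gib` is made up for illustration.

```python
GIB = 1 << 30  # bytes per GiB

def bytes_to_gib(n):
    """Convert a byte count to GiB (hypothetical helper, not part of Hydra)."""
    return n / GIB

# Values mentioned in the log; the first one is inferred, not quoted directly.
values = {
    "initial (inferred)": 11_000_000_000,
    "doubled": 22_000_000_000,
    "8x": 88_000_000_000,
}

for label, n in values.items():
    print(f"{label}: {n} bytes = {bytes_to_gib(n):.2f} GiB")
```

The 8x value works out to roughly 82 GiB, which is more than a 64 GiB machine holds.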
<gchristensen> don't know
<LnL> so nixos would block for darwin?
<gchristensen> yeah, LnL
<samueldr> sure doesn't help that I'm not confident with how it's used
<LnL> I wouldn't complain about having people pay better attention to stuff there, but I'm not sure it's the best idea
<gchristensen> aye
<gchristensen> I don't like that people describe a channel called "unstable"
<gchristensen> is why I ask :)
<LnL> samueldr: if I understand it correctly it will also try to allocate blocks that are too large if you use a high value
<LnL> so you kind of have to find a good balance
<LnL> but I could be totally wrong about that
<samueldr> no idea, but the values used are the ones configured in nixos' hydra, and then I gradually increased the max heap size
<samueldr> I have 64GiB ram to play with, 56GiB available ram here :/
<gchristensen> how much ram do you have?...oh
<samueldr> enough!
<gchristensen> there are other boehm environment variables you can play with
<samueldr> considering it uses (according to ram) 8.16GiB
<samueldr> yeah, but is it something that should be done blindly, while this is something that would be required to be replicated "in prod"?
<samueldr> I'd much rather let someone that has a better grasp of the situation look at it :/
<LnL> try GC_INITIAL_HEAP_SIZE=8G
<gchristensen> oh yeah, that's a good one
<samueldr> LnL: it's by using the hydra evaluator
<samueldr> ~/Projects/nixos/hydra/result/bin/hydra-eval-jobs --option evaluator_initial_heap_size 10000000000 --option evaluator_max_heap_size 88000000000
<gchristensen> hydra-eval-jobs takes those as options?
<samueldr> I hope the initial heap size there is what's used
<samueldr> according to clever it might
<gchristensen> let's find out
<samueldr> at least, I saw different results without
<samueldr> (I haven't verified yet, but now think I should)
<LnL> yeah looks like it
<gchristensen> auto initialHeapSize = config->getStrOption("evaluator_initial_heap_size", "");
<gchristensen> setenv("GC_INITIAL_HEAP_SIZE", initialHeapSize.c_str(), 1);
<samueldr> this is what clever linked to earlier for the configuration
<LnL> I'm not sure what the value is you used tho, it understands suffixes
<samueldr> same as there
<samueldr> after all, the initial goal was to replicate hydra's eval failure :)
<gchristensen> ehh yeah it doesn't use those options
<samueldr> and I can say that it does replicate hydra's failures!
<samueldr> ?
<gchristensen> those options come from the config file
<LnL> I thought nixos.org used the environment variable in the unit file
<LnL> guess it changed a while back
<gchristensen> according to this, heh, the env var is overridden if the argument is passed
<gchristensen> erm, no, not argument
<gchristensen> if the config file contains the option
hedning has quit [Quit: hedning]
<samueldr> welp ! the defaults for hydra-eval-jobs are also exhibiting the same issue it seems :/
<gchristensen> what happens if you set GC_INITIAL_HEAP_SIZE=10G
<samueldr> let's see in maybe 40minutes?
<gchristensen> or go wild
<gchristensen> GC_INITIAL_HEAP_SECTIONS=20G
<samueldr> yeah, hydra-config.hh, the constructor, only checks for whatever is in the file HYDRA_CONFIG points to, good to know
<samueldr> evaluator_max_heap_size defaults to 1UL << 30, which is 1GiB; evaluator_initial_heap_size defaults to nothing
<samueldr> (and thus would default to whatever the environment sets it at)
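The precedence described here (config file option over environment, with the stated defaults) can be modelled roughly as below. This is a Python sketch of the behaviour as discussed in the channel, not Hydra's actual C++ code: the function name `resolve_heap_settings` and the dict-based config are hypothetical, and skipping the override when the option is empty is an assumption based on "would default to whatever the environment sets it at".

```python
DEFAULT_MAX_HEAP_SIZE = 1 << 30  # 1 GiB, the default quoted in the log

def resolve_heap_settings(config, environ):
    """Return (effective GC_INITIAL_HEAP_SIZE, max heap size in bytes).

    `config` stands in for the file HYDRA_CONFIG points to and `environ`
    for the process environment. Both are plain dicts here.
    """
    env = dict(environ)  # do not mutate the caller's environment
    initial = config.get("evaluator_initial_heap_size", "")
    if initial != "":
        # Assumption: a non-empty config value overwrites the env var,
        # mirroring setenv(..., 1) in the snippet pasted earlier.
        env["GC_INITIAL_HEAP_SIZE"] = initial
    max_heap = int(config.get("evaluator_max_heap_size", DEFAULT_MAX_HEAP_SIZE))
    return env.get("GC_INITIAL_HEAP_SIZE"), max_heap

# With an empty config, the environment value is kept and the limit is 1 GiB.
print(resolve_heap_settings({}, {"GC_INITIAL_HEAP_SIZE": "10G"}))
```

Under this model, passing `--option` on the command line changes nothing unless the same keys end up in the config file.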
<samueldr> evaluator_max_heap_size is not the max heap size, but its initial max heap size?
<gchristensen> the evaluator tries again and again
<samueldr> yes, that I know, but what confuses me is how it seemingly increases maxHeapSize there
<samueldr> 64MiB at a time
<gchristensen> yeah, not sure
<samueldr> and what confuses me more is how the following line prints the same value in succession for failures https://github.com/NixOS/hydra/blob/423c0440eaee8b66706ca1f00f90e2ece41b36b1/src/hydra-eval-jobs/hydra-eval-jobs.cc#L164
Lisanna has quit [Ping timeout: 246 seconds]
<gchristensen> I'd bet you've found bugs
<samueldr> plausible, annoying though that other than "it looks funny" I don't really know how to report that
<samueldr> or even if my interpretation is right
<gchristensen> maybe git log will shed some light
<gchristensen> not a lot
<samueldr> I'm checking what static means in `static size_t maxHeaps` just to be sure
pie__ has quit [Remote host closed the connection]
pie___ has joined #nixos-dev
init_6 has quit []
worldofpeace has joined #nixos-dev
jtojnar has joined #nixos-dev
init_6 has joined #nixos-dev
<samueldr> eek, with GC_INITIAL_HEAP_SIZE=10G it got to 17GiB of use (according to time) but still failed with Too many heap sections (but I did miss your GC_INITIAL_HEAP_SECTIONS recommendation when I started it)
<gchristensen> oh I meant SIZE... I forget what SECTIONS does
<samueldr> it might not be one?
<gchristensen> maybe :)
<gchristensen> I wonder about GC_USE_ENTIRE_HEAP and GC_DONT_GC and
<gchristensen> s/and$//
<samueldr> >> restarting hydra-eval-jobs after job 'nixos.tests.containers-ipv6.aarch64-linux' because heap size is at 42949672960 bytes
* samueldr wonders
<samueldr> doesn't feel (but what do I know?) right how the heap size is at 40GiB
<samueldr> oh
<samueldr> OH
<samueldr> sure, if maxHeap isn't set, it will default to 1GiB, but I set the initial max heap to more than 1GiB !
<gchristensen> hmm
<samueldr> so the other time I probably was seeing a number which was smaller than maxHeap, which maxHeap tried 64MiB each time to get to
<samueldr> uh, bigger than maxHeap*
<samueldr> so yeah, understandably, it would have issues when checking whether it's over its limit, if the limit is lower than the initial value :/
<gchristensen> ...huh
<samueldr> (and to be clear, this is my mistake here)
<gchristensen> ah
<samueldr> maxHeap, the hydra concept, defaults to 1GiB, but I set the initial heap size to 40!
lopsided98 has quit [Quit: Disconnected]
lopsided98 has joined #nixos-dev
delroth has quit [Quit: WeeChat 2.3]
delroth has joined #nixos-dev
lopsided98 has quit [Client Quit]
lopsided98 has joined #nixos-dev
<simpson> 40GiB? Or 40B?
<simpson> Ah, I see now.
init_6 has quit [Ping timeout: 268 seconds]
<samueldr> yeah, 40GiB is bigger than 1GiB, and bigger than 1GiB+64MiB :)
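The mismatch samueldr describes can be put into numbers. The sketch below is a simplified model of the behaviour as described in the channel (a limit starting at 1GiB and growing 64MiB per restart), not Hydra's actual restart logic. It assumes the heap stays at the size reported in the log message, and the function name is hypothetical.

```python
GIB = 1 << 30
MIB = 1 << 20

def restarts_until_limit_catches_up(heap_size, limit=1 * GIB, step=64 * MIB):
    """Count how many 64MiB bumps the limit needs before the heap fits under it.

    Assumes the heap size stays constant and that a restart is triggered
    whenever the heap is strictly larger than the current limit.
    """
    restarts = 0
    while heap_size > limit:
        limit += step
        restarts += 1
    return restarts

# Heap size from the "restarting hydra-eval-jobs" message: 42949672960 bytes.
print(restarts_until_limit_catches_up(42949672960))
```

With an initial heap of 40GiB and a 1GiB starting limit, the check would trip on every job for a long time, which fits the repeated restart messages.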
orivej has quit [Ping timeout: 246 seconds]
jtojnar has quit [Quit: jtojnar]
fadenb has quit [Ping timeout: 268 seconds]
lassulus_ has joined #nixos-dev
lassulus has quit [Ping timeout: 250 seconds]
lassulus_ is now known as lassulus
worldofpeace has quit [Remote host closed the connection]
drakonis1 has quit [Remote host closed the connection]
init_6 has joined #nixos-dev
phreedom_ is now known as phreedom
fadenb has joined #nixos-dev
fadenb has quit [Client Quit]
fadenb has joined #nixos-dev
fadenb has quit [Client Quit]
pie___ has quit [Remote host closed the connection]
pie___ has joined #nixos-dev
fadenb has joined #nixos-dev
orivej has joined #nixos-dev
<Profpatsch> ekleog: Also yay for using generators.toINI \o/
jtojnar has joined #nixos-dev
<Profpatsch> ekleog: Though the r2e CLI is pretty obnoxious :(
<Profpatsch> And the manpage doesn’t really describe any of the fields that the config supports?
<ekleog> Profpatsch: AFAICT you can just set to= under each feed to set the address to send to
jtojnar has quit [Ping timeout: 246 seconds]
jtojnar has joined #nixos-dev
aristid1 has joined #nixos-dev
aristid1 is now known as aristid
pie__ has joined #nixos-dev
pie___ has quit [Remote host closed the connection]
init_6 has quit [Ping timeout: 250 seconds]
phreedom has quit [Ping timeout: 256 seconds]
hedning has joined #nixos-dev
phreedom has joined #nixos-dev
hedning has quit [Quit: hedning]
drakonis has joined #nixos-dev
drakonis_ has joined #nixos-dev
drakonis has quit [Ping timeout: 252 seconds]
drakonis has joined #nixos-dev
drakonis_ has quit [Ping timeout: 252 seconds]
pie___ has joined #nixos-dev
pie__ has quit [Remote host closed the connection]
clever has quit [Ping timeout: 246 seconds]
drakonis_ has joined #nixos-dev
drakonis has quit [Ping timeout: 252 seconds]
orivej has quit [Ping timeout: 272 seconds]
orivej has joined #nixos-dev
orivej has quit [Ping timeout: 240 seconds]
orivej has joined #nixos-dev
worldofpeace has joined #nixos-dev
orivej has quit [Ping timeout: 272 seconds]
jtojnar has quit [Read error: Connection reset by peer]
jtojnar has joined #nixos-dev
jtojnar has quit [Quit: jtojnar]
jtojnar has joined #nixos-dev
jtojnar has quit [Read error: No route to host]
jtojnar has joined #nixos-dev
orivej has joined #nixos-dev
orivej has quit [Ping timeout: 250 seconds]
orivej has joined #nixos-dev
jtojnar has quit [Read error: Connection reset by peer]
jtojnar has joined #nixos-dev
<timokau[m]> A not-reproducible segfault, tracked in #52709
<{^_^}> https://github.com/NixOS/nixpkgs/issues/52709 (by timokau, 5 minutes ago, open): suitesparse build failed on hydra (transiently)
<samueldr> timokau[m]: done
<timokau[m]> Thanks!
<timokau[m]> IIRC the restart won't propagate automatically right? If that's the case, please also restart sage
drakonis has joined #nixos-dev
Drakonis__ has joined #nixos-dev
drakonis1 has joined #nixos-dev
drakonis_ has quit [Ping timeout: 240 seconds]
drakonis has quit [Ping timeout: 252 seconds]
Drakonis__ has quit [Ping timeout: 252 seconds]
phreedom has quit [Remote host closed the connection]
drakonis1 has quit [Read error: Connection reset by peer]
phreedom has joined #nixos-dev
<samueldr> I think I restarted the right sage build
<timokau[m]> It apparently did restart, but immediately failed again. Not sure if that's hydra goofiness because suitesparse wasn't done yet or a real issue.
<timokau[m]> Anyway, problem for another day. Getting late
<timokau[m]> Thanks again!
<samueldr> it had another dependency failing
<samueldr> restarted *that* build too
<timokau[m]> So much for reproducibility :D