#nixos-dev on 2019-03-12

2019-02-05 16:18 ekleog changed the topic of #nixos-dev to: NixOS Development (#nixos for questions) | https://hydra.nixos.org/jobset/nixos/trunk-combined https://channels.nix.gsc.io/graph.html https://r13y.com | 18.09 release managers: vcunat and samueldr | https://logs.nix.samueldr.com/nixos-dev

00:13 asymmetric has joined #nixos-dev

01:10 orivej has quit [Ping timeout: 252 seconds]

01:21 asymmetric has quit [Ping timeout: 272 seconds]

01:35 drakonis_ has joined #nixos-dev

01:39 drakonis has quit [Ping timeout: 252 seconds]

02:07 drakonis1 has joined #nixos-dev

02:09 drakonis2 has joined #nixos-dev

02:10 drakonis_ has quit [Ping timeout: 252 seconds]

02:12 drakonis1 has quit [Ping timeout: 240 seconds]

02:35 jtojnar has quit [Quit: jtojnar]

04:37 pie__ has joined #nixos-dev

04:37 pie__ has quit [Remote host closed the connection]

04:38 pie__ has joined #nixos-dev

04:40 pie_ has quit [Ping timeout: 245 seconds]

06:03 alp has quit [Ping timeout: 250 seconds]

06:03 phreedom has joined #nixos-dev

06:04 phreedom_ has quit [Ping timeout: 256 seconds]

06:51 alp has joined #nixos-dev

07:06 drakonis has joined #nixos-dev

07:12 drakonis2 has quit [Ping timeout: 244 seconds]

07:46 drakonis_ has joined #nixos-dev

07:47 <teto> I am investigating a crash on nix master. Followed the hacking guide to install from a nix-shell yet `make installcheck` misses libsodium and upon launch I get `/nix/store/2kcrj1ksd2a14bm5sky182fv2xwfhfap-glibc-2.26-131/lib/libc.so.6: version `GLIBC_2.27' not found`.

07:50 drakonis has quit [Ping timeout: 252 seconds]

08:36 johanot has joined #nixos-dev

08:40 drakonis_ has quit [Ping timeout: 246 seconds]

08:41 <domenkozar> https://github.com/NixOS/nixos-weekly/pull/85

08:41 <{^_^}> nixos-weekly#85 (by domenkozar, 1 week ago, open): Call for Content: 2019/05

08:41 <domenkozar> newsworthy news this week?

08:42 ixxie has joined #nixos-dev

08:42 drakonis_ has joined #nixos-dev

08:53 ixxie has quit [Ping timeout: 246 seconds]

08:55 <makefu> nginx got bought by f5 .... :(

10:08 jtojnar has joined #nixos-dev

10:17 jtojnar has quit [Remote host closed the connection]

10:17 <delroth> untested at this point, but... https://github.com/delroth/nixpkgs/commit/3e7463993b8d99ae15ea48da4d9dbbdbc2fac603 does that seem like a good idea to anyone?

10:17 <delroth> I'm tired of seeing .wrapped-* in my top output

10:30 <delroth> (.*-wrapped rather -- point still stands)

11:28 asymmetric has joined #nixos-dev

11:32 orivej has joined #nixos-dev

12:56 orivej has quit [Ping timeout: 245 seconds]

13:28 orivej has joined #nixos-dev

13:33 ma27 has quit [Quit: WeeChat 2.2]

13:38 johanot has quit [Quit: WeeChat 2.2]

13:40 ma27 has joined #nixos-dev

13:45 <samueldr> delroth: AFAIUI some software will rely on the location of the binary (the wrapped one), a reason for which I discredited moving binaries to another folder wholesale

13:46 <samueldr> when they'll do that silly stuff they'll be at the wrong location

13:46 <samueldr> but I might have overestimated the issue when thinking about it initially

13:47 <samueldr> when I thought about something similar, I was thinking to a folder sibling to $out/bin though

13:48 <delroth> yeah that's my main concern as well, I'm trying to figure out how much exactly is impacted

13:49 <delroth> there are quite a few places not using wrapProgram atm in nixpkgs because they do need to keep the basename

13:49 <delroth> I've seen the hadoop package is doing that for example

13:50 <delroth> not sure about what the right side of the tradeoff is -- one side is easy to measure since we can look at derivations that don't use wrapProgram because of basename issues, but finding derivations that would break with wrapProgram putting in a different directory is hard right now

13:51 <samueldr> a compromise may be to use a parameter?

13:52 <samueldr> wrapProgram --something-meaning-it-wont-clobber-the-name

13:52 <delroth> doesn't solve the "top" problem -- my gut feeling is that the default should be the opposite :)

13:53 <delroth> basenames are user visible in quite a few tools

13:53 <samueldr> yeah

13:54 <delroth> I'm running a rebuild of my laptop system right now with the commit linked above, I'll see how much breaks

13:54 <samueldr> a thing to verify anyway: does stashing them in a folder break multiple layers of wrapping?

14:05 ciil_ has joined #nixos-dev

14:09 FRidh has quit [*.net *.split]

14:09 kgz has quit [*.net *.split]

14:09 makefu has quit [*.net *.split]

14:09 ciil has quit [*.net *.split]

14:09 Profpatsch has quit [*.net *.split]

14:09 rsa has quit [*.net *.split]

14:09 delroth has quit [*.net *.split]

14:09 aminechikhaoui has quit [*.net *.split]

14:10 aminechikhaoui has joined #nixos-dev

14:11 aminechikhaoui5 has joined #nixos-dev

14:12 aminechikhaoui5 has quit [Client Quit]

14:12 aminechikhaoui has quit [Remote host closed the connection]

14:16 jtojnar has joined #nixos-dev

14:16 Profpatsch has joined #nixos-dev

14:17 kgz has joined #nixos-dev

14:17 makefu has joined #nixos-dev

14:18 delroth has joined #nixos-dev

16:30 yl has joined #nixos-dev

17:18 johanot has joined #nixos-dev

17:21 drakonis has joined #nixos-dev

18:17 ma27 has quit [Quit: WeeChat 2.4]

18:21 ma27 has joined #nixos-dev

18:27 orivej has quit [Ping timeout: 268 seconds]

19:02 asymmetric has quit [Quit: Leaving]

19:11 drakonis has quit [Quit: WeeChat 2.3]

19:15 drakonis has joined #nixos-dev

19:18 drakonis_ has quit [Ping timeout: 268 seconds]

20:51 johanot has quit [Quit: WeeChat 2.4]

21:19 drakonis_ has joined #nixos-dev

21:20 <manveru> well, most ruby executables would break

21:22 drakonis has quit [Ping timeout: 252 seconds]

21:30 <yorick> maybe there should be a new nix stable release where nix-shell works as root

21:47 <infinisil> nix-shell doesn't work as root??

21:48 <infinisil> Works for me

21:50 orivej has joined #nixos-dev

21:58 <infinisil> Huh, just looking at this failure in hydra: https://hydra.nixos.org/build/90472195/nixlog/2/tail

21:59 <infinisil> A hash mismatch? How come? It's not a fixed output derivation

22:00 <infinisil> And it's cached by hydra

22:01 drakonis has joined #nixos-dev

22:04 drakonis_ has quit [Ping timeout: 272 seconds]

22:08 drakonis has quit [Ping timeout: 252 seconds]

22:08 <samueldr> if I was a betting man, I'd bet it's the same issue still present in tests, where sometimes it looks like the VM is mildly corrupt

22:09 <samueldr> (only from an external look though, no idea if the gut feeling is right)

22:10 <samueldr> failure verifying the installed system: https://hydra.nixos.org/build/89851849/nixlog/124

22:11 <samueldr> similar: https://hydra.nixos.org/build/89514398/nixlog/8

22:12 <samueldr> haven't found one quickly, but sometimes grub just hangs or fails in weird ways

22:12 <samueldr> all in transient ways

22:13 <samueldr> the "debugging hatch" may not have been the underlying issue, but only somehow causing it to happen more often

22:13 <samueldr> (if it was, nothing was validated in term of numbers)

22:16 <ekleog> did we ever see some weird failure like these ones on !packet-epyc?

22:17 <ekleog> I get the feeling every time I'm seeing some weird transient failure it's on packet-epyc

22:17 <ekleog> sure, it's getting most of the jobs for being such a horse, but…

22:17 <samueldr> it was something I brought on when testing earlier this year, maybe it only looks like it since epyc always gets to many tasks

22:17 <ekleog> at this stage I'm starting to think it may make sense to consider hardware failuse

22:18 <ekleog> failure*

22:18 drakonis has joined #nixos-dev

22:18 <samueldr> memtest was ran, successfully, if it is, it's something else :/

22:18 <ekleog> an overclocked processor that sometimes fails a bit?

22:18 <samueldr> (or something memtest won't exhibit)

22:18 <samueldr> no idea if packet's machine would overclock

22:19 <ekleog> (even if not overclocked, a processor misprint could lead to the same kind of behavior I think)

22:19 <samueldr> yep

22:19 <samueldr> sounds legit at least

22:19 <ekleog> it's supposed to be caught at test time in the factory, but…

22:20 <samueldr> from the outside, looking at when it fails, never the machine looks overloaded or anything in grafana

22:20 <samueldr> (I have no more insight in this than what's known publicly here)

22:24 <ekleog> I I wonder if there's some software that basically tests all|most instructions and checks their result

22:25 <ekleog> could find some CPU stress testing stuff but it looks more designed for bench than for checking the processor

22:25 copumpkin has quit [Ping timeout: 245 seconds]

22:34 <ekleog> it sounds like what people do when overclocking is either using some intel software (packet-epyc-1 is arm, though, iirc) or just use something like prime65 and see whether an error occurs within the OS during the stress time

22:35 <delroth> manveru: hmm, why would ruby break?

22:35 <ekleog> oh looks like prime65 is actually checking the results https://www.mersenne.org/download/stress.txt

22:36 <ekleog> -> gchristensen, would it be possible to try to run prime65 on packet-epyc-1 for like 1-2 days, to check whether the CPU looks OK?

22:37 <infinisil> samueldr: I think I'll just have to stop trying to look into weird failures and just restart to make sure it's not transient

22:37 <infinisil> Considering how often this happens, I feel like it might really be worth doing every build twice to fix this

22:38 <infinisil> Because it saves all of us time and makes the channels update faster

22:38 <samueldr> infinisil: try to keep a mental tally on which machine it built, if you can

22:38 <samueldr> (or a written tally if you want)

22:38 <infinisil> Alright

22:39 <samueldr> though you're now primed and probably biased against the epyc machine :)

22:39 * samueldr wonders if there are non-qemu things that fails on the epyc machine

22:41 <infinisil> I'll keep a tally :)

22:42 <manveru> delroth: i didn't mean ruby itself, but a lot of binaries from ruby gems use things like `require_relative '../lib/foo'`

22:42 <samueldr> the git-annex one the other day was on ike, tried reproducing using a non-kvm qemu with a same-generation cpu and it didn't look like I could

22:42 <samueldr> manveru: it would be likely then that a sibling directory to $out/bin would work?

22:42 <samueldr> (but then, what else would break?)

22:43 <manveru> samueldr: like $out/bin-real or something?

22:43 <samueldr> or $out/bin-wrapped, yeah

22:43 <samueldr> (well, wrappee?)

22:43 <manveru> hehe

22:44 <manveru> well, it'd be an interesting experiment either way, but it should work then, yeah

23:05 orivej has quit [Ping timeout: 246 seconds]

23:12 lassulus has quit [Ping timeout: 250 seconds]

23:16 <infinisil> So I restarted this one because it seemed sketchy: https://hydra.nixos.org/build/90304238

23:16 <infinisil> But it failed again

23:16 <infinisil> Still a weird error, this time not on a packet machine (t4a is the name), machine# switch_root: can't execute '/nix/store/2iar0ajsjpy8kgfyh108fcrzg0r097zw-nixos-system-machine-19.03pre-git/init': Operation not permitted

23:17 <samueldr> can you test locally (I will too)

23:18 yl has quit [Ping timeout: 255 seconds]

23:18 * samueldr hoists a "?" in the above sentence

23:18 <infinisil> I'll do too

23:19 <samueldr> oh, it never successfully ran

23:19 <samueldr> (for 19.03)

23:21 <infinisil> Huh

23:22 <samueldr> and hasn's since january 15th https://hydra.nixos.org/job/nixos/trunk-combined/nixos.tests.bcachefs.x86_64-linux/all?page=3

23:22 <infinisil> That totally looks like a corruption error though

23:22 <infinisil> Or something weird

23:22 <samueldr> when init can't be ran (for any reason) the kernel segfaults

23:22 <samueldr> well, when init exits*

23:22 <samueldr> init exits since it fails to run init

23:23 <samueldr> (stage-1 init exits, since it fails to `exec` stage-2's init)

23:23 <infinisil> Ah

23:23 <samueldr> might be a similar issue than the one stopping from updating the kernel to the current LTS

23:23 <samueldr> (you can stop the test if it hasn't finished for you, reproducible error)

23:24 <samueldr> aszlig: might be relevant to the overlayfs issue

23:24 <samueldr> probably not

23:24 <infinisil> The first faily line seems to be "kbd_mode: KDSKBMODE: Inappropriate ioctl for device"

23:25 <infinisil> Probably not it though

23:26 <samueldr> that's something that's been in the boot logs for a while (according to google)

23:26 <samueldr> though not sure what it means exactly

23:27 lassulus has joined #nixos-dev

23:27 <infinisil> This is the only kbd_mode reference in nixpkgs/nixos: https://github.com/NixOS/nixpkgs/blob/65898a4ddb4d18a11d1c47af006844e7b1b62c88/nixos/modules/tasks/kbd.nix#L89

23:28 <samueldr> lol, I went directly to the kernel source, which in hindsight I was wrong

23:28 <samueldr> >> kbd_mode - report or set the keyboard mode

23:29 <samueldr> most probably because there is no keyboard connected to the VM?

23:29 <infinisil> Ah that might be it

23:30 <samueldr> see the %G in the following line in the log and in the printf

23:30 <samueldr> chiiruno isn't an IRC peep, right?

23:31 <infinisil> printf "\033%%${if isUnicode then "G" else "@"}" >> /dev/console

23:31 <samueldr> (bcachefs test maintainer)

23:31 <samueldr> yeah, I was pointing out to it since it shows it _is_ the right thing you found with kbd_mode

23:31 <infinisil> The right thing?

23:32 <infinisil> Anyways, totally getting sidetracked here

23:33 <samueldr> ah, the right invocation of kbd_mode, so no, it's not what's at fault

23:33 <samueldr> I'm thinking bcachefs is just borked?

23:33 <samueldr> or uh, in combination with the 9p-backed testing infra

23:34 <infinisil> How do you even debug this!

23:34 <samueldr> since 18.09 is fine, I'd first update bcachefs in 18.09 to the same rev than in unstable (basically backport) and see if it sticks, then uh, think harder

23:35 <samueldr> there was irrelevant(?) breakage that makes it harder to bisect

23:37 <infinisil> A neat trick I've started using for bisects is to use this: https://paste.infinisil.com/maqmH3Hfjw

23:37 <samueldr> the failures in hydra on unstable differ starting with 88316873, which has #54752 in

23:37 <{^_^}> https://github.com/NixOS/nixpkgs/pull/54752 (by eadwu, 6 weeks ago, merged): linux_testing_bcachefs,bcachefs-tools: 20190123

23:38 lassulus_ has joined #nixos-dev

23:38 <infinisil> I guess if there's an unrelated breakage during bisecting, you can always first try to fix that failure first

23:39 <samueldr> neat snippet infinisil++ for bisecting

23:39 <{^_^}> infinisil's karma got increased to 61

23:40 <infinisil> :)

23:41 lassulus has quit [Ping timeout: 272 seconds]

23:41 lassulus_ is now known as lassulus

23:41 <infinisil> Combine that with a cache lookup to immediately exit 0 for ones that are already built, and combine that with a flag to control whether you want to bisect over stdenvs that you have to rebuild (by trying to build -A stdenv), and you got a very neat bisection script

23:42 <infinisil> *And* combine that with **recursive** bisection when it has been determined that a version update causes the breakage!

23:42 <infinisil> (well, not recursive, because it would probably only be for 1 level)

23:43 <samueldr> I don't think we can easily "just" yank out the 9p parts out of the tests infra, right?

23:44 <infinisil> samueldr: What's 9p for?

23:44 <samueldr> mounting the host's /nix/store (or part of?) in the VM

23:44 <infinisil> Oh that makes sense yeah

23:45 <samueldr> it's used with overlayfs and it is causing grief with the current recent kernels

23:48 <infinisil> samueldr: You're saying overlayfs itself depends on 9p?

23:48 <samueldr> no, sorry I was a bit brief

23:49 <samueldr> overlayFS is used to merge multiple FS in a single one, and here overlayFS is used to merge a writable store on top of the shared system's store, so it uses the mounted 9p overlayfs

23:49 <samueldr> AGH, sorry, "it uses the mounted 9p fs"

23:50 <infinisil> Ahh I see, more or less

23:50 <samueldr> and *something* changed into the mounting bits or overlayfs bits in the kernel somewhen between the LTSes and is why we're not upgrading to the next LTS yet

23:50 <samueldr> (our tests don't run)

23:53 drakonis1 has joined #nixos-dev