ekleog changed the topic of #nixos-dev to: NixOS Development (#nixos for questions) | https://hydra.nixos.org/jobset/nixos/trunk-combined https://channels.nix.gsc.io/graph.html https://r13y.com | 18.09 release managers: vcunat and samueldr | https://logs.nix.samueldr.com/nixos-dev
asymmetric has joined #nixos-dev
orivej has quit [Ping timeout: 252 seconds]
asymmetric has quit [Ping timeout: 272 seconds]
drakonis_ has joined #nixos-dev
drakonis has quit [Ping timeout: 252 seconds]
drakonis1 has joined #nixos-dev
drakonis2 has joined #nixos-dev
drakonis_ has quit [Ping timeout: 252 seconds]
drakonis1 has quit [Ping timeout: 240 seconds]
jtojnar has quit [Quit: jtojnar]
pie__ has joined #nixos-dev
pie__ has quit [Remote host closed the connection]
pie__ has joined #nixos-dev
pie_ has quit [Ping timeout: 245 seconds]
alp has quit [Ping timeout: 250 seconds]
phreedom has joined #nixos-dev
phreedom_ has quit [Ping timeout: 256 seconds]
alp has joined #nixos-dev
drakonis has joined #nixos-dev
drakonis2 has quit [Ping timeout: 244 seconds]
drakonis_ has joined #nixos-dev
<teto> I am investigating a crash on nix master. I followed the hacking guide to install from a nix-shell, yet `make installcheck` misses libsodium and upon launch I get `/nix/store/2kcrj1ksd2a14bm5sky182fv2xwfhfap-glibc-2.26-131/lib/libc.so.6: version `GLIBC_2.27' not found`.
drakonis has quit [Ping timeout: 252 seconds]
johanot has joined #nixos-dev
drakonis_ has quit [Ping timeout: 246 seconds]
<{^_^}> nixos-weekly#85 (by domenkozar, 1 week ago, open): Call for Content: 2019/05
<domenkozar> newsworthy news this week?
ixxie has joined #nixos-dev
drakonis_ has joined #nixos-dev
ixxie has quit [Ping timeout: 246 seconds]
<makefu> nginx got bought by f5 .... :(
jtojnar has joined #nixos-dev
jtojnar has quit [Remote host closed the connection]
<delroth> untested at this point, but... https://github.com/delroth/nixpkgs/commit/3e7463993b8d99ae15ea48da4d9dbbdbc2fac603 does that seem like a good idea to anyone?
<delroth> I'm tired of seeing .wrapped-* in my top output
<delroth> (.*-wrapped rather -- point still stands)
asymmetric has joined #nixos-dev
orivej has joined #nixos-dev
orivej has quit [Ping timeout: 245 seconds]
orivej has joined #nixos-dev
ma27 has quit [Quit: WeeChat 2.2]
johanot has quit [Quit: WeeChat 2.2]
ma27 has joined #nixos-dev
<samueldr> delroth: AFAIUI some software will rely on the location of the binary (the wrapped one), which is why I had dismissed moving binaries to another folder wholesale
<samueldr> when they do that silly stuff, they'll end up looking in the wrong location
<samueldr> but I might have overestimated the issue when thinking about it initially
<samueldr> when I thought about something similar, I was thinking of a folder that's a sibling of $out/bin, though
<delroth> yeah that's my main concern as well, I'm trying to figure out how much exactly is impacted
<delroth> there are quite a few places not using wrapProgram atm in nixpkgs because they do need to keep the basename
<delroth> I've seen the hadoop package is doing that for example
<delroth> not sure about what the right side of the tradeoff is -- one side is easy to measure since we can look at derivations that don't use wrapProgram because of basename issues, but finding derivations that would break with wrapProgram putting in a different directory is hard right now
<samueldr> a compromise may be to use a parameter?
<samueldr> wrapProgram --something-meaning-it-wont-clobber-the-name
<delroth> doesn't solve the "top" problem -- my gut feeling is that the default should be the opposite :)
<delroth> basenames are user visible in quite a few tools
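On Linux, top shows a process's `comm` name by default, which the kernel takes from the basename of the executed file and truncates to 15 characters (`TASK_COMM_LEN` is 16 including the trailing NUL). A small sketch of what top would roughly display under the current same-directory layout versus a sibling-directory layout like the one samueldr mentions; the store path and the `bin-wrapped` directory name are hypothetical.

```python
import posixpath

# Linux keeps at most 15 visible characters of a process name
# (TASK_COMM_LEN = 16, including the trailing NUL byte).
TASK_COMM_LEN = 16

def comm_name(executable_path: str) -> str:
    """Approximate the `comm` name the kernel assigns after execve()."""
    return posixpath.basename(executable_path)[: TASK_COMM_LEN - 1]

# Hypothetical store path, for illustration only.
out = "/nix/store/hypothetical-firefox"

current = f"{out}/bin/.firefox-wrapped"   # what wrapProgram produces today
proposed = f"{out}/bin-wrapped/firefox"   # hypothetical sibling directory

print(comm_name(current))   # .firefox-wrappe
print(comm_name(proposed))  # firefox
```

Keeping the basename is what makes the second layout show the name users expect, at the cost of the relocation concerns raised above.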
<samueldr> yeah
<delroth> I'm running a rebuild of my laptop system right now with the commit linked above, I'll see how much breaks
<samueldr> a thing to verify anyway: does stashing them in a folder break multiple layers of wrapping?
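One way to reason about this is to simulate wrapping the same program twice. As far as I recall, nixpkgs' `wrapProgram` avoids clobbering an existing hidden file by appending `_` to the name until it is free; the sketch below assumes that behaviour and applies the same rule to a hypothetical sibling directory. Files are modelled as a set of paths, and `wrap_same_dir`/`wrap_sibling_dir` are made-up names.

```python
import posixpath

def _free_name(files: set, candidate: str) -> str:
    # Append "_" until the name does not collide with an existing file
    # (assumed to mirror what wrapProgram does for repeated wrapping).
    while candidate in files:
        candidate += "_"
    return candidate

def wrap_same_dir(files: set, prog: str) -> str:
    """Move prog to <dir>/.<name>-wrapped; a wrapper then takes prog's place."""
    d, name = posixpath.split(prog)
    hidden = _free_name(files, posixpath.join(d, f".{name}-wrapped"))
    files.add(hidden)  # the previous occupant of prog now lives here
    return hidden

def wrap_sibling_dir(files: set, prog: str) -> str:
    """Move prog to <out>/bin-wrapped/<name>; a wrapper then takes prog's place."""
    d, name = posixpath.split(prog)
    target = posixpath.join(posixpath.dirname(d), "bin-wrapped", name)
    hidden = _free_name(files, target)
    files.add(hidden)
    return hidden

files = {"/out/bin/foo"}
print(wrap_same_dir(files, "/out/bin/foo"))     # /out/bin/.foo-wrapped
print(wrap_same_dir(files, "/out/bin/foo"))     # /out/bin/.foo-wrapped_

files = {"/out/bin/foo"}
print(wrap_sibling_dir(files, "/out/bin/foo"))  # /out/bin-wrapped/foo
print(wrap_sibling_dir(files, "/out/bin/foo"))  # /out/bin-wrapped/foo_
```

Under this model, the second layer in the sibling-directory scheme no longer keeps the basename `foo`, so the directory trick alone would not preserve names across multiple layers of wrapping.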
ciil_ has joined #nixos-dev
FRidh has quit [*.net *.split]
kgz has quit [*.net *.split]
makefu has quit [*.net *.split]
ciil has quit [*.net *.split]
Profpatsch has quit [*.net *.split]
rsa has quit [*.net *.split]
delroth has quit [*.net *.split]
aminechikhaoui has quit [*.net *.split]
aminechikhaoui has joined #nixos-dev
aminechikhaoui5 has joined #nixos-dev
aminechikhaoui5 has quit [Client Quit]
aminechikhaoui has quit [Remote host closed the connection]
jtojnar has joined #nixos-dev
Profpatsch has joined #nixos-dev
kgz has joined #nixos-dev
makefu has joined #nixos-dev
delroth has joined #nixos-dev
yl has joined #nixos-dev
johanot has joined #nixos-dev
drakonis has joined #nixos-dev
ma27 has quit [Quit: WeeChat 2.4]
ma27 has joined #nixos-dev
orivej has quit [Ping timeout: 268 seconds]
asymmetric has quit [Quit: Leaving]
drakonis has quit [Quit: WeeChat 2.3]
drakonis has joined #nixos-dev
drakonis_ has quit [Ping timeout: 268 seconds]
johanot has quit [Quit: WeeChat 2.4]
drakonis_ has joined #nixos-dev
<manveru> well, most ruby executables would break
drakonis has quit [Ping timeout: 252 seconds]
<yorick> maybe there should be a new nix stable release where nix-shell works as root
<infinisil> nix-shell doesn't work as root??
<infinisil> Works for me
orivej has joined #nixos-dev
<infinisil> Huh, just looking at this failure in hydra: https://hydra.nixos.org/build/90472195/nixlog/2/tail
<infinisil> A hash mismatch? How come? It's not a fixed-output derivation
<infinisil> And it's cached by hydra
drakonis has joined #nixos-dev
drakonis_ has quit [Ping timeout: 272 seconds]
drakonis has quit [Ping timeout: 252 seconds]
<samueldr> if I was a betting man, I'd bet it's the same issue still present in tests, where sometimes it looks like the VM is mildly corrupt
<samueldr> (only from an external look though, no idea if the gut feeling is right)
<samueldr> failure verifying the installed system: https://hydra.nixos.org/build/89851849/nixlog/124
<samueldr> haven't found one quickly, but sometimes grub just hangs or fails in weird ways
<samueldr> all in transient ways
<samueldr> the "debugging hatch" may not have been the underlying issue, but only somehow causing it to happen more often
<samueldr> (if it was, nothing was validated in terms of numbers)
<ekleog> did we ever see some weird failure like these ones on !packet-epyc?
<ekleog> I get the feeling every time I'm seeing some weird transient failure it's on packet-epyc
<ekleog> sure, it's getting most of the jobs for being such a horse, but…
<samueldr> it was something I brought up when testing earlier this year; maybe it only looks that way since epyc always gets so many tasks
<ekleog> at this stage I'm starting to think it may make sense to consider a hardware failure
drakonis has joined #nixos-dev
<samueldr> memtest was run successfully; if it is hardware, it's something else :/
<ekleog> an overclocked processor that sometimes fails a bit?
<samueldr> (or something memtest won't exhibit)
<samueldr> no idea if packet's machines would be overclocked
<ekleog> (even if not overclocked, a manufacturing defect in the processor could lead to the same kind of behavior I think)
<samueldr> yep
<samueldr> sounds legit at least
<ekleog> it's supposed to be caught at test time in the factory, but…
<samueldr> from the outside, looking at when it fails, the machine never looks overloaded or anything in grafana
<samueldr> (I have no more insight in this than what's known publicly here)
<ekleog> I wonder if there's some software that basically tests all (or most) instructions and checks their results
<ekleog> I could find some CPU stress testing stuff, but it looks designed more for benchmarking than for checking the processor
copumpkin has quit [Ping timeout: 245 seconds]
<ekleog> it sounds like what people do when overclocking is either use some Intel software (packet-epyc-1 is AMD, though, iirc) or just run something like prime95 and see whether an error occurs within the OS during the stress period
<delroth> manveru: hmm, why would ruby break?
<ekleog> oh, looks like prime95 actually checks the results https://www.mersenne.org/download/stress.txt
<ekleog> -> gchristensen, would it be possible to run prime95 on packet-epyc-1 for 1-2 days, to check whether the CPU looks OK?
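Prime95 (mprime on Linux) can self-check because its torture test works on inputs whose results are already known, so an arithmetic error shows up as a mismatch. The sketch below illustrates only that idea, on a tiny scale, using the Lucas-Lehmer test on exponents whose Mersenne-prime status is known; it is not how Prime95 is implemented (Prime95 relies on heavily optimized FFT multiplication), and `self_check` is a made-up name.

```python
# Known answers: 2**p - 1 is prime for these exponents and composite for the others.
MERSENNE_PRIME_EXPONENTS = [3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607]
COMPOSITE_EXPONENTS = [11, 23, 29, 37, 41, 43, 47, 53, 59]

def lucas_lehmer(p: int) -> bool:
    """Return True if 2**p - 1 is prime (p must be an odd prime)."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

def self_check(rounds: int = 1) -> int:
    """Re-run the known cases and count results that disagree with the reference."""
    mismatches = 0
    for _ in range(rounds):
        for p in MERSENNE_PRIME_EXPONENTS:
            mismatches += not lucas_lehmer(p)
        for p in COMPOSITE_EXPONENTS:
            mismatches += lucas_lehmer(p)
    return mismatches

print(self_check(rounds=10))  # 0 on healthy hardware
```

A real stress test runs for hours at full load on all cores, which is what makes marginal hardware show errors; a single-threaded loop like this one only shows the known-answer principle.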
<infinisil> samueldr: I think I'll just have to stop trying to look into weird failures and just restart to make sure it's not transient
<infinisil> Considering how often this happens, I feel like it might really be worth doing every build twice to fix this
<infinisil> Because it saves all of us time and makes the channels update faster
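The restart-to-check idea can be phrased as a retry policy: a failure that disappears on a re-run is classified as transient, one that repeats is treated as reproducible. A minimal sketch, where `build` is a hypothetical stand-in for a single build attempt and the default of two attempts is an arbitrary assumption.

```python
from typing import Callable, Iterable

def classify(build: Callable[[], bool], attempts: int = 2) -> str:
    """Run `build` up to `attempts` times and classify the outcome."""
    results = []
    for _ in range(attempts):
        ok = build()
        results.append(ok)
        if ok:
            break
    if results[0]:
        return "success"
    if results[-1]:
        return "transient failure"    # failed first, passed on retry
    return "reproducible failure"     # failed on every attempt

def scripted(outcomes: Iterable[bool]) -> Callable[[], bool]:
    """Fake build that returns the given outcomes in order."""
    it = iter(outcomes)
    return lambda: next(it)

print(classify(scripted([True])))          # success
print(classify(scripted([False, True])))   # transient failure
print(classify(scripted([False, False])))  # reproducible failure
```

Retrying only failed builds is cheaper than building every job twice up front, since successful builds run once; the trade-off is that a flaky success is never caught.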
<samueldr> infinisil: try to keep a mental tally on which machine it built, if you can
<samueldr> (or a written tally if you want)
<infinisil> Alright
<samueldr> though you're now primed and probably biased against the epyc machine :)
* samueldr wonders if there are non-qemu things that fail on the epyc machine
<infinisil> I'll keep a tally :)
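Since the epyc machine gets most of the jobs, a raw count of weird failures per machine will be skewed toward it even if it is healthy; dividing by the number of jobs each machine ran gives a fairer comparison. A sketch of such a tally; the machine names and numbers are invented for illustration, not real Hydra data.

```python
from collections import Counter

def failure_rates(jobs: list[tuple[str, bool]]) -> dict[str, float]:
    """jobs is a list of (machine, failed_weirdly) pairs."""
    total = Counter(machine for machine, _ in jobs)
    failed = Counter(machine for machine, bad in jobs if bad)
    return {m: failed[m] / total[m] for m in total}

# Invented example: "busy" has more failures in absolute terms,
# but it also ran ten times as many jobs as "quiet".
jobs = ([("busy", True)] * 10 + [("busy", False)] * 990
        + [("quiet", True)] * 2 + [("quiet", False)] * 98)
print(failure_rates(jobs))  # {'busy': 0.01, 'quiet': 0.02}
```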
<manveru> delroth: i didn't mean ruby itself, but a lot of binaries from ruby gems use things like `require_relative '../lib/foo'`
<samueldr> the git-annex one the other day was on ike, tried reproducing using a non-kvm qemu with a same-generation cpu and it didn't look like I could
<samueldr> manveru: it would be likely then that a sibling directory to $out/bin would work?
<samueldr> (but then, what else would break?)
<manveru> samueldr: like $out/bin-real or something?
<samueldr> or $out/bin-wrapped, yeah
<samueldr> (well, wrappee?)
<manveru> hehe
<manveru> well, it'd be an interesting experiment either way, but it should work then, yeah
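The breakage manveru describes comes from paths resolved relative to the script's own location, like Ruby's `require_relative '../lib/foo'`. Resolving that path from each candidate location shows why a sibling of `$out/bin` keeps working while a deeper directory would not. A sketch; `$out` is abbreviated to a hypothetical `/out`, and `libexec/wrapped` is an invented example of a non-sibling location.

```python
import posixpath

def resolve_relative(script_path: str, relative: str) -> str:
    """Resolve `relative` against the directory containing `script_path`."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(script_path), relative))

out = "/out"  # stands in for $out
for script in (f"{out}/bin/.foo-wrapped",       # current wrapProgram layout
               f"{out}/bin-wrapped/foo",        # sibling directory
               f"{out}/libexec/wrapped/foo"):   # deeper, non-sibling directory
    print(script, "->", resolve_relative(script, "../lib/foo"))
# /out/bin/.foo-wrapped -> /out/lib/foo
# /out/bin-wrapped/foo -> /out/lib/foo
# /out/libexec/wrapped/foo -> /out/libexec/lib/foo
```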
orivej has quit [Ping timeout: 246 seconds]
lassulus has quit [Ping timeout: 250 seconds]
<infinisil> So I restarted this one because it seemed sketchy: https://hydra.nixos.org/build/90304238
<infinisil> But it failed again
<infinisil> Still a weird error, this time not on a packet machine (t4a is the name), machine# switch_root: can't execute '/nix/store/2iar0ajsjpy8kgfyh108fcrzg0r097zw-nixos-system-machine-19.03pre-git/init': Operation not permitted
<samueldr> can you test locally (I will too)
yl has quit [Ping timeout: 255 seconds]
* samueldr hoists a "?" in the above sentence
<infinisil> I'll do too
<samueldr> oh, it never successfully ran
<samueldr> (for 19.03)
<infinisil> Huh
<infinisil> That totally looks like a corruption error though
<infinisil> Or something weird
<samueldr> when init exits (for any reason), the kernel panics
<samueldr> init exits since it fails to run init
<samueldr> (stage-1 init exits, since it fails to `exec` stage-2's init)
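The handover works like this: the process running as PID 1 calls `exec` on stage-2's init; if that `exec` fails, the process has nothing left to do and exits, and because it is PID 1 the kernel panics. Judging from the `switch_root:` prefix in the log, the actual `exec` is done by `switch_root`. The sketch mimics only the userspace half with `os.execv`, which replaces the current process on success and raises `OSError` on failure; `switch_to_stage2` is a hypothetical name, and the real stage-1 is a shell script, not Python.

```python
import os
import sys

def switch_to_stage2(init_path: str) -> str:
    """Try to replace this process with init_path; return an error message on failure."""
    try:
        os.execv(init_path, [init_path])  # on success this never returns
    except OSError as e:
        # In the real boot this is where PID 1 would exit,
        # which is what makes the kernel panic.
        return f"switch_root: can't execute '{init_path}': {e.strerror}"
    raise AssertionError("os.execv returned without raising")

if __name__ == "__main__":
    print(switch_to_stage2("/nonexistent/init"), file=sys.stderr)
```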
<infinisil> Ah
<samueldr> might be an issue similar to the one stopping us from updating the kernel to the current LTS
<samueldr> (you can stop the test if it hasn't finished for you, reproducible error)
<samueldr> aszlig: might be relevant to the overlayfs issue
<samueldr> probably not
<infinisil> The first failing line seems to be "kbd_mode: KDSKBMODE: Inappropriate ioctl for device"
<infinisil> Probably not it though
<samueldr> that's something that's been in the boot logs for a while (according to google)
<samueldr> though not sure what it means exactly
lassulus has joined #nixos-dev
<samueldr> lol, I went directly to the kernel source, which in hindsight was the wrong place to look
<samueldr> >> kbd_mode - report or set the keyboard mode
<samueldr> most probably because there is no keyboard connected to the VM?
<infinisil> Ah that might be it
<samueldr> see the %G in the following line in the log and in the printf
<samueldr> chiiruno isn't an IRC peep, right?
<infinisil> printf "\033%%${if isUnicode then "G" else "@"}" >> /dev/console
<samueldr> (bcachefs test maintainer)
<samueldr> yeah, I was pointing out to it since it shows it _is_ the right thing you found with kbd_mode
<infinisil> The right thing?
<infinisil> Anyways, totally getting sidetracked here
<samueldr> ah, the right invocation of kbd_mode, so no, it's not what's at fault
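For reference, the printf format in the line quoted above, `\033%%G`, emits the bytes `ESC % G`, which switch the Linux console into UTF-8 mode; `ESC % @` selects the default, non-UTF-8 mode. The `%%` is needed because printf would otherwise treat `%` as the start of a format directive. A sketch of the bytes produced; `console_mode_sequence` is a made-up helper.

```python
ESC = "\x1b"

def console_mode_sequence(is_unicode: bool) -> bytes:
    """Bytes written to /dev/console by the printf call quoted above."""
    # printf turns the escaped "%%" into a single literal "%".
    return (ESC + "%" + ("G" if is_unicode else "@")).encode("ascii")

print(console_mode_sequence(True))   # b'\x1b%G'
print(console_mode_sequence(False))  # b'\x1b%@'
```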
<samueldr> I'm thinking bcachefs is just borked?
<samueldr> or uh, in combination with the 9p-backed testing infra
<infinisil> How do you even debug this!
<samueldr> since 18.09 is fine, I'd first update bcachefs in 18.09 to the same rev as in unstable (basically a backport) and see if it sticks, then uh, think harder
<samueldr> there was irrelevant(?) breakage that makes it harder to bisect
<infinisil> A neat trick I've started using for bisects is to use this: https://paste.infinisil.com/maqmH3Hfjw
<samueldr> the failures in hydra on unstable differ starting with 88316873, which has #54752 in
<{^_^}> https://github.com/NixOS/nixpkgs/pull/54752 (by eadwu, 6 weeks ago, merged): linux_testing_bcachefs,bcachefs-tools: 20190123
lassulus_ has joined #nixos-dev
<infinisil> I guess if there's an unrelated breakage during bisecting, you can always try to fix that failure first
<samueldr> neat snippet infinisil++ for bisecting
<{^_^}> infinisil's karma got increased to 61
<infinisil> :)
lassulus has quit [Ping timeout: 272 seconds]
lassulus_ is now known as lassulus
<infinisil> Combine that with a cache lookup to immediately exit 0 for commits that are already built, and with a flag to control whether you want to bisect over commits whose stdenv you'd have to rebuild (by trying to build -A stdenv), and you've got a very neat bisection script
<infinisil> *And* combine that with **recursive** bisection when it has been determined that a version update causes the breakage!
<infinisil> (well, not recursive, because it would probably only be for 1 level)
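A common way to step over unrelated breakage while bisecting is `git bisect run`'s exit-code convention: 0 means good, 125 means "cannot test, skip", and any other code from 1 to 127 means bad. A cache-lookup shortcut like the one described above would amount to returning 0 early. The sketch simulates that convention over a list of commits, so a commit that fails for an unrelated reason is skipped instead of blamed. `bisect_with_skip` and the sample history are hypothetical, and this is a simplified model, not git's actual algorithm.

```python
GOOD, SKIP = 0, 125

def bisect_with_skip(commits, test):
    """Return the commits that may have introduced a breakage.

    commits[0] is assumed good and commits[-1] bad. `test` returns an exit
    code following the git bisect run convention: 0 good, 125 skip,
    any other code in 1..127 bad."""
    good, bad = 0, len(commits) - 1
    skipped = set()
    while True:
        # Commits strictly between the known-good and known-bad ones
        # that have not been skipped yet.
        untested = [i for i in range(good + 1, bad) if i not in skipped]
        if not untested:
            break
        mid = untested[len(untested) // 2]
        code = test(commits[mid])
        if code == GOOD:
            good = mid
        elif code == SKIP:
            skipped.add(mid)
        elif 1 <= code <= 127:
            bad = mid
        else:
            raise RuntimeError(f"exit code {code} aborts the bisection")
    # Skipped commits left between good and bad might also be the culprit.
    return commits[good + 1 : bad + 1]

commits = [f"c{i}" for i in range(10)]

def check_commit(commit, broken_from=6):
    n = int(commit[1:])
    if n in (4, 5):
        return SKIP  # unrelated build failure: can't tell
    return 1 if n >= broken_from else GOOD

print(bisect_with_skip(commits, check_commit))  # ['c4', 'c5', 'c6']
print(bisect_with_skip(commits, lambda c: check_commit(c, broken_from=7)))  # ['c7']
```

When skipped commits sit right before the culprit, the result is a set of candidates rather than one commit, which is exactly when fixing the unrelated breakage first pays off.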
<samueldr> I don't think we can easily "just" yank the 9p parts out of the tests infra, right?
<infinisil> samueldr: What's 9p for?
<samueldr> mounting the host's /nix/store (or part of?) in the VM
<infinisil> Oh that makes sense yeah
<samueldr> it's used with overlayfs, and that combination is causing grief with recent kernels
<infinisil> samueldr: You're saying overlayfs itself depends on 9p?
<samueldr> no, sorry I was a bit brief
<samueldr> overlayFS is used to merge multiple FS into a single one; here it merges a writable store on top of the shared system store, so it uses the mounted 9p fs
<infinisil> Ahh I see, more or less
<samueldr> and *something* changed in the mounting bits or overlayfs bits in the kernel sometime between the LTSes, which is why we're not upgrading to the next LTS yet
<samueldr> (our tests don't run)
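The overlay described here behaves roughly like a layered lookup: reads check the writable upper layer first and fall through to the read-only lower layer (the host store shared over 9p), while writes always land in the upper layer. Python's `collections.ChainMap` has those lookup and write semantics, so it makes a compact model; it ignores everything that makes real overlayfs hard (whiteouts for deletions, copy-up, directory merging), and the store paths are invented.

```python
from collections import ChainMap

# Lower layer: the host's /nix/store, shared read-only into the VM over 9p.
host_store = {"/nix/store/aaa-glibc": "glibc from host",
              "/nix/store/bbb-bash": "bash from host"}
# Upper layer: the VM's own writable store.
vm_writable = {}

store = ChainMap(vm_writable, host_store)  # lookups try vm_writable first

store["/nix/store/ccc-test-output"] = "built inside the VM"  # write goes to upper

print(store["/nix/store/aaa-glibc"])                 # glibc from host
print("/nix/store/ccc-test-output" in vm_writable)   # True
print("/nix/store/ccc-test-output" in host_store)    # False
```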
drakonis1 has joined #nixos-dev