pie__ has quit [Remote host closed the connection]
pie__ has joined #nixos-dev
pie_ has quit [Ping timeout: 245 seconds]
alp has quit [Ping timeout: 250 seconds]
phreedom has joined #nixos-dev
phreedom_ has quit [Ping timeout: 256 seconds]
alp has joined #nixos-dev
drakonis has joined #nixos-dev
drakonis2 has quit [Ping timeout: 244 seconds]
drakonis_ has joined #nixos-dev
<teto>
I am investigating a crash on nix master. Followed the hacking guide to install from a nix-shell yet `make installcheck` misses libsodium and upon launch I get `/nix/store/2kcrj1ksd2a14bm5sky182fv2xwfhfap-glibc-2.26-131/lib/libc.so.6: version `GLIBC_2.27' not found`.
<delroth>
I'm tired of seeing .wrapped-* in my top output
<delroth>
(.*-wrapped rather -- point still stands)
asymmetric has joined #nixos-dev
orivej has joined #nixos-dev
orivej has quit [Ping timeout: 245 seconds]
orivej has joined #nixos-dev
ma27 has quit [Quit: WeeChat 2.2]
johanot has quit [Quit: WeeChat 2.2]
ma27 has joined #nixos-dev
<samueldr>
delroth: AFAIUI some software will rely on the location of the binary (the wrapped one), which is why I dismissed moving binaries to another folder wholesale
<samueldr>
when they'll do that silly stuff they'll be at the wrong location
<samueldr>
but I might have overestimated the issue when thinking about it initially
<samueldr>
when I thought about something similar, I was thinking of a folder sibling to $out/bin though
<delroth>
yeah that's my main concern as well, I'm trying to figure out how much exactly is impacted
<delroth>
there are quite a few places not using wrapProgram atm in nixpkgs because they do need to keep the basename
<delroth>
I've seen the hadoop package is doing that for example
<delroth>
not sure about what the right side of the tradeoff is -- one side is easy to measure since we can look at derivations that don't use wrapProgram because of basename issues, but finding derivations that would break with wrapProgram putting them in a different directory is hard right now
<samueldr>
a compromise may be to use a parameter?
<samueldr>
haven't found one quickly, but sometimes grub just hangs or fails in weird ways
<samueldr>
all in transient ways
<samueldr>
the "debugging hatch" may not have been the underlying issue, but only somehow causing it to happen more often
<samueldr>
(if it was, nothing was validated in terms of numbers)
<ekleog>
did we ever see some weird failure like these ones on !packet-epyc?
<ekleog>
I get the feeling every time I'm seeing some weird transient failure it's on packet-epyc
<ekleog>
sure, it's getting most of the jobs for being such a horse, but…
<samueldr>
it was something I brought up when testing earlier this year, maybe it only looks like it since epyc always gets so many tasks
<ekleog>
at this stage I'm starting to think it may make sense to consider hardware failure
drakonis has joined #nixos-dev
<samueldr>
memtest was run, successfully; if it is, it's something else :/
<ekleog>
an overclocked processor that sometimes fails a bit?
<samueldr>
(or something memtest won't exhibit)
<samueldr>
no idea if packet's machine would overclock
<ekleog>
(even if not overclocked, a processor misprint could lead to the same kind of behavior I think)
<samueldr>
yep
<samueldr>
sounds legit at least
<ekleog>
it's supposed to be caught at test time in the factory, but…
<samueldr>
from the outside, looking at when it fails, the machine never looks overloaded or anything in grafana
<samueldr>
(I have no more insight in this than what's known publicly here)
<ekleog>
I wonder if there's some software that basically tests all|most instructions and checks their result
<ekleog>
could find some CPU stress testing stuff but it looks more designed for bench than for checking the processor
copumpkin has quit [Ping timeout: 245 seconds]
<ekleog>
it sounds like what people do when overclocking is either use some intel software (packet-epyc-1 is AMD, though, iirc) or just use something like prime95 and see whether an error occurs within the OS during the stress time
<ekleog>
-> gchristensen, would it be possible to try to run prime95 on packet-epyc-1 for like 1-2 days, to check whether the CPU looks OK?
<infinisil>
samueldr: I think I'll just have to stop trying to look into weird failures and just restart to make sure it's not transient
<infinisil>
Considering how often this happens, I feel like it might really be worth doing every build twice to fix this
<infinisil>
Because it saves all of us time and makes the channels update faster
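A minimal sketch of the "build twice" idea, assuming a hypothetical `build` callable that returns True on success. Neither the function name nor the result labels come from Hydra or ofborg; they only illustrate how a second attempt could separate transient failures from real ones.

```python
from typing import Callable


def build_with_retry(build: Callable[[], bool], attempts: int = 2) -> str:
    """Run `build` up to `attempts` times and classify the outcome.

    `build` is a hypothetical stand-in for whatever actually runs the job.
    """
    for attempt in range(1, attempts + 1):
        if build():
            # Success on the first try is a plain pass; success on a later
            # try suggests the earlier failure was transient.
            return "ok" if attempt == 1 else "transient"
    return "failed"
```

A job that comes back as "transient" is also a natural thing to record together with the machine it ran on.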
<samueldr>
infinisil: try to keep a mental tally on which machine it built, if you can
<samueldr>
(or a written tally if you want)
<infinisil>
Alright
<samueldr>
though you're now primed and probably biased against the epyc machine :)
* samueldr
wonders if there are non-qemu things that fail on the epyc machine
<infinisil>
I'll keep a tally :)
<manveru>
delroth: i didn't mean ruby itself, but a lot of binaries from ruby gems use things like `require_relative '../lib/foo'`
<samueldr>
the git-annex one the other day was on ike, tried reproducing using a non-kvm qemu with a same-generation cpu and it didn't look like I could
<samueldr>
manveru: it would be likely then that a sibling directory to $out/bin would work?
<samueldr>
(but then, what else would break?)
<manveru>
samueldr: like $out/bin-real or something?
<samueldr>
or $out/bin-wrapped, yeah
<samueldr>
(well, wrappee?)
<manveru>
hehe
<manveru>
well, it'd be an interesting experiment either way, but it should work then, yeah
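To illustrate why a sibling directory should keep `require_relative '../lib/foo'` working, here is a small sketch that resolves the relative path from a few candidate locations of the wrapped binary. The store path and the `libexec/wrapped` layout are placeholders chosen for the example, not anything nixpkgs does today.

```python
import posixpath


def resolve_relative(script_path: str, relative: str) -> str:
    """Resolve `relative` against the directory containing `script_path`,
    the way a script locating files relative to itself would."""
    base = posixpath.dirname(script_path)
    return posixpath.normpath(posixpath.join(base, relative))


OUT = "/nix/store/example-gem"  # placeholder, not a real store path

candidates = {
    "in place ($out/bin)": f"{OUT}/bin/.foo-wrapped",
    "sibling ($out/bin-wrapped)": f"{OUT}/bin-wrapped/foo",
    "nested ($out/libexec/wrapped)": f"{OUT}/libexec/wrapped/foo",
}

for label, path in candidates.items():
    print(label, "->", resolve_relative(path, "../lib/foo"))
```

The in-place and sibling layouts both resolve to `$out/lib/foo`, while the nested layout ends up under `$out/libexec/lib/foo`. The sibling layout also keeps the basename `foo`, which is the other concern raised earlier.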
<infinisil>
Still a weird error, this time not on a packet machine (t4a is the name), machine# switch_root: can't execute '/nix/store/2iar0ajsjpy8kgfyh108fcrzg0r097zw-nixos-system-machine-19.03pre-git/init': Operation not permitted
<samueldr>
lol, I went directly to the kernel source, which in hindsight was wrong
<samueldr>
>> kbd_mode - report or set the keyboard mode
<samueldr>
most probably because there is no keyboard connected to the VM?
<infinisil>
Ah that might be it
<samueldr>
see the %G in the following line in the log and in the printf
<samueldr>
chiiruno isn't an IRC peep, right?
<infinisil>
printf "\033%%${if isUnicode then "G" else "@"}" >> /dev/console
<samueldr>
(bcachefs test maintainer)
<samueldr>
yeah, I was pointing it out since it shows it _is_ the right thing you found with kbd_mode
<infinisil>
The right thing?
<infinisil>
Anyways, totally getting sidetracked here
<samueldr>
ah, the right invocation of kbd_mode, so no, it's not what's at fault
<samueldr>
I'm thinking bcachefs is just borked?
<samueldr>
or uh, in combination with the 9p-backed testing infra
<infinisil>
How do you even debug this!
<samueldr>
since 18.09 is fine, I'd first update bcachefs in 18.09 to the same rev as in unstable (basically backport) and see if it sticks, then uh, think harder
<samueldr>
there was irrelevant(?) breakage that makes it harder to bisect
<infinisil>
I guess if there's an unrelated breakage during bisecting, you can always try to fix that failure first
<samueldr>
neat snippet infinisil++ for bisecting
<{^_^}>
infinisil's karma got increased to 61
<infinisil>
:)
lassulus has quit [Ping timeout: 272 seconds]
lassulus_ is now known as lassulus
<infinisil>
Combine that with a cache lookup to immediately exit 0 for ones that are already built, and combine that with a flag to control whether you want to bisect over stdenvs that you have to rebuild (by trying to build -A stdenv), and you've got a very neat bisection script
<infinisil>
*And* combine that with **recursive** bisection when it has been determined that a version update causes the breakage!
<infinisil>
(well, not recursive, because it would probably only be for 1 level)
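The decision logic described above could be sketched roughly as follows. This is not infinisil's snippet; the parameter names are hypothetical, and how one detects a cache hit or a pending stdenv rebuild is left out. The exit codes are the ones `git bisect run` understands: 0 marks a commit good, 125 skips it, and other codes from 1 to 127 mark it bad.

```python
from typing import Callable

GOOD, BAD, SKIP = 0, 1, 125  # exit codes as interpreted by `git bisect run`


def bisect_exit_code(
    cached: bool,                # hypothetical: output already in a binary cache
    needs_stdenv_rebuild: bool,  # hypothetical: this commit would rebuild stdenv
    allow_stdenv_rebuild: bool,  # the flag discussed above
    build: Callable[[], bool],   # hypothetical: runs the build, True on success
) -> int:
    if cached:
        # Something already built is known good, so skip the build entirely.
        return GOOD
    if needs_stdenv_rebuild and not allow_stdenv_rebuild:
        # Too expensive to test: tell git bisect to skip this commit.
        return SKIP
    return GOOD if build() else BAD
```

A script run through `git bisect run` would pass the returned value to `sys.exit`.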
<samueldr>
I don't think we can easily "just" yank out the 9p parts out of the tests infra, right?
<infinisil>
samueldr: What's 9p for?
<samueldr>
mounting the host's /nix/store (or part of?) in the VM
<infinisil>
Oh that makes sense yeah
<samueldr>
it's used with overlayfs and it is causing grief with recent kernels
<infinisil>
samueldr: You're saying overlayfs itself depends on 9p?
<samueldr>
no, sorry I was a bit brief
<samueldr>
overlayFS is used to merge multiple FS into a single one, and here overlayFS is used to merge a writable store on top of the shared system's store, so it uses the mounted 9p overlayfs
<samueldr>
AGH, sorry, "it uses the mounted 9p fs"
<infinisil>
Ahh I see, more or less
<samueldr>
and *something* changed in the mounting bits or overlayfs bits in the kernel somewhen between the LTSes, which is why we're not upgrading to the next LTS yet
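For readers unfamiliar with the layering described above, here is a rough sketch that builds the kind of overlay mount command involved. The `lowerdir`, `upperdir` and `workdir` options are standard overlayfs mount options; the concrete paths are assumptions made for illustration and may not match what the test infrastructure actually uses.

```python
from typing import List


def overlay_mount_command(lower: str, upper: str, work: str, target: str) -> List[str]:
    """Build a mount(8) invocation that stacks a writable layer on a read-only one."""
    options = f"lowerdir={lower},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", options, target]


cmd = overlay_mount_command(
    lower="/nix/.ro-store",        # assumed mount point of the 9p-shared host store
    upper="/nix/.rw-store/store",  # assumed writable layer inside the VM
    work="/nix/.rw-store/work",    # overlayfs scratch dir, same filesystem as upper
    target="/nix/store",
)
print(" ".join(cmd))
```

The read-only lower layer is the 9p mount, so a kernel change in either 9p or overlayfs can break the combination.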