philr has quit [Ping timeout: 246 seconds]
the-kenny has joined #nix-darwin
philr has joined #nix-darwin
philr has quit [Ping timeout: 250 seconds]
hamishmack has quit [Ping timeout: 272 seconds]
philr has joined #nix-darwin
trcc has joined #nix-darwin
trcc has quit [Remote host closed the connection]
trcc has joined #nix-darwin
trcc has quit [Remote host closed the connection]
trcc has joined #nix-darwin
trcc has quit [Ping timeout: 268 seconds]
trcc has joined #nix-darwin
trcc has quit [Ping timeout: 244 seconds]
periklis has joined #nix-darwin
periklis has quit [Remote host closed the connection]
trcc has joined #nix-darwin
trcc has quit [Ping timeout: 246 seconds]
trcc has joined #nix-darwin
periklis has joined #nix-darwin
__Sander__ has joined #nix-darwin
periklis has quit [Remote host closed the connection]
periklis has joined #nix-darwin
periklis has quit [Remote host closed the connection]
periklis has joined #nix-darwin
periklis has quit [Ping timeout: 250 seconds]
philr has quit [Ping timeout: 246 seconds]
__Sander__ has quit [Quit: Konversation terminated!]
<
contrapumpkin>
LnL: I think I see what happened
<
contrapumpkin>
given your latest trace
<
LnL>
oh really? I'm kind of lost
<
contrapumpkin>
libinfo source is public on opensource.apple.com
<
contrapumpkin>
if you compare 517.30.1 (10.13.6) to 517.200.9 (10.14.1)
<
contrapumpkin>
si_getaddrinfo.c
<
contrapumpkin>
there's a new big chunk about loading libnetwork.dylib
<
LnL>
yeah we also have a build for it IIRC
<
contrapumpkin>
it simply wasn't there before
<
contrapumpkin>
in this case I think we're not using that though
<
contrapumpkin>
this is just the one in libsystem
<
LnL>
ok, but how does that explain I have both a working and non working version on this machine?
<
LnL>
I'd expect them both to go through the same path
<
contrapumpkin>
I think it depends on cache state somehow
<
contrapumpkin>
if you look at the actual source for si_addrinfo
<
contrapumpkin>
the call to the nat64 code is fairly low down with some early exits out
<
contrapumpkin>
not 100% sure though
<
contrapumpkin>
I'd be curious if you try to compile Nix on darwin without using Nix whether the issue still happens
<
contrapumpkin>
like look at all the ways in which gai_nat64_synthesis can early exit before getting to the point of actually loading libnetwork
<
contrapumpkin>
if family != AF_INET6 it'll early return without loading the dylib
<
contrapumpkin>
among many others
<
contrapumpkin>
so anyway, not really an exact answer, but that seems to be the gist of it
<
contrapumpkin>
tbc, I don't see a clear way to prevent it from happening
<
contrapumpkin>
however, if it does happen outside of Nix, we might have a case for a radar
<
contrapumpkin>
like, if anyone who calls fork and resolves DNS names to ipv6 gets that failure
<
contrapumpkin>
that's a pretty big bug on Apple's side
<
LnL>
hold on, does this mean it might work fine if you're on an ipv4 only network?
<
contrapumpkin>
I haven't traced the logic exactly, but perhaps? or it could have something to do with what a DNS resolution gives you
<
contrapumpkin>
and you might still get AAAA records even if your network doesn't speak v6
<
LnL>
btw, dtrace is crazy, it spewed out so much stuff that nix-store -r takes minutes to get to the error
<
LnL>
I can try to make an example program
hamishmack has joined #nix-darwin
<
LnL>
contrapumpkin: interesting, it depends on the domain
<
contrapumpkin>
oh?
<
LnL>
^ dns with vs without an AAAA record
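A minimal sketch of the "with vs without an AAAA record" check, not the program LnL actually ran. The function names `address_families` and `has_aaaa` and the injectable `resolver` parameter are made up for illustration; only `socket.getaddrinfo` is a real API. A resolver may also return IPv6 results from a hosts file or synthesize them, so treat the answer as a hint rather than proof that a AAAA record exists.

```python
import socket


def address_families(host, resolver=socket.getaddrinfo):
    """Return the set of address families that `host` resolves to.

    `resolver` is a hypothetical injection point so the lookup can be
    stubbed out; by default it is the standard socket.getaddrinfo.
    """
    families = set()
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            results = resolver(host, None, family, socket.SOCK_STREAM)
        except socket.gaierror:
            # No addresses of this family, or the lookup failed outright.
            continue
        if results:
            families.add(family)
    return families


def has_aaaa(host, resolver=socket.getaddrinfo):
    """True if an AF_INET6 lookup for `host` returned at least one address."""
    return socket.AF_INET6 in address_families(host, resolver)
```

Calling `has_aaaa("apple.com")` and `has_aaaa("cache.nixos.org")` would compare the two domains mentioned in this conversation; the results depend on the local resolver and on DNS at the time it runs.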
<
contrapumpkin>
fun!
<
contrapumpkin>
so if you just call fork in that program
<
contrapumpkin>
can you make it crash if it runs against AAAA
<
contrapumpkin>
that seems undesirable if they want to be posix-friendly
<
contrapumpkin>
I assume this has something to do with cache.nixos.org being ipv6-friendly now?
<
LnL>
it's the other way around
<
contrapumpkin>
oh?
<
LnL>
apple.com is ipv4 only
<
LnL>
and I can't seem to reproduce the fork issue
<
contrapumpkin>
I think it would boil down to:
<
contrapumpkin>
fork(), then resolve a DNS name from the child, causing the dlopen to happen
<
contrapumpkin>
and objc initialization to occur
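The "fork(), then resolve from the child" shape described above can be sketched as follows. This is an illustration under assumptions, not a confirmed reproducer: `resolve_in_child` and its `resolver` parameter are hypothetical names, and because the code runs inside the Python interpreter, any SDK-version check would apply to that interpreter binary rather than to a Nix-built executable. A crash in the child would show up as a "signaled" result.

```python
import os
import socket


def resolve_in_child(host, resolver=socket.getaddrinfo):
    """Fork, resolve `host` in the child, and report how the child ended.

    Returns ("exited", exit_code) or ("signaled", signal_number).
    `resolver` is a hypothetical injection point for stubbing the lookup.
    """
    pid = os.fork()
    if pid == 0:
        # Child: perform the lookup, then leave immediately with os._exit
        # so no parent-side cleanup handlers run in the forked process.
        try:
            resolver(host, None)
        except BaseException:
            os._exit(1)
        os._exit(0)

    # Parent: wait for the child and decode its wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    return ("exited", os.WEXITSTATUS(status))
```

Running `resolve_in_child` against one name with a AAAA record and one without would show whether the child's fate differs between the two cases.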
<
contrapumpkin>
it seems to look at dyld_get_program_sdk_version
<
contrapumpkin>
if we can just get that to say something below 10.13 I think we get left alone
<
contrapumpkin>
or we can make sure there's a __objc_fork_ok section in the executable
<
LnL>
that's what I tried
<
contrapumpkin>
is your executable built for 10.13 or higher?
<
contrapumpkin>
you can probably just call `uint32_t dyld_get_program_sdk_version();` and see what it gives you
<
LnL>
oh, this is outside of nix so probably
<
contrapumpkin>
judging by the objc code, it seems like this should only matter if it's built for 10.13 or higher
<
contrapumpkin>
is our nix-daemon build somehow targeting that?
<
LnL>
it shouldn't be, we set MACOSX_DEPLOYMENT_TARGET
<
LnL>
prints the same version, but it doesn't seem to resolve properly anymore
<
LnL>
unless my code is wrong, not unlikely
<
contrapumpkin>
hmm not sure, sorry
<
contrapumpkin>
about to leave on a 12-hour drive
<
contrapumpkin>
but should have a bit more free time over the next week or two
<
contrapumpkin>
can probably help then :)
<
LnL>
printing the ipv6 address is probably just wrong
<
LnL>
but it clearly reproduces some unexpected behaviour
<
LnL>
it could be that the crash only happens in some specific conditions
<
contrapumpkin>
well apple's notoriously unfriendly to forks in their platform libs
<
contrapumpkin>
like I think CF will just crash if you try to fork after loading it
<
contrapumpkin>
so it wouldn't surprise me if they had similar logic in libobjc
<
contrapumpkin>
even ignoring dlopen and libnetwork though
<
contrapumpkin>
just based on objc-os.mm
<
contrapumpkin>
I'd expect that crash to only occur in earlier platform binaries
<
contrapumpkin>
I mean ones compiled for 10.13 or above
<
contrapumpkin>
which we don't generally do, so it's confusing
contrapumpkin has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
philr has joined #nix-darwin