philr has quit [Ping timeout: 246 seconds]
the-kenny has joined #nix-darwin
philr has joined #nix-darwin
philr has quit [Ping timeout: 250 seconds]
hamishmack has quit [Ping timeout: 272 seconds]
philr has joined #nix-darwin
trcc has joined #nix-darwin
trcc has quit [Remote host closed the connection]
trcc has joined #nix-darwin
trcc has quit [Remote host closed the connection]
trcc has joined #nix-darwin
trcc has quit [Ping timeout: 268 seconds]
trcc has joined #nix-darwin
trcc has quit [Ping timeout: 244 seconds]
periklis has joined #nix-darwin
periklis has quit [Remote host closed the connection]
trcc has joined #nix-darwin
trcc has quit [Ping timeout: 246 seconds]
trcc has joined #nix-darwin
periklis has joined #nix-darwin
__Sander__ has joined #nix-darwin
trcc has quit []
periklis has quit [Remote host closed the connection]
periklis has joined #nix-darwin
periklis has quit [Remote host closed the connection]
periklis has joined #nix-darwin
periklis has quit [Ping timeout: 250 seconds]
philr has quit [Ping timeout: 246 seconds]
__Sander__ has quit [Quit: Konversation terminated!]
<contrapumpkin> LnL: I think I see what happened
<contrapumpkin> given your latest trace
<LnL> oh really? I'm kind of lost
<contrapumpkin> so
<contrapumpkin> libinfo source is public on opensource.apple.com
<contrapumpkin> if you compare 517.30.1 (10.13.6) to 517.200.9 (10.14.1)
<contrapumpkin> si_getaddrinfo.c
<contrapumpkin> there's a new big chunk about loading libnetwork.dylib
<LnL> yeah we also have a build for it IIRC
<contrapumpkin> it simply wasn't there before
<contrapumpkin> in this case I think we're not using that though
<contrapumpkin> this is just the one in libsystem
<LnL> ok, but how does that explain I have both a working and non working version on this machine?
<LnL> I'd expect them both to go through the same path
<contrapumpkin> I think it depends on cache state somehow
<contrapumpkin> if you look at the actual source for si_addrinfo
<contrapumpkin> the call to the nat64 code is fairly low down with some early exits out
<contrapumpkin> not 100% sure though
<contrapumpkin> I'd be curious if you try to compile Nix on darwin without using Nix whether the issue still happens
<contrapumpkin> like look at all the ways in which gai_nat64_synthesis can early exit before getting to the point of actually loading libnetwork
<contrapumpkin> if family != AF_INET6 it'll early return without loading the dylib
<contrapumpkin> among many others
<contrapumpkin> so anyway, not really an exact answer, but that seems to be the gist of it
<LnL> hmm
<contrapumpkin> tbc, I don't see a clear way to prevent it from happening
<contrapumpkin> however, if it does happen outside of Nix, we might have a case for a radar
<contrapumpkin> like, if anyone who calls fork and resolves DNS names to ipv6 gets that failure
<contrapumpkin> that's a pretty big bug on Apple's side
<LnL> hold on, does this mean it might work fine if you're on an ipv4 only network?
<contrapumpkin> I haven't traced the logic exactly, but perhaps? or it could have something to do with what a DNS resolution gives you
<contrapumpkin> and you might still get AAAA records even if your network doesn't speak v6
<LnL> right
<LnL> btw, dtrace is crazy it spewed out so much stuff that nix-store -r takes min to get to the error
<LnL> 2min*
<LnL> I can try to make an example program
hamishmack has joined #nix-darwin
<LnL> contrapumpkin: interesting, it depends on the domain
<contrapumpkin> oh?
<LnL> ^ dns with vs without an AAAA record
<contrapumpkin> fun!
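A minimal sketch of the "with vs without an AAAA record" check discussed above, assuming only POSIX `getaddrinfo`. The helper name `probe_families` and the `struct families` type are hypothetical, made up for illustration. It resolves a name with `AF_UNSPEC` and counts how many IPv4 and IPv6 results come back, so a host such as apple.com can be compared against one that also publishes AAAA records.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical result type: how many addresses of each family came back. */
struct families { int v4; int v6; };

/* Hypothetical helper: resolve `host` for any family and count the results.
   Returns 0 on success, or the getaddrinfo() error code on failure. */
static int probe_families(const char *host, int flags, struct families *out) {
    struct addrinfo hints, *res = NULL, *p;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* ask for both A and AAAA results */
    hints.ai_socktype = SOCK_STREAM;  /* avoid one entry per socket type */
    hints.ai_flags = flags;
    out->v4 = 0;
    out->v6 = 0;
    int rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) return rc;
    for (p = res; p != NULL; p = p->ai_next) {
        if (p->ai_family == AF_INET) out->v4++;
        else if (p->ai_family == AF_INET6) out->v6++;
    }
    freeaddrinfo(res);
    return 0;
}

int main(int argc, char **argv) {
    const char *host = argc > 1 ? argv[1] : "localhost";
    struct families f;
    int rc = probe_families(host, 0, &f);
    if (rc != 0) {
        fprintf(stderr, "%s: %s\n", host, gai_strerror(rc));
        return 1;
    }
    printf("%s: %d IPv4 result(s), %d IPv6 result(s)\n", host, f.v4, f.v6);
    return 0;
}
```

Running it once against an IPv4-only name and once against a name with AAAA records should show which code path a given lookup is likely to take. It says nothing by itself about whether libnetwork gets loaded.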
<contrapumpkin> so if you just call fork in that program
<contrapumpkin> can you make it crash if it runs against AAAA
<contrapumpkin> that seems undesirable if they want to be posix-friendly
<contrapumpkin> I assume this has something to do with cache.nixos.org being ipv6-friendly now?
<LnL> it's the other way around
<contrapumpkin> oh?
<LnL> apple.com is ipv4 only
<LnL> and I can't seem to reproduce the fork issue
<contrapumpkin> I think it would boil down to:
<contrapumpkin> fork(), then resolve a DNS name from the child, causing the dlopen to happen
<contrapumpkin> and objc initialization to occur
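The sequence described here (fork, then resolve a name from the child) can be sketched roughly as below. This is an assumption-laden sketch, not the program LnL wrote: the helper name `resolve_in_child` is hypothetical, and a plain single-threaded program like this may well not trigger the crash at all, which matches the report above that the fork issue could not be reproduced. It only shows the shape of the test and how to tell a clean child exit from a signal.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <netdb.h>

/* Hypothetical helper: fork, call getaddrinfo() only in the child, and
   store the raw wait status in *status. Returns 0 if the child was
   forked and reaped, -1 if fork() or waitpid() failed. */
static int resolve_in_child(const char *host, int family, int *status) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: the lookup happens after fork(), as in the discussion. */
        struct addrinfo hints, *res = NULL;
        memset(&hints, 0, sizeof hints);
        hints.ai_family = family;
        hints.ai_socktype = SOCK_STREAM;
        int rc = getaddrinfo(host, NULL, &hints, &res);
        if (rc == 0) freeaddrinfo(res);
        _exit(rc == 0 ? 0 : 1);
    }
    /* Parent: wait for the child and hand back its status. */
    if (waitpid(pid, status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return 0;
}

int main(int argc, char **argv) {
    const char *host = argc > 1 ? argv[1] : "localhost";
    int status = 0;
    if (resolve_in_child(host, AF_UNSPEC, &status) != 0) return 2;
    if (WIFSIGNALED(status)) {
        printf("child killed by signal %d\n", WTERMSIG(status));
        return 1;
    }
    printf("child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}
```

Passing `AF_INET` instead of `AF_UNSPEC` would be one way to compare against the early-return case mentioned earlier, where a family other than `AF_INET6` skips the nat64 path.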
<contrapumpkin> it seems to look at dyld_get_program_sdk_version
<contrapumpkin> if we can just get that to say something below 10.13 I think we get left alone
<contrapumpkin> or we can make sure there's a __objc_fork_ok section in the executable
<LnL> that's what I tried
<contrapumpkin> is your executable built for 10.13 or higher?
<contrapumpkin> you can probably just call `uint32_t dyld_get_program_sdk_version();` and see what it gives you
<LnL> oh, this is outside of nix so probably
<contrapumpkin> judging by the objc code, it seems like this should only matter if it's built for 10.13 or higher
<contrapumpkin> is our nix-daemon build somehow targeting that?
<LnL> it shouldn't be, we set MACOSX_DEPLOYMENT_TARGET
<LnL> prints the same version, but it doesn't seem to resolve properly anymore
<LnL> unless my code is wrong, not unlikely
<contrapumpkin> hmm not sure, sorry
<contrapumpkin> about to leave on a 12-hour drive
<contrapumpkin> but should have a bit more free time over the next week or two
<contrapumpkin> can probably help then :)
<LnL> printing the ipv6 address is probably just wrong
<LnL> but it clearly reproduces some unexpected behaviour
<LnL> it could be that the crash only happens in some specific conditions
<contrapumpkin> well apple's notoriously unfriendly to forks in their platform libs
<contrapumpkin> like I think CF will just crash if you try to fork after loading it
<contrapumpkin> so it wouldn't surprise me if they had similar logic in libobjc
<contrapumpkin> even ignoring dlopen and libnetwork though
<contrapumpkin> just based on objc-os.mm
<contrapumpkin> I'd expect that crash to only occur in earlier platform binaries
<contrapumpkin> I mean ones compiled for 10.13 or above
<contrapumpkin> which we don't generally do, so it's confusing
contrapumpkin has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
philr has joined #nix-darwin