#nix-darwin on 2018-12-21

00:03 philr has quit [Ping timeout: 246 seconds]

00:05 the-kenny has joined #nix-darwin

00:13 philr has joined #nix-darwin

00:46 philr has quit [Ping timeout: 250 seconds]

02:06 hamishmack has quit [Ping timeout: 272 seconds]

07:21 philr has joined #nix-darwin

07:46 trcc has joined #nix-darwin

07:47 trcc has quit [Remote host closed the connection]

07:47 trcc has joined #nix-darwin

10:28 trcc has quit [Remote host closed the connection]

10:29 trcc has joined #nix-darwin

10:34 trcc has quit [Ping timeout: 268 seconds]

11:01 trcc has joined #nix-darwin

11:06 trcc has quit [Ping timeout: 244 seconds]

11:24 periklis has joined #nix-darwin

11:29 periklis has quit [Remote host closed the connection]

11:48 trcc has joined #nix-darwin

11:53 trcc has quit [Ping timeout: 246 seconds]

13:01 trcc has joined #nix-darwin

13:15 periklis has joined #nix-darwin

13:31 __Sander__ has joined #nix-darwin

13:40 trcc has quit []

14:13 periklis has quit [Remote host closed the connection]

14:32 periklis has joined #nix-darwin

15:33 periklis has quit [Remote host closed the connection]

15:38 periklis has joined #nix-darwin

15:54 periklis has quit [Ping timeout: 250 seconds]

16:36 philr has quit [Ping timeout: 246 seconds]

16:57 __Sander__ has quit [Quit: Konversation terminated!]

17:38 <contrapumpkin> LnL: I think I see what happened

17:38 <contrapumpkin> given your latest trace

17:39 <LnL> oh really? I'm kind of lost

17:39 <contrapumpkin> so

17:39 <contrapumpkin> libinfo source is public on opensource.apple.com

17:39 <contrapumpkin> if you compare 517.30.1 (10.13.6) to 517.200.9 (10.14.1)

17:39 <contrapumpkin> si_getaddrinfo.c

17:40 <contrapumpkin> there's a new big chunk about loading libnetwork.dylib

17:40 <LnL> yeah we also have a build for it IIRC

17:40 <contrapumpkin> it simply wasn't there before

17:40 <contrapumpkin> in this case I think we're not using that though

17:40 <contrapumpkin> this is just the one in libsystem

17:41 <LnL> ok, but how does that explain I have both a working and non working version on this machine?

17:41 <LnL> I'd expect them both to go through the same path

17:42 <contrapumpkin> I think it depends on cache state somehow

17:42 <contrapumpkin> if you look at the actual source for si_addrinfo

17:42 <contrapumpkin> the call to the nat64 code is fairly low down with some early exits out

17:42 <contrapumpkin> not 100% sure though

17:43 <contrapumpkin> I'd be curious if you try to compile Nix on darwin without using Nix whether the issue still happens

17:43 <contrapumpkin> like look at all the ways in which gai_nat64_synthesis can early exit before getting to the point of actually loading libnetwork

17:44 <contrapumpkin> if family != AF_INET6 it'll early return without loading the dylib

17:44 <contrapumpkin> among many others

17:45 <contrapumpkin> so anyway, not really an exact answer, but that seems to be the gist of it

17:46 <LnL> hmm

17:46 <contrapumpkin> tbc, I don't see a clear way to prevent it from happening

17:46 <contrapumpkin> however, if it does happen outside of Nix, we might have a case for a radar

17:46 <contrapumpkin> like, if anyone who calls fork and resolves DNS names to ipv6 gets that failure

17:47 <contrapumpkin> that's a pretty big bug on Apple's side

17:47 <LnL> hold on, does this mean it might work fine if you're on an ipv4 only network?

17:47 <contrapumpkin> I haven't traced the logic exactly, but perhaps? or it could have something to do with what a DNS resolution gives you

17:47 <contrapumpkin> and you might still get AAAA records even if your network doesn't speak v6

17:48 <LnL> right

17:51 <LnL> btw, dtrace is crazy; it spewed out so much stuff that nix-store -r takes min to get to the error

17:51 <LnL> 2min*

17:57 <LnL> I can try to make an example program

18:50 hamishmack has joined #nix-darwin

19:02 <LnL> contrapumpkin: interesting, it depends on the domain

19:18 <contrapumpkin> oh?

19:19 <LnL> https://gist.github.com/LnL7/5c43975144c0a9a0f5de74cc155a01d5

19:19 <LnL> ^ dns with vs without an AAAA record

19:20 <contrapumpkin> fun!

19:20 <contrapumpkin> so if you just call fork in that program

19:20 <contrapumpkin> can you make it crash if it runs against AAAA?

19:20 <contrapumpkin> that seems undesirable if they want to be posix-friendly

19:21 <contrapumpkin> I assume this has something to do with cache.nixos.org being ipv6-friendly now?

19:21 <LnL> it's the other way around

19:36 <contrapumpkin> oh?

19:38 <LnL> apple.com is ipv4 only

19:39 <LnL> and I can't seem to reproduce the fork issue

19:42 <contrapumpkin> I think it would boil down to:

19:42 <contrapumpkin> fork(), then resolve a DNS name from the child, causing the dlopen to happen

19:42 <contrapumpkin> and objc initialization to occur
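
A minimal sketch of the sequence described here (fork, then resolve a name from the child), assuming plain POSIX calls only. The helper name `resolve_in_child` and the `FORK_RESOLVE_DEMO` guard are hypothetical and do not come from the log or from LnL's gist. It is not a confirmed reproducer of the crash; it only reports whether the child resolved the name, failed to resolve it, or died from a signal.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, resolve `host` for address family `family` in the child, and
 * report what happened to the parent.
 * Returns 0 if the child resolved the name, 1 if getaddrinfo() failed,
 * and -1 if fork/waitpid failed or the child was killed by a signal. */
int resolve_in_child(const char *host, int family)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the lookup happens after fork(), as in the discussion. */
        struct addrinfo hints;
        struct addrinfo *res = NULL;
        memset(&hints, 0, sizeof hints);
        hints.ai_family = family;      /* AF_INET or AF_INET6 */
        hints.ai_socktype = SOCK_STREAM;

        int rc = getaddrinfo(host, NULL, &hints, &res);
        if (rc == 0)
            freeaddrinfo(res);
        _exit(rc == 0 ? 0 : 1);
    }

    /* Parent: distinguish a normal exit from a crash in the child. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return -1;
    return WEXITSTATUS(status);
}

#ifdef FORK_RESOLVE_DEMO
int main(int argc, char **argv)
{
    const char *host = argc > 1 ? argv[1] : "localhost";
    printf("%s AF_INET:  %d\n", host, resolve_in_child(host, AF_INET));
    printf("%s AF_INET6: %d\n", host, resolve_in_child(host, AF_INET6));
    return 0;
}
#endif
```

Built with `-DFORK_RESOLVE_DEMO` and given a hostname as its first argument, it prints one result per address family, so a name with an AAAA record can be compared against one without.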

19:43 <contrapumpkin> it seems to look at dyld_get_program_sdk_version

19:44 <contrapumpkin> if we can just get that to say something below 10.13 I think we get left alone

19:44 <contrapumpkin> or we can make sure there's a __objc_fork_ok section in the executable

19:44 <contrapumpkin> judging by https://opensource.apple.com/source/objc4/objc4-723/runtime/objc-os.mm.auto.html

19:46 <LnL> that's what I tried

19:46 <contrapumpkin> is your executable built for 10.13 or higher?

19:47 <contrapumpkin> you can probably just call `uint32_t dyld_get_program_sdk_version();` and see what it gives you
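
A rough sketch of that check, assuming the packed layout 0xXXXXYYZZ (major, minor, patch) for the returned value, so that 10.13.0 would be 0x000A0D00. That layout, the helper names, and the `SDK_VERSION_DEMO` guard are assumptions for illustration, not something stated in the log. `dyld_get_program_sdk_version` is a private dyld function, so the call is guarded to macOS and the decoding is kept separate.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed packed encoding: 0xXXXXYYZZ = major.minor.patch. */
#define SDK_VERSION_10_13 0x000A0D00u

struct sdk_version {
    unsigned major_ver;
    unsigned minor_ver;
    unsigned patch_ver;
};

/* Split a packed version number into its three components. */
struct sdk_version decode_sdk_version(uint32_t packed)
{
    struct sdk_version v;
    v.major_ver = packed >> 16;
    v.minor_ver = (packed >> 8) & 0xffu;
    v.patch_ver = packed & 0xffu;
    return v;
}

/* Nonzero if the packed SDK version is 10.13 or newer, the threshold
 * discussed in the log. */
int sdk_is_10_13_or_newer(uint32_t packed)
{
    return packed >= SDK_VERSION_10_13;
}

#ifdef __APPLE__
/* Private dyld entry point; only available on Apple platforms. */
extern uint32_t dyld_get_program_sdk_version(void);
#endif

#ifdef SDK_VERSION_DEMO
int main(void)
{
#ifdef __APPLE__
    uint32_t packed = dyld_get_program_sdk_version();
#else
    uint32_t packed = SDK_VERSION_10_13; /* placeholder value off macOS */
#endif
    struct sdk_version v = decode_sdk_version(packed);
    printf("sdk %u.%u.%u, 10.13 or newer: %d\n",
           v.major_ver, v.minor_ver, v.patch_ver,
           sdk_is_10_13_or_newer(packed));
    return 0;
}
#endif
```

Built with `-DSDK_VERSION_DEMO` on macOS, this prints the SDK version the executable was linked against and whether it reaches the 10.13 threshold.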

19:48 <LnL> oh, this is outside of nix so probably

19:49 <contrapumpkin> judging by the objc code, it seems like this should only matter if it's built for 10.13 or higher

19:49 <contrapumpkin> is our nix-daemon build somehow targeting that?

19:50 <LnL> it shouldn't be, we set MACOSX_DEPLOYMENT_TARGET

19:54 <LnL> prints the same version, but it doesn't seem to resolve properly anymore

19:55 <LnL> unless my code is wrong, not unlikely

20:17 <contrapumpkin> hmm not sure, sorry

20:17 <contrapumpkin> about to leave on a 12-hour drive

20:17 <contrapumpkin> but should have a bit more free time over the next week or two

20:17 <contrapumpkin> can probably help then :)

20:20 <LnL> printing the ipv6 address is probably just wrong

20:21 <LnL> but it clearly reproduces some unexpected behaviour

20:21 <LnL> it could be that the crash only happens in some specific conditions

20:29 <contrapumpkin> well apple's notoriously unfriendly to forks in their platform libs

20:29 <contrapumpkin> like I think CF will just crash if you try to fork after loading it

20:29 <contrapumpkin> so it wouldn't surprise me if they had similar logic in libobjc

20:30 <contrapumpkin> even ignoring dlopen and libnetwork though

20:30 <contrapumpkin> just based on objc-os.mm

20:30 <contrapumpkin> I'd expect that crash to only occur in earlier platform binaries

20:30 <contrapumpkin> I mean ones compiled for 10.13 or above

20:30 <contrapumpkin> which we don't generally do, so it's confusing

20:36 contrapumpkin has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

22:32 philr has joined #nix-darwin