samueldr changed the topic of #nixops to: NixOps related talk | logs: https://logs.nix.samueldr.com/nixops/
<dhess> clever: ever seen anything like what I described above? Do you use any private S3 nix caches at IOHK?
<clever> dhess: i believe the S3 caches are all public
<dhess> ok
<clever> dhess: but in theory, if ~/.aws has the right credentials, and you use an s3:// URI, it may work
<dhess> It does work when .aws has the right creds, the issue is that then NixOps tries to reference narinfo files that don't exist in the cache
<dhess> which is really perplexing.
<dhess> might have something to do with the Hydra and NARs
<dhess> I don't really understand how that all works.
<dhess> but what's frustrating is if I just disable the S3 nix cache *on the target* (not on the deployment host) it works fine
<dhess> so I don't get why enabling the S3 nix cache on the target suddenly means that the deployment host can't find narinfo files in the cache.
<clever> dhess: did you garbage collect the s3 bucket?
<dhess> clever: no, not even once.
<clever> dhess: try deleting ~/.cache/nix/binary-cache* anyways and see if it helps
<dhess> clever: our hydra config looks like this (the hostname has been changed but that's it):
<dhess> clever: delete that in my home dir on the deployment host, you mean?
<clever> try nuking it on both your user, and root, on both machines
<clever> nix will re-create it any time it's missing
<dhess> oh ok
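For anyone following along: the files clever is talking about are Nix's per-user narinfo lookup cache, a SQLite database under `~/.cache/nix`, where a stale negative entry can make Nix keep treating a path as missing from a cache. A sketch of the cleanup, run inside a throwaway HOME so nothing real is touched (the `binary-cache-v6.sqlite` filename is an assumption based on current Nix defaults):

```shell
# Demo in a scratch HOME so no real cache is harmed.
export HOME="$(mktemp -d)"
mkdir -p "$HOME/.cache/nix"
# Stand-in for the real lookup cache (filename assumed from current Nix).
touch "$HOME/.cache/nix/binary-cache-v6.sqlite"
# The actual fix: delete the cache files; Nix recreates them on demand.
rm -f "$HOME"/.cache/nix/binary-cache*
ls -A "$HOME/.cache/nix"
```

As clever notes, this is safe to do for both your user and root on each machine, since Nix rebuilds the database the next time it queries a binary cache.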
<clever> dhess: looks like it should be fine
<dhess> clever: ok here is something that just occurred to me. I run this NixOps command on my deployment host. It's a Mac, so it builds the product on one of our remote NixOS builders, then copies it from there back to its own Nix store
<dhess> now of course the product isn't in the binary cache because neither the builder nor the Mac write products to the cache
<dhess> only the Hydra does
<dhess> so why is NixOps assuming it's there?
<clever> anything nix needs to build, will obviously fail to be found in the s3 bucket
<clever> what error is it giving exactly?
<dhess> error (ignored): AWS error fetching 'sg4m8f3qj1pgp4wigxw2grwbpzwzxgmw.narinfo': Access Denied
<dhess> in this particular case, about 4 of those errors, then it gives up
<dhess> and from my Mac:
<clever> that implies you don't have the right credentials in ~/.aws, on one of the machines
<dhess> aws s3 ls s3://hackworth-nix-cache/sg4m8f3qj1pgp4wigxw2grwbpzwzxgmw.narinfo
<dhess> (no output)
<dhess> echo $?
<dhess> 1
<dhess> it's not there
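For context on what dhess is checking here: a `.narinfo` key is just the 32-character base32 hash prefix of the store path it describes, so the error above is Nix asking whether one specific store path exists in the bucket. (It's also worth knowing that S3 returns Access Denied rather than a 404 for a missing key when the credentials lack `s3:ListBucket` on the bucket, so an "Access Denied" from a substituter can simply mean "not in the cache".) A sketch using the hash from the log, with a made-up `-example` name suffix:

```shell
# Derive the narinfo key from a store path. The hash is the one from the
# error in the log; the "-example" name part is a placeholder.
store_path=/nix/store/sg4m8f3qj1pgp4wigxw2grwbpzwzxgmw-example
key="$(basename "$store_path" | cut -c1-32)"
echo "$key.narinfo"   # -> sg4m8f3qj1pgp4wigxw2grwbpzwzxgmw.narinfo
```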
<clever> which machine actually gave the error? from which user?
<dhess> both of these are from the Mac
<clever> is the mac single or multi-user?
<dhess> I can see the nix-copy-closure command running:
<dhess> nix-copy-closure --to root@harry-b.example.com /nix/store/wb2j6w560kgwzc6slj321al7r93iq7gb-nixos-system-harry-b-20.09pre-git --use-substitutes
<dhess> that happens when I run the deploy
<dhess> both machines can read the S3 nix cache because when I build a derivation on them, nix says it's downloading products from the cache
<dhess> The Mac is multi-user
<dhess> the daemon has the creds
<clever> dhess: nix might attempt to query the cache from a non-root user, double-check that user also has keys
<dhess> all works fine during a nix-build and a nixops deploy, up to the part where it tries to copy the closures to the target host
<dhess> clever: right, that nix-copy-closure is running as my user
<dhess> I have the creds
<dhess> let me double check that
<dhess> $ aws s3 ls s3://hackworth-nix-cache/05sss8cvbyzdd8i85wvd2q9fvwhsf9z5.narinfo
<dhess> 2019-10-14 17:36:42 4042 05sss8cvbyzdd8i85wvd2q9fvwhsf9z5.narinfo
<dhess> yep, ran that as the same user that's doing the deploy
<dhess> worked fine
<dhess> so: those products aren't in the S3 nix cache
<dhess> but NixOps is assuming they are
<dhess> just to prove it, running as root on the same Mac:
<dhess> $ sudo -i aws s3 ls s3://hackworth-nix-cache/05sss8cvbyzdd8i85wvd2q9fvwhsf9z5.narinfo
<dhess> 2019-10-14 17:36:42 4042 05sss8cvbyzdd8i85wvd2q9fvwhsf9z5.narinfo
<dhess> it also has the creds
<clever> brb
<dhess> seems to me here the issue is as I summarized above: the builders (Mac & its remote NixOS build host) don't write to the S3 nix cache, only the Hydra does. But whenever I enable the S3 nix cache on a remote NixOS machine, NixOps for some reason starts looking in the S3 nix cache for narinfo files upon deployment, but they're not there.
<dhess> and in this case the *only* thing that has changed is that I've enabled the S3 nix cache on the target NixOS machine
<dhess> (the target NixOS machine is not the same as the remote NixOS build machine, though in the past I had the same issue when I tried to enable the S3 nix cache on the remote NixOS build machine.)
<Cadey> how do i have nixops machines refer to other machines in the network?
<gchristensen> look for nodes.gollum.config on https://nixos.org/nixops/manual
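gchristensen is pointing at the `nodes` argument that NixOps passes to every machine's module, which exposes the evaluated config of each machine in the network. A minimal sketch (machine names and the `privateIPv4` attribute, which exists for cloud backends like EC2, are illustrative):

```nix
{
  webserver = { nodes, ... }: {
    # Reach the "database" machine in the same NixOps network.
    networking.extraHosts =
      "${nodes.database.config.networking.privateIPv4} database";
  };
  database = { ... }: { };
}
```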
<gchristensen> back in 1h