samueldr changed the topic of #nixops to: NixOps related talk | logs:
lordcirth__ has joined #nixops
<dhess> That very odd AWS permissions/missing narinfo issue I was having with the private S3 binary cache and `nixops deploy` goes away if I set `deployment.hasFastConnection = true` for the target hosts where the S3 binary cache is enabled.
<dhess> which kind of sucks because one of the reasons to have it enabled is so that the target hosts can pull from the nice, fast S3 binary cache rather than my deployment host using its relatively pitiful uplink.
<dhess> Pretty sure there's a bug in there somewhere.
<bhipple> dhess: just spent a long while reading an IRC conversation that you and gchristensen had a year ago, referenced here:
lordcirth_ has quit [Ping timeout: 240 seconds]
<{^_^}> #71770 (by Zachaccino, 20 weeks ago, closed): "Unable to find an etc directory with fstab" upon launching NixOS on EC2
<bhipple> did you ever get to the bottom of the fstab issues with AMI registration and the AMI upload script?
<bhipple> On your question, FWIW I am using that option with EC2 remote builders to connect to the public S3 binary cache without uploading binaries from my laptop, and it's working, so the bug must be in private binary caches
<bhipple> or it could be in the binary cache not being configured correctly or the user not being a trusted user and not having GPG sigs or something like that
<dhess> bhipple: re: the S3 cache, it's not a permissions issue. NixOps is trying to copy narinfo files that aren't in the cache yet.
<dhess> also the remote builders can read the private S3 cache just fine when they're building. This only happens during `nixops deploy` from a deployment host that also uses the same private S3 cache.
<bhipple> Ah I see, not sure I can be of help, just wanted to chime in that with the public S3 cache at least `nixops deploy` is using it for me
<dhess> I suspect what's happening is that NixOps assumes that the deployment host is writing the build products into the private S3 cache. But in my case, it's not, and that's intentional as I only want our Hydra to have the ability to do that, for trust reasons.
<dhess> bhipple: cool, thank you.
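One way to test the "narinfo isn't in the cache yet" theory is to check the bucket directly. A binary cache stores one `<hash>.narinfo` object per store path, where the hash is the part of the store path name before the first dash. The sketch below is not from the conversation: the function names are hypothetical, and it assumes the cache lives at the root of the bucket with no key prefix.

```shell
# narinfo_key STORE_PATH
# Prints the object key a binary cache uses for this store path,
# e.g. /nix/store/<hash>-hello-2.10 -> <hash>.narinfo
narinfo_key() {
  local base=${1#/nix/store/}          # drop the store directory prefix
  printf '%s.narinfo\n' "${base%%-*}"  # keep everything before the first dash
}

# in_cache BUCKET STORE_PATH
# Succeeds if the narinfo object exists in the bucket.
# Assumption: the cache sits at the bucket root (no key prefix).
in_cache() {
  aws s3api head-object --bucket "$1" --key "$(narinfo_key "$2")" >/dev/null 2>&1
}
```

Running `in_cache` for a path that `nixops deploy` complains about, just before the deploy, would show whether the narinfo is really absent or whether the read itself is failing.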
<dhess> ok re: this EC2 thing, I don't see the IRC conversation you refer to in that issue. Do you have a pointer to a log of the conversation?
<dhess> or perhaps it's not necessary. What I ended up doing was doing my own series of steps and not using from nixpkgs.
<bhipple> I'm also starting to go down the road of forking my own and figuring out how to build a custom NixOS AMI, but I suspect it's already well-tracked ground and the knowledge is out there somewhere
<bhipple> Original ticket author also went down that road
<bhipple> I wonder if there's somewhere in nixops-aws or nixos that we should document how this works?
<dhess> Unless something has changed since then, uses the deprecated ec2-tools (is that what it's called? It's something like that) stuff to create AMIs. I now use "aws ec2 import-snapshot" followed by "aws ec2 describe-snapshots" to get the SnapshotId, then "aws ec2 register-image"
<dhess> I haven't yet wrapped this up in a script so I do it by hand at the moment. I don't need to do it very often (yet -- I do want to transition to stateless images for most of my instances).
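The manual sequence dhess describes could be wrapped up roughly as follows. This is a sketch under stated assumptions, not his actual procedure: the function names are hypothetical, the architecture (`x86_64`) and root device name (`/dev/xvda`) are guesses, the VHD is assumed to be in S3 already, and the account is assumed to have the `vmimport` service role that VM Import/Export requires. It also polls `aws ec2 describe-import-snapshot-tasks` for the SnapshotId instead of the `describe-snapshots` call he mentions, since that command reports the import status directly.

```shell
#!/usr/bin/env bash
set -euo pipefail

# import_vhd BUCKET KEY
# Starts the snapshot import and prints the ImportTaskId.
import_vhd() {
  aws ec2 import-snapshot \
    --description "NixOS image import" \
    --disk-container "Format=VHD,UserBucket={S3Bucket=$1,S3Key=$2}" \
    --query 'ImportTaskId' --output text
}

# wait_for_snapshot TASK_ID
# Polls until the import completes, then prints the SnapshotId.
wait_for_snapshot() {
  local status
  while :; do
    status=$(aws ec2 describe-import-snapshot-tasks \
      --import-task-ids "$1" \
      --query 'ImportSnapshotTasks[0].SnapshotTaskDetail.Status' --output text)
    case "$status" in
      completed) break ;;
      deleting|deleted) echo "import failed: $status" >&2; return 1 ;;
    esac
    sleep "${POLL_SECONDS:-30}"
  done
  aws ec2 describe-import-snapshot-tasks \
    --import-task-ids "$1" \
    --query 'ImportSnapshotTasks[0].SnapshotTaskDetail.SnapshotId' --output text
}

# register_ami NAME SNAPSHOT_ID
# Registers an HVM AMI backed by the snapshot and prints the ImageId.
# Assumptions: x86_64 image, root device /dev/xvda, ENA enabled.
register_ami() {
  aws ec2 register-image \
    --name "$1" \
    --architecture x86_64 \
    --virtualization-type hvm \
    --ena-support \
    --root-device-name /dev/xvda \
    --block-device-mappings "DeviceName=/dev/xvda,Ebs={SnapshotId=$2}" \
    --query 'ImageId' --output text
}
```

With these defined, the whole flow is three lines: `task=$(import_vhd my-bucket nixos.vhd)`, `snap=$(wait_for_snapshot "$task")`, `register_ami my-nixos "$snap"`, where the bucket, key, and image name are placeholders.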
<bhipple> It looks like that is what create-amis is now using (the latest and greatest)
<dhess> oh I'll have to revisit it then
<bhipple> so someone must have resolved/refactored/updated it at some point
<dhess> but I haven't had that /etc problem since that IRC conversation, I don't think
<dhess> I did have a problem with nvme-based EC2 instances at one point though
<dhess> Are your instances nvme?
<bhipple> No, EBS; I saw the fstab problem when I tried to do things manually, but it "just worked" when I used the create-amis script, even though it appears to be calling the same cmds I was using more or less
<bhipple> at any rate, it's now working for me
<bhipple> Do you have any prior art/documentation/know-how for factoring out the input of what AMI to build?
<bhipple> e.g., I want to build my custom AMI instead of just NixOS on master
<dhess> oh ok. Yeah I had the problem until I switched to the sequence I quoted above.
<dhess> yeah
<dhess> hold on
<bhipple> I'm currently reading the code and will figure it out sooner or later, but wondering if there are docs somewhere
<dhess> Put this in an overlay, or adapt it to whatever way you like to define packages:
<dhess> I hacked that up a bit to remove some bits specific to our config, so it's not tested but it is more or less correct.
<dhess> that will create a qcow2 or vhd image depending on which attr you evaluate
<bhipple> Ok, so you then do something like `nix-build ec2-image.vhd`, and pass that to the upload script?
<dhess> "aws ec2 import-snapshot" wants VHD files, for Linux at least.
<dhess> yeah
<dhess> the (super.path + "/nixos/...") bits just use the tooling that already exists in nixpkgs
<dhess> so that it tracks changes to how nixpkgs builds them, so long as it doesn't add a new argument, change a path, etc.
<dhess> I've been doing that for about a year and I've only had to fix it once.
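Since "aws ec2 import-snapshot" reads its input from S3, the built VHD has to be uploaded between the build and the import. A minimal sketch of that step follows. The function name is hypothetical, and it assumes the build result is a directory (or a `result` symlink to one) containing a single `.vhd` file somewhere beneath it.

```shell
# upload_vhd RESULT_DIR BUCKET
# Finds the .vhd under the build result, copies it to the bucket,
# and prints the S3 key it was uploaded as.
# Assumption: exactly one .vhd exists under RESULT_DIR.
upload_vhd() {
  local vhd
  # -L follows the "result" symlink that nix-build leaves behind
  vhd=$(find -L "$1" -type f -name '*.vhd' | head -n 1)
  [ -n "$vhd" ] || { echo "no .vhd under $1" >&2; return 1; }
  # send aws progress output to stderr so stdout carries only the key
  aws s3 cp "$vhd" "s3://$2/$(basename "$vhd")" >&2
  basename "$vhd"
}
```

The printed key is what the import step needs as its `S3Key`.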
<bhipple> Slick, it just worked!
<bhipple> dhess++
<{^_^}> dhess's karma got increased to 7
<bhipple> You should do some refactoring of the base one on master to make it slightly more composable for others, as I think this is probably useful for more people than just you and me!
<dhess> cool, glad it helped.
<dhess> there are lots of things like that in nixpkgs that should be (exported) functions, but aren't.
<bhipple> got (possibly transient) EC2 error code 'MaxSpotInstanceCountExceeded': Max spot instance count exceeded. retrying...
<bhipple> ^ Has anyone seen this error before? I can launch a spot instance from the UI
<bhipple> and searching around there are not any limits on my AWS account that I'm hitting, as far as I can tell
<bhipple> so I suspect I'm missing something in my NixOps cfg and it's giving a misleading error msg
nuncanada2 has quit [Ping timeout: 240 seconds]
lordcirth__ has quit [Remote host closed the connection]
<sevanspowell> Hey a colleague attempting to run nixops on a non-NixOS machine is getting the following error. I don't have the same issue on my NixOS machine (we're both running from ostensibly the same shell environment).
<sevanspowell> Has anyone encountered something like this before?
abathur has quit [Ping timeout: 240 seconds]
bhipple has quit [Ping timeout: 256 seconds]
mogran has joined #nixops
mogran has left #nixops [#nixops]
syd has joined #nixops
abathur has joined #nixops
abathur has quit [Ping timeout: 246 seconds]
syd has quit [Remote host closed the connection]
nuncanada2 has joined #nixops
johnny101 has joined #nixops
abathur has joined #nixops
syd has joined #nixops
syd has quit [Remote host closed the connection]
abathur has quit [Ping timeout: 256 seconds]
abathur has joined #nixops
nuncanada2 has quit [Read error: Connection reset by peer]
nuncanada2 has joined #nixops
bhipple has joined #nixops