#nixops on 2020-03-16

2019-08-31 02:51 samueldr changed the topic of #nixops to: NixOps related talk | logs: https://logs.nix.samueldr.com/nixops/

00:04 lordcirth__ has joined #nixops

00:05 <dhess> That very odd AWS permissions/missing narinfo issue I was having with the private S3 binary cache and `nixops deploy` goes away if I set `deployment.hasFastConnection = true` for the target hosts where the S3 binary cache is enabled.

00:06 <dhess> which kind of sucks because one of the reasons to have it enabled is so that the target hosts can pull from the nice, fast S3 binary cache rather than my deployment host using its relatively pitiful uplink.

00:06 <dhess> Pretty sure there's a bug in there somewhere.

00:07 <bhipple> dhess: just spent a long while reading an IRC conversation that you and gchristensen had a year ago, referenced here: https://github.com/NixOS/nixpkgs/issues/71770

00:07 lordcirth_ has quit [Ping timeout: 240 seconds]

00:07 <{^_^}> #71770 (by Zachaccino, 20 weeks ago, closed): "Unable to find an etc directory with fstab" upon launching NixOS on EC2

00:07 <bhipple> did you ever get to the bottom of the fstab issues with AMI registration and the AMI upload script?

00:07 <bhipple> On your question, FWIW I am using that option with EC2 remote builders to connect to the public S3 binary cache without uploading binaries from my laptop, and it's working, so the bug must be in private binary caches

00:08 <bhipple> or it could be in the binary cache not being configured correctly or the user not being a trusted user and not having GPG sigs or something like that

00:17 <dhess> bhipple: re: the S3 cache, it's not a permissions issue. NixOps is trying to copy narinfo files that aren't in the cache yet.

00:19 <dhess> also the remote builders can read the private S3 cache just fine when they're building. This only happens during `nixops deploy` from a deployment host that also uses the same private S3 cache.

00:20 <bhipple> Ah I see, not sure I can be of help, just wanted to chime in that with the public S3 cache at least `nixops deploy` is using it for me

00:20 <dhess> I suspect what's happening is that NixOps assumes that the deployment host is writing the build products into the private S3 cache. But in my case, it's not, and that's intentional as I only want our Hydra to have the ability to do that, for trust reasons.

00:20 <dhess> bhipple: cool, thank you.

00:21 <dhess> ok re: this EC2 thing, I don't see the IRC conversation you refer to in that issue. Do you have a pointer to a log of the conversation?

00:22 <dhess> or perhaps it's not necessary. What I ended up doing was my own series of steps, rather than using create-amis.sh from nixpkgs.

00:25 <bhipple> https://logs.nix.samueldr.com/nixos-dev/2019-05-10

00:25 <bhipple> I'm also starting to go down the road of forking my own create-amis.sh and figuring out how to build a custom NixOS AMI, but I suspect it's already well-trodden ground and the knowledge is out there somewhere

00:25 <bhipple> Original ticket author also went down that road

00:26 <bhipple> I wonder if there's somewhere in nixops-aws or nixos that we should document how this works?

00:27 <dhess> Unless something has changed since then, create-amis.sh uses the deprecated ec2-tools (is that what it's called? It's something like that) stuff to create AMIs. I now use "aws ec2 import-snapshot" followed by "aws ec2 describe-snapshots" to get the SnapshotId, then "aws ec2 register-image"

00:29 <dhess> I haven't yet wrapped this up in a script so I do it by hand at the moment. I don't need to do it very often (yet -- I do want to transition to stateless images for most of my instances).
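The manual sequence dhess describes could be wrapped up roughly as follows. This is only a sketch under several assumptions: the function names, bucket, key, and image name are hypothetical; the VHD is assumed to be uploaded to S3 already; the account is assumed to have the `vmimport` service role that snapshot import requires; and `/dev/xvda` is assumed as the root device. It also polls `aws ec2 describe-import-snapshot-tasks` for the SnapshotId instead of the `aws ec2 describe-snapshots` call mentioned above.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and all arguments are hypothetical placeholders.

# Value for `aws ec2 import-snapshot --disk-container`.
disk_container_arg() {
  printf 'Format=VHD,UserBucket={S3Bucket=%s,S3Key=%s}' "$1" "$2"
}

# Value for `aws ec2 register-image --block-device-mappings`.
block_device_arg() {
  printf 'DeviceName=/dev/xvda,Ebs={SnapshotId=%s}' "$1"
}

# Print one field (e.g. Status or SnapshotId) of an import task.
import_task_field() {
  aws ec2 describe-import-snapshot-tasks --import-task-ids "$1" \
    --query "ImportSnapshotTasks[0].SnapshotTaskDetail.$2" --output text
}

# usage: import_and_register BUCKET KEY IMAGE_NAME
import_and_register() {
  local bucket=$1 key=$2 name=$3 task_id status snapshot_id

  # Step 1: start the snapshot import and remember the task ID.
  task_id=$(aws ec2 import-snapshot \
    --disk-container "$(disk_container_arg "$bucket" "$key")" \
    --query ImportTaskId --output text) || return 1

  # Step 2: poll until the task completes; give up if it was deleted.
  while :; do
    status=$(import_task_field "$task_id" Status) || return 1
    case $status in
      completed) break ;;
      deleting|deleted) echo "import task $task_id: $status" >&2; return 1 ;;
      *) sleep 30 ;;
    esac
  done
  snapshot_id=$(import_task_field "$task_id" SnapshotId) || return 1

  # Step 3: register an AMI backed by the imported snapshot; prints the ImageId.
  aws ec2 register-image --name "$name" \
    --architecture x86_64 --virtualization-type hvm \
    --root-device-name /dev/xvda \
    --block-device-mappings "$(block_device_arg "$snapshot_id")" \
    --query ImageId --output text
}
```

Sourcing the file and running something like `import_and_register my-bucket nixos.vhd my-nixos-ami` would then do all three steps in one go; nothing contacts AWS until that function is called.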

00:29 <bhipple> It looks like that is what create-amis is now using (the latest and greatest)

00:29 <dhess> oh I'll have to revisit it then

00:29 <bhipple> so someone must have resolved/refactored/updated it at some point

00:29 <dhess> but I haven't had that /etc problem since that IRC conversation, I don't think

00:30 <dhess> I did have a problem with nvme-based EC2 instances at one point though

00:30 <dhess> Are your instances nvme?

00:31 <bhipple> No, EBS; I saw the fstab problem when I tried to do things manually, but it "just worked" when I used the create-amis script, even though it appears to be calling the same cmds I was using more or less

00:31 <bhipple> at any rate, it's now working for me

00:31 <bhipple> Do you have any prior art/documentation/know-how for factoring out the input of what AMI to build?

00:32 <bhipple> e.g., I want to build my custom AMI instead of just NixOS on master

00:32 <dhess> oh ok. Yeah I had the problem until I switched to the sequence I quoted above.

00:32 <dhess> yeah

00:32 <dhess> hold on

00:32 <bhipple> I'm currently reading the code and will figure it out sooner or later, but wondering if there are docs somewhere

00:34 <dhess> Put this in an overlay, or adapt it to whatever way you like to define packages:

00:34 <dhess> https://gist.github.com/dhess/f5e356ddb2c8544b43d45cef9ca349de

00:35 <dhess> I hacked that up a bit to remove some bits specific to our config, so it's not tested but it is more or less correct.

00:35 <dhess> that will create a qcow2 or vhd image depending on which attr you evaluate

00:36 <bhipple> Ok, so you then do something like `nix-build ec2-image.vhd`, and pass that to the upload script?

00:36 <dhess> "aws ec2 import-snapshot" wants VHD files, for Linux at least.

00:36 <dhess> yeah

00:37 <dhess> the (super.path + "/nixos/...") bits just use the tooling that already exists in nixpkgs

00:38 <dhess> so that it tracks changes to how nixpkgs builds them, so long as it doesn't add a new argument, change a path, etc.

00:38 <dhess> I've been doing that for about a year and I've only had to fix it once.

00:41 <bhipple> Slick, it just worked!

00:41 <bhipple> dhess++

00:41 <{^_^}> dhess's karma got increased to 7

00:42 <bhipple> You should do some refactoring of the base one on master to make it slightly more composable for others, as I think this is probably useful for more people than just you and me!

00:58 <dhess> cool, glad it helped.

00:58 <dhess> there are lots of things like that in nixpkgs that should be (exported) functions, but aren't.

01:03 <bhipple> got (possibly transient) EC2 error code 'MaxSpotInstanceCountExceeded': Max spot instance count exceeded. retrying...

01:03 <bhipple> ^ Has anyone seen this error before? I can launch a spot instance from the UI

01:03 <bhipple> and searching around there are not any limits on my AWS account that I'm hitting, as far as I can tell

01:04 <bhipple> so I suspect I'm missing something in my NixOps cfg and it's giving a misleading error msg

02:45 nuncanada2 has quit [Ping timeout: 240 seconds]

02:56 lordcirth__ has quit [Remote host closed the connection]

04:51 <sevanspowell> Hey a colleague attempting to run nixops on a non-NixOS machine is getting the following error. I don't have the same issue on my NixOS machine (we're both running from ostensibly the same shell environment). https://www.irccloud.com/pastebin/Ac3spJRX/nix-plugins-undefined-symbol

04:51 <sevanspowell> Has anyone encountered something like this before?

05:05 abathur has quit [Ping timeout: 240 seconds]

05:54 bhipple has quit [Ping timeout: 256 seconds]

06:26 mogran has joined #nixops

06:27 mogran has left #nixops [#nixops]

10:23 syd has joined #nixops

11:02 abathur has joined #nixops

11:06 abathur has quit [Ping timeout: 246 seconds]

11:09 syd has quit [Remote host closed the connection]

11:39 nuncanada2 has joined #nixops

12:57 johnny101 has joined #nixops

14:08 abathur has joined #nixops

15:08 syd has joined #nixops

15:24 syd has quit [Remote host closed the connection]

17:47 abathur has quit [Ping timeout: 256 seconds]

18:14 abathur has joined #nixops

20:13 nuncanada2 has quit [Read error: Connection reset by peer]

20:13 nuncanada2 has joined #nixops

23:30 bhipple has joined #nixops