gchristensen changed the topic of #nixos-borg to: https://www.patreon.com/ofborg https://monitoring.nix.ci/dashboard/db/ofborg?refresh=10s&orgId=1&from=now-1h&to=now "I get to skip reviewing the PHP code and just wait until it is rewritten in something sane, like POSIX shell." || https://logs.nix.samueldr.com/nixos-borg
orivej has quit [Ping timeout: 244 seconds]
kalbasit has quit [Quit: WeeChat 2.1]
<{^_^}> [ofborg] @grahamc pushed to bump-0.1.7 « fixup: tests »: https://git.io/fNyoI
<{^_^}> [ofborg] @grahamc pushed to bump-0.1.7 « Unique vars »: https://git.io/fNyoM
jtojnar has quit [Remote host closed the connection]
jtojnar has joined #nixos-borg
FRidh has joined #nixos-borg
<LnL> gchristensen: I'm not sure what's causing the test failure, the only issue I can reproduce is the <nix/config.nix> + sandboxing problem
orivej has joined #nixos-borg
<gchristensen> hrm :/
<gchristensen> annoying :)
<LnL> it does sound kind of familiar, don't remember what I ran into when adding the initial travis build
<LnL> btw, do you think it would be ok to make the tests depend on nixpkgs?
<LnL> the host nix stuff has some problems
orivej has quit [Ping timeout: 256 seconds]
NinjaTrappeur has quit [Quit: WeeChat 2.2]
NinjaTrappeur has joined #nixos-borg
timokau has joined #nixos-borg
timokau[m] has joined #nixos-borg
<gchristensen> oh should we change that?
<gchristensen> ideally we wouldn't depend upon nixpkgs but if it makes things a lot easier, sure
<gchristensen> where "a lot" is however much you want that to mean
<LnL> the problem is that it's brittle in combination with nix-daemon + sandboxing
<LnL> and depending on the system, the shell in <nix/config.nix> might not even be in the store
<gchristensen> oh right
<gchristensen> ok sure let's do that
<{^_^}> [ofborg] @grahamc pushed to bump-0.1.7 « Add debugging to the test scratch test »: https://git.io/fNSGo
<gchristensen> I wish travis was faster
<LnL> eg. stuff like this can happen if nix-daemon / client don't match exactly.
<LnL> while setting up the build environment: executing '/nix/store/zqh3l3lyw32q1ayb15bnvg9f24j5v2p0-bash-4.4-p12/bin/bash': No such file or directory
<LnL> when bash in <nix/config.nix> of the client is different from the one the nix-daemon knows about, so it's not allowed in the sandbox
<gchristensen> ouch
<{^_^}> [ofborg] @grahamc pushed to bump-0.1.7 « Split out the ofborg builds in to separate jobs »: https://git.io/fNSZc
<gchristensen> that seems weird, it isn't just noted as a build input or whatever?
<LnL> no it's a string
<LnL> you'd need to use builtins.storePath cfg.shell, but that only works if it's actually in the store
<gchristensen> ehhh
<gchristensen> let's start with that
<gchristensen> to get things passing and then we can make it better and move to nixpkgs
<LnL> oh, the tests don't do that...
<gchristensen> I'm adding that now
<LnL> yeah just realised by talking about it
<gchristensen> :D
<{^_^}> [ofborg] @grahamc pushed to bump-0.1.7 « builtins.storePath the configured nix shell »: https://git.io/fNSZb
<LnL> might break testing on darwin tho
<gchristensen> I was wondering why the heck it wasn't finding it, heh
<LnL> this is not related to travis btw, since those builds use a single user install
<gchristensen> yeah, but they were failing here :)
<gchristensen> and now they pass
<LnL> ah :)
<gchristensen> ah right
<gchristensen> look at all these assumptions
<LnL> until now it worked for me because of build-remote, but I think builtins.storePath breaks that
<gchristensen> :$
<gchristensen> how does nixpkgs do this stuff?
<gchristensen> (I don't actually need to know, we can just move to using nixpkgs!)
<LnL> nixpkgs only uses this stuff to unpack bootstrap tools
<gchristensen> which doesn't happen remotely?
<Dezgeg> i don't think nixpkgs uses it, it uses statically linked busybox
<gchristensen> hrm
<gchristensen> right
<LnL> it almost never happens because it's cached and doesn't change
<gchristensen> yeah
<gchristensen> so let's do it, I don't see a real disadvantage
<LnL> anyway, let's do builtins.storePath for now, I can fix the platforms and/or use nixpkgs if necessary
<gchristensen> LnL: want to deploy this PR once it merges? (for 0.1.7)
<LnL> oh :D
<gchristensen> I mean, if you don't want to =)
<LnL> no let's, not sure what I'd have to do tho
<gchristensen> it is 5/6ths done building ...
<LnL> I'll be home in ~3h
<gchristensen> it'll only take you a minute :)
<gchristensen> I mean, it can wait
<gchristensen> but also you can probably do it from where you are without trouble
<gchristensen> ok LnL, 1) merge this PR: https://github.com/NixOS/ofborg/pull/211
<{^_^}> ofborg#211 (by grahamc, 17 hours ago, open): 0.1.7 for the new result changes
<LnL> I could press a button if that's what you're saying, but I'd prefer to have a rough understanding of what happens :)
<LnL> don't want to do anything in git while at work tho
<gchristensen> yeah, sure: 1) you trigger a build in buildkite, it does a build and a `nixops deploy --dry-activate` to show you what will be deployed and restarted, and then you trigger the actual build and it does a nixops deploy
<gchristensen> it's all done in buildkite and uses git-crypt to decrypt the nixops state file
<gchristensen> by pressing these buttons: https://buildkite.com/ofborg/production-deployment/builds/32
<gchristensen> which you should be invited to press now
<LnL> so everything is done by nixops or are there some additional steps?
<gchristensen> before nixops runs, the log viewer and ofborg repos are updated locally for the deploy
<gchristensen> but the entire deploy process is first this script: https://github.com/ofborg/infrastructure/pull/2/files#diff-fec83b7e693a8e104fa9713533e1c054
<gchristensen> with a confirm button between the two
<LnL> what about eg. rabbitmq, since that can't restart without impacting other stuff?
<gchristensen> yeah, the nixpkgs version is pinned in the repo, so that won't restart
<gchristensen> ...unless you change it, of course, but please don't deploy those through buildkite
<LnL> right
<gchristensen> this is why there is the dry activate step before the actual deploy, to verify what is going to actually happen
<LnL> yeah, that's great
<gchristensen> ~sometime~ I'll add multiple rabbitmq nodes so we can take one down and have it be okay
<LnL> just wanted to make sure what caveats there are or if it was a separate thing
<gchristensen> yeah, thank you :)
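A minimal sketch of the two-phase flow described above: a `nixops deploy --dry-activate` preview, a confirmation gate, then the real `nixops deploy`. This is not the actual buildkite script; the deployment name "ofborg", the function names, and the `confirm` callback are assumptions made for illustration.

```python
import subprocess
from typing import Callable, List

Command = List[str]


def deploy_commands(deployment: str) -> List[Command]:
    """Return the preview command followed by the real deploy command."""
    base = ["nixops", "deploy", "-d", deployment]
    # --dry-activate builds everything and reports what would be
    # activated or restarted without actually switching the machines.
    return [base + ["--dry-activate"], base]


def run_deploy(
    deployment: str,
    confirm: Callable[[], bool],
    run: Callable[[Command], object] = subprocess.check_call,
) -> bool:
    """Run the preview, then deploy only if confirm() returns True.

    `run` is injectable so the flow can be exercised without nixops.
    Returns True if the real deploy was started, False if it was declined.
    """
    preview, real = deploy_commands(deployment)
    run(preview)
    if not confirm():  # stands in for the confirm button between the two steps
        return False
    run(real)
    return True
```

Passing a recording function as `run` (for example `calls.append`) shows which commands would be issued without touching a real deployment.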
<gchristensen> LnL: so I'll leave this merge & deploy up to you, let me know when you're ready so I can watch it happen :P
<LnL> you around in ~3h?
<gchristensen> yep
<LnL> kk
<LnL> what are you thinking in terms of a rollback workflow?
<gchristensen> yeah, I have no good story there
<gchristensen> might involve manual things, or rolling forwards via reverts in the ofborg repo
<gchristensen> because nixops basically doesn't have proper state I'm not sure it supports rolling back
<gchristensen> network.enableRollback
<gchristensen> If true, each deployment creates a new profile generation to be able to run nixops rollback. Defaults to false.
<gchristensen> maybe just turning that on
<LnL> what we do at work before the build enables maintenance mode and starts upgrading is display a bunch of information about the current deployment
<LnL> like the source revision, what images are running at the moment, etc.
<gchristensen> that seems cool
<LnL> and a real rollback would be manual, in practice we can redeploy old revisions just fine if the database schema didn't change
<gchristensen> yeah
orivej has joined #nixos-borg
<LnL> uh oh, it's builtins.storePath that also fixed travis?
<gchristensen> dunno
<LnL> certainly looks like it
<LnL> gchristensen: ping
<gchristensen> ready? :D
<{^_^}> [ofborg] @LnL7 merged pull request #211 → 0.1.7 for the new result changes → https://git.io/fNyLf
<{^_^}> [ofborg] @LnL7 pushed 13 commits to released: https://git.io/fNSDH
* gchristensen takes that as a yessss
<LnL> yep :D
<gchristensen> so for deployment, one sec
<gchristensen> let me figure something out
<gchristensen> it is a bit weird, LnL, but click "Rebuild" here: https://buildkite.com/ofborg/production-deployment/builds/32
<LnL> oh, hmm
<gchristensen> hmm?
<LnL> ah that creates a new build
<gchristensen> yeah, so we'll watch #33 now
<{^_^}> https://github.com/NixOS/nixpkgs/pull/33 (by peti, 6 years ago, closed): Apache 2.4.x update
<gchristensen> and go ahead with the dry activation phase when you're ready
<LnL> I see, is this because it's not ofborg but the infra repo?
<gchristensen> yeah
<gchristensen> maybe can have deploys be triggered by updating commit refs in the infra repo
<LnL> yeah, I noticed the 66df103 instead of the commit I was expecting
<gchristensen> huh, very weird
<LnL> because it's the infra repo
<gchristensen> yeah
<gchristensen> the weird thing though is it isn't going to update ofborg
<gchristensen> looking ...
<LnL> Already on 'up/rem'?
<gchristensen> yeah, I tried to be cute about branch names :|
<LnL> I mean why it's not updating
<LnL> it fetched correctly but it looks like checkout didn't do anything
<gchristensen> yeah
<gchristensen> I'm making it less cute, one sec
<gchristensen> :/
<gchristensen> I might need to clean up the build directory -- I might have messed it up with my previous attempt :)
<gchristensen> ok yeah I just cleaned up the dirs and restarted dry activation here: https://buildkite.com/ofborg/production-deployment/builds/35
<LnL> remote is a commit not a branch
<gchristensen> remote is an md5sum of the remote URL :)
<LnL> ah
<LnL> yeah
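A minimal sketch of the naming scheme mentioned above, where the git remote is named after an MD5 digest of its URL. The function name is hypothetical, and whether the real deploy script hashes the URL with or without a trailing newline is not stated in the log; this version hashes the bare string.

```python
import hashlib


def remote_name_for(url: str) -> str:
    """Derive a stable remote name from a remote URL.

    Assumption: the name is the hex MD5 digest of the UTF-8 encoded URL,
    with no trailing newline included in the hashed bytes.
    """
    return hashlib.md5(url.encode("utf-8")).hexdigest()
```

The result is a 32-character hex string, which can look like a commit id at a glance even though a git SHA-1 is 40 characters long.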
<gchristensen> that all looks okay to me
<gchristensen> go ahead and deploy!
<gchristensen> http://butt.holdings
<LnL> sorry, had to throw some stuff in my rice before it's ready :)
<gchristensen> it's okay :)
<LnL> yeah looks good now
<LnL> \o/
<gchristensen> hot dog!
<LnL> was already staring at that :p
<gchristensen> :D
<LnL> well look at that
<gchristensen> nice work!
<gchristensen> I hereby grant you permission to merge permission request PRs and deploy them
<LnL> so, can we use a test pr to verify the timeout stuff?
<gchristensen> sure
<LnL> or do we just keep an eye on nixpkgs
<gchristensen> oh, hmm it is tough to ensure an updated builder gets the request
<LnL> yeah, ideally only the test pr would go there
<LnL> or are most of the linux builders also updated by this
<gchristensen> most are not updated by this
<{^_^}> #44722 (by volth, 2 hours ago, open): processing: 3.3.7 -> 3.4
<gchristensen> oh wow
<gchristensen> nothing for 2 hours
<LnL> wait that's from 2h ago
* gchristensen goes looking for logs
<gchristensen> oooohh damn
<LnL> did my fake deploy do something?
<gchristensen> apparently my deploy process isn't as good as I thought :)
<gchristensen> Aug 08 17:14:56 core-0.ewr1.nix.ci php-fpm[4509]: [WARNING] [pool main] child 19569 said into stderr: "NOTICE: PHP message: PHP Warning: require_once(/nix/store/qrkb73xmxkc54rj9sdcrf3c9ia174ns5-configured-webhook/vendor/autoload.php): failed to open stream: No such file or directory in /nix/store/qrkb73xmxkc54rj9sdcrf3c9ia174ns5-configured-webhook/config.php on line 3"
<gchristensen> fixing ...
<LnL> that's the webhook?
<gchristensen> yeah
<{^_^}> [ofborg] @grahamc pushed to package-vendor « Package up dependencies with composer2nix »: https://git.io/fNS5v
<{^_^}> [ofborg] @grahamc opened pull request #213 → Package up dependencies with composer2nix → https://git.io/fNS5t
<LnL> that was about the last thing I expected to break
<gchristensen> yeah.....
<gchristensen> it is exposing the lazy things I did to get stuff out the door :P
<gchristensen> ok so a test failed on the nix2 build but we're going to merge anyway since it is a flaky test and let's get this out.
<{^_^}> [ofborg] @grahamc merged pull request #213 → Package up dependencies with composer2nix → https://git.io/fNS5t
<{^_^}> [ofborg] @grahamc pushed 2 commits to released: https://git.io/fNS5x
<{^_^}> [ofborg] @grahamc pushed 0 commits to package-vendor: https://git.io/fNS5p
<LnL> hmm?
<LnL> you have tests that fail sometimes?
<gchristensen> apparently :)
<gchristensen> okay I'm going to Deploy
<LnL> alright, I'll spam some people then
<gchristensen> one sec LnL :)
<gchristensen> I might have a secret way to do it ...
<LnL> ah :)
<gchristensen> looking now ...
<gchristensen> I'd like to have monitoring set up for PRs which are not yet evaluated
<gchristensen> and page if it goes above ?
<LnL> yeah, was just going to say the same
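A minimal sketch of the kind of check being discussed: page when the number of not-yet-evaluated PRs stays above a threshold for some time. The threshold is left open in the conversation, so it is a parameter here; the function name, the sample format, and the hold duration are assumptions, loosely modeled on how a Prometheus alerting rule's `for` clause behaves.

```python
from typing import Iterable, Optional, Tuple


def should_page(
    samples: Iterable[Tuple[float, int]],
    threshold: int,
    hold_seconds: float,
) -> bool:
    """Decide whether to page, given (timestamp, unevaluated_pr_count) samples.

    Samples must be in ascending time order. Returns True only if the count
    has been above `threshold` continuously for at least `hold_seconds`,
    measured up to the most recent sample.
    """
    breach_start: Optional[float] = None
    last_ts: Optional[float] = None
    for ts, count in samples:
        last_ts = ts
        if count > threshold:
            if breach_start is None:
                breach_start = ts  # start of the current breach
        else:
            breach_start = None  # any recovery resets the timer
    if breach_start is None or last_ts is None:
        return False
    return last_ts - breach_start >= hold_seconds
```

Because a single recovered sample resets the timer, a condition that clears just before the hold period elapses never pages.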
<LnL> did you also setup an alertmanager
<gchristensen> I think so
<gchristensen> looking ...
<gchristensen> yeah
<LnL> oh, and I lost my access to grafana since the migration away from nix.gsc.io
<gchristensen> all the data is public now
<gchristensen> I think?
<LnL> yeah, there are no other dashboards?
<gchristensen> oh
<gchristensen> let me make you an account
<gchristensen> LnL: PM'd you an invite, you can even fix your name that I typo'd
<LnL> thanks
<LnL> oh it is the only one
<LnL> am I misremembering there being more?
<gchristensen> there might have been more, I don't remember
<LnL> anyway, if I ever figure out how the rust library works I might add some extra stuff :)
<gchristensen> :D
<gchristensen> we can ditch it
<gchristensen> it isn't very good
<LnL> last time I looked at the metrics part I got super confused
<gchristensen> yeah :|
<gchristensen> ok let's spam PRs
<LnL> does that also trigger builds for valid commit messages?
<gchristensen> yeah
<LnL> nice
<gchristensen> only if the author is authorized of course
<LnL> :)
<gchristensen> nice
FRidh has quit [Ping timeout: 244 seconds]
<gchristensen> ok pushover is setup again to alert me
orivej has quit [Ping timeout: 240 seconds]
<LnL> I also have a pushover account
<gchristensen> want to be added? it isn't exactly fun :P
<LnL> yeah
<gchristensen> ok, I think I need your user key and a token?
<gchristensen> or maybe I reuse the token
<gchristensen> probably just your user key
<gchristensen> https://prometheus.io/docs/alerting/configuration/#%3Cwebhook_config%3 it'd be neat to hook this up to {^_^}
<LnL> oh yeah
<LnL> so my user key, or do I need to create a new application/api token?
<gchristensen> just your user key
<gchristensen> LnL: send it over and I'll hook you up
NinjaTrappeur has left #nixos-borg ["WeeChat 2.2"]
<LnL> have you used vault before?
<gchristensen> https://nix.ci/status
<gchristensen> I have, but not in a while
<LnL> nice
<gchristensen> ok triggering an error to see if you receive it
<LnL> nope?
<gchristensen> ehhh I did it wrong :P
<gchristensen> I don't think I can "just" trigger an alert so I took down the aarch64 builders to cause that to alert
<gchristensen> but they came back up automatically and then interrupted the alert process moments before it triggered
<LnL> haha
<gchristensen> so we should be getting one here in about 30s
<LnL> yup
<gchristensen> you got one?
<gchristensen> great
<gchristensen> ok letting the aarch64 builders come back up
<LnL> oh neat, it snoozes until you acknowledge
<gchristensen> LnL: https://nix.ci/
<gchristensen> I'm not accidentally leaking secret stuff here am I?
<LnL> well, depends if there's anything sensitive in your metrics
<gchristensen> those are all public already due to grafana
<LnL> and there's stuff like the scraping targets that might be publicly accessible
<gchristensen> ehhh I feel fine about that, too. should I not be?
<LnL> don't think so, given that this is all public infra
<gchristensen> cool
<gchristensen> lol and now that it is supposed to be up, it stays down
<samueldr> sorry to butt-in with almost irrelevant thing, but I think it's a mis-step to put it at the naked root url, neither under a subdomain or a root folder... in case you want to eventually put something like an explanation of the CI things, or even new tools no-one thought about yet
<gchristensen> I agree
<gchristensen> do you think it is wrong to move it later?
<samueldr> depends if there are things using the URLs, people aren't things :)
<samueldr> though people could be surprised and that's just unfortunate
<gchristensen> until about 30min ago nix.ci was a 404 :D
<samueldr> yes I know, I was thinking that a page like cache.nixos.org would be a fine addition
<gchristensen> I'd love a PR adding that!
<LnL> yeah, that sounds great
<samueldr> good work pointing the exact location to start doing that :)
<gchristensen> :)
<gchristensen> I wish I could more easily trigger an alert from prometheus
<LnL> Can't start up: not enough memory
<gchristensen> huh?
<LnL> https://logs.nix.ci/?key=nixos/nixpkgs.44748&attempt_id=768357f2-79b8-4813-a823-6ca27b87eda0
<gchristensen> fuck :/
<gchristensen> I don't know what to do.
<LnL> you changed the heap size right, or was that only for the evaluator
<LnL> looks like 10G
<gchristensen> ack
<gchristensen> I think I didn't mean to do that
<gchristensen> so much mess here was cleaned up in the last couple days
<LnL> that makes me wonder
<LnL> what happens if I put a syntax error in one of the config.json files?
<gchristensen> that'll fail, nix parses it on deploy :)
<gchristensen> before deploy*
<gchristensen> can you send a PR fixing that value?
<LnL> ah it goes through nix
<LnL> good :D
<gchristensen> nix merges several config files :)
<LnL> yeah, but wasn't sure how
<gchristensen> a new page is about to come through, testing an updated "source" URL
<gchristensen> feels good to be able to improve this again
orivej has joined #nixos-borg
<gchristensen> LnL: w00t the "Source" link in the alert you just got points to an actual graph now
<LnL> oh!
<gchristensen> LnL (about the 10g patch), samueldr (about the homepage patch): ETAs on these? to help me schedule my next few minutes
<LnL> hmm?
<gchristensen> oh, I thought you were going to revert the 10g initial heap commit
<LnL> I can, thought you were already on it
<gchristensen> please do that for me
<LnL> sure
<gchristensen> and then deploy it :D
<samueldr> gchristensen: tonight, still working, I was peeping while gitlab was hanging the whole tab process
<gchristensen> cool :)
<{^_^}> [ofborg] @LnL7 pushed to released « Revert "10g heap size" »: https://git.io/fN9td
<gchristensen> I had half a thought to expose alertmanager's web UI but I think that is not a good idea
<LnL> so what's the correct workflow here, rebuild the last infra pr?
<gchristensen> yeah, a wonderful question :|
<gchristensen> for now, let's try:
<gchristensen> https://buildkite.com/ofborg/production-deployment -> click New build, and then Create Build
<gchristensen> lgtm
<gchristensen> except for some hiccups along the way here, I'd say this is pretty nice
<LnL> yeah, something going wrong the first time is to be expected
<LnL> but it went pretty smooth otherwise
<LnL> could you disable the other darwin builder for a moment
<gchristensen> yeah, one moment
<gchristensen> stopped
<LnL> hrm, how is the queue order determined?
<gchristensen> I think it is pretty much FIFO?
<LnL> but my build is more important :p
<gchristensen> :D
<gchristensen> I thought about that being a paid perk
<samueldr> spot auction priorities?
<gchristensen> haha yeah
<LnL> ah there we go
<gchristensen> samueldr: I set up a cute landing page you can throw away: https://nix.ci//
<LnL> you can start it again
<samueldr> gchristensen: interested more into branding nix.ci or ofborg?
<samueldr> (it's your call after all)
<gchristensen> what do you think would be best? https://github.com/ofborg/infrastructure/blob/master/website/index.html here is the source, complete with at least one HTML error
<samueldr> ofborg has the issue of using the borg name and intellectual property of Paramount/CBS(?)
<samueldr> which probably is not an issue
<samueldr> but something to keep in mind
<samueldr> it also could cause contention with borg-backup and the other borg at google with its name
<gchristensen> right
<gchristensen> can I defer this decision to you?
<samueldr> uh, yeah, though it feels weird to decide that
<gchristensen> great :)
cransom has quit [*.net *.split]
<gchristensen> yay! *updates my update*
<LnL> so all builds will report timeouts once the aarch64 + 2 remaining builders are upgraded
<gchristensen> building the aarch64 image now :)
<LnL> (I assume those are all of the instances still on 0.1.5)
<gchristensen> yea