00:36
orivej has quit [Ping timeout: 244 seconds]
00:48
kalbasit has quit [Quit: WeeChat 2.1]
03:09
jtojnar has quit [Remote host closed the connection]
03:11
jtojnar has joined #nixos-borg
05:27
FRidh has joined #nixos-borg
07:44
<
LnL >
gchristensen: I'm not sure what's causing the test failure, the only issue I can reproduce is the <nix/config.nix> + sandboxing problem
09:47
orivej has joined #nixos-borg
10:52
<
gchristensen >
hrm :/
10:52
<
gchristensen >
annoying :)
11:01
<
LnL >
it does sound kind of familiar, don't remember what I ran into when adding the initial travis build
11:02
<
LnL >
btw, do you think it would be ok to make the tests depend on nixpkgs?
11:03
<
LnL >
the host nix stuff has some problems
11:07
orivej has quit [Ping timeout: 256 seconds]
11:16
NinjaTrappeur has quit [Quit: WeeChat 2.2]
11:19
NinjaTrappeur has joined #nixos-borg
11:31
timokau has joined #nixos-borg
11:43
timokau[m] has joined #nixos-borg
12:10
<
gchristensen >
oh should we change that?
12:17
<
gchristensen >
ideally we wouldn't depend upon nixpkgs but if it makes things a lot easier, sure
12:21
<
gchristensen >
where "a lot" is however much you want that to mean
12:48
<
LnL >
the problem is that it's brittle in combination with nix-daemon + sandboxing
12:48
<
LnL >
and depending on the system, the shell in <nix/config.nix> might not even be in the store
12:49
<
gchristensen >
oh right
12:50
<
gchristensen >
ok sure let's do that
12:55
<
gchristensen >
I wish travis was faster
12:56
<
LnL >
eg. stuff like this can happen if nix-daemon / client don't match exactly.
12:56
<
LnL >
while setting up the build environment: executing '/nix/store/zqh3l3lyw32q1ayb15bnvg9f24j5v2p0-bash-4.4-p12/bin/bash': No such file or directory
12:58
<
LnL >
when bash in <nix/config.nix> of the client is different from the one the nix-daemon knows about, so it's not allowed in the sandbox
12:58
<
gchristensen >
ouch
12:58
<
{^_^} >
[ofborg] @grahamc pushed to bump-0.1.7 « Split out the ofborg builds in to separate jobs »:
https://git.io/fNSZc
12:59
<
gchristensen >
that seems weird, it isn't just noted as a build input or whatever?
13:01
<
LnL >
no it's a string
13:01
<
LnL >
you'd need to use builtins.storePath cfg.shell, but that only works if it's actually in the store
13:02
<
gchristensen >
ehhh
13:02
<
gchristensen >
let's start with that
13:02
<
gchristensen >
to get things passing and then we can make it better and move to nixpkgs
13:02
<
LnL >
oh, the tests don't do that...
13:04
<
gchristensen >
I'm adding that now
13:04
<
LnL >
yeah just realised by talking about it
13:04
<
LnL >
might break testing on darwin tho
13:04
<
gchristensen >
I was wondering why the heck it wasn't finding it, heh
13:05
<
LnL >
this is not related to travis btw, since those builds use a single user install
13:05
<
gchristensen >
yeah, but they were failing here :)
13:05
<
gchristensen >
and now they pass
13:07
<
gchristensen >
ah right
13:07
<
gchristensen >
look at all these assumptions
13:08
<
LnL >
until now it worked for me because of build-remote, but I think builtins.storePath breaks that
13:09
<
gchristensen >
how does nixpkgs do this stuff?
13:09
<
gchristensen >
(I don't actually need to know, we can just move to using nixpkgs!)
13:10
<
LnL >
nixpkgs only uses this stuff to unpack bootstrap tools
13:10
<
gchristensen >
which doesn't happen remotely?
13:11
<
Dezgeg >
i don't think nixpkgs uses it, it uses statically linked busybox
13:11
<
gchristensen >
right
13:11
<
LnL >
it almost never happens because it's cached and doesn't change
13:11
<
gchristensen >
yeah
13:13
<
gchristensen >
so let's do it, I don't see a real disadvantage
13:14
<
LnL >
anyway, let's do builtins.storePath for now; I can fix the platforms and/or use nixpkgs if necessary
13:16
<
gchristensen >
LnL: want to deploy this PR once it merges? (for 0.1.7)
13:17
<
gchristensen >
I mean, if you don't want to =)
13:18
<
LnL >
no let's, not sure what I'd have to do tho
13:19
<
gchristensen >
it is 5/6ths done building ...
13:20
<
LnL >
I'll be home in ~3h
13:20
<
gchristensen >
it'll only take you a minute :)
13:20
<
gchristensen >
I mean, it can wait
13:21
<
gchristensen >
but also you can probably do it from where you are without trouble
13:25
<
{^_^} >
ofborg#211 (by grahamc, 17 hours ago, open): 0.1.7 for the new result changes
13:25
<
LnL >
I could press a button if that's what you're saying, but I'd prefer to have a rough understanding of what happens :)
13:26
<
LnL >
don't want to do anything in git while at work tho
13:26
<
gchristensen >
yeah, sure: 1) you trigger a build in buildkite, it does a build and a `nixops deploy --dry-activate` to show you what will be deployed and restarted, and then you trigger the actual build and it does a nixops deploy
13:27
<
gchristensen >
it's all done in buildkite and uses git crypt to decrypt the nixops state file
13:29
<
gchristensen >
which you should be invited to press now
13:31
<
LnL >
so everything is done by nixops or are there some additional steps?
13:31
<
gchristensen >
before nixops runs, the log viewer and ofborg repos are updated locally for the deploy
13:32
<
gchristensen >
with a confirm button between the two
13:34
<
LnL >
what about eg. rabbitmq, since that can't restart without impacting other stuff?
13:35
<
gchristensen >
yeah, the nixpkgs version is pinned in the repo, so that won't restart
13:36
<
gchristensen >
...unless you change it, of course, but please don't deploy those through buildkite
13:36
<
gchristensen >
this is why there is the dry activate step before the actual deploy, to verify what is going to actually happen
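The two-phase flow described above (dry activation first, then a confirm button, then the real deploy) can be sketched roughly as follows. This is only an illustration under assumptions: `plan_deploy` and the `confirm` callback are hypothetical names, and the actual buildkite pipeline is not shown in this log. Only the `nixops deploy` and `nixops deploy --dry-activate` commands come from the conversation.

```python
from typing import Callable, List


def plan_deploy(confirm: Callable[[str], bool]) -> List[List[str]]:
    """Return the nixops commands to run, in order.

    The dry-activate step is always planned. The real deploy is only added
    when the operator confirms after reviewing the dry-activate output.
    `confirm` is a hypothetical stand-in for the buildkite confirm button.
    """
    commands = [["nixops", "deploy", "--dry-activate"]]
    if confirm("Dry activation looks right, deploy for real?"):
        commands.append(["nixops", "deploy"])
    return commands


if __name__ == "__main__":
    # Decline automatically so running the sketch only prints the dry run.
    for cmd in plan_deploy(lambda prompt: False):
        print(" ".join(cmd))
```

The function only plans commands and never executes them, so the sketch is safe to run anywhere.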
13:37
<
LnL >
yeah, that's great
13:37
<
gchristensen >
~sometime~ I'll add multiple rabbitmq nodes so we can take one down and have it be okay
13:38
<
LnL >
just wanted to make sure what caveats there are or if it was a separate thing
13:38
<
gchristensen >
yeah, thank you :)
13:40
<
gchristensen >
LnL: so I'll leave this merge & deploy up to you, let me know when you're ready so I can watch it happen :P
13:40
<
LnL >
you around in ~3h?
13:44
<
LnL >
what are you thinking in terms of a rollback workflow?
13:46
<
gchristensen >
yeah, I have no good story there
13:47
<
gchristensen >
might involve manual things, or rolling forwards via reverts in the ofborg repo
13:48
<
gchristensen >
because nixops basically doesn't have proper state I'm not sure it supports rolling back
13:48
<
gchristensen >
network.enableRollback
13:48
<
gchristensen >
If true, each deployment creates a new profile generation to able to run nixops rollback. Defaults to false.
13:48
<
gchristensen >
maybe just turning that on
13:52
<
LnL >
what we do at work before the build enables maintenance mode and starts upgrading is display a bunch of information about the current deployment
13:53
<
LnL >
like the source revision, what images are running at the moment, etc.
13:56
<
gchristensen >
that seems cool
13:59
<
LnL >
and a real rollback would be manual, in practice we can redeploy old revisions just fine if the database schema didn't change
14:02
<
gchristensen >
yeah
14:18
orivej has joined #nixos-borg
15:08
<
LnL >
uh oh, it's builtins.storePath that also fixed travis?
15:10
<
gchristensen >
dunno
15:24
<
LnL >
certainly looks like it
16:38
<
LnL >
gchristensen: ping
16:38
<
gchristensen >
ready? :D
16:40
* gchristensen
takes that as a yessss
16:41
<
gchristensen >
so for deployment, one sec
16:41
<
gchristensen >
let me figure something out
16:43
<
gchristensen >
hmm?
16:43
<
LnL >
ah that creates a new build
16:43
<
gchristensen >
yeah, so we'll watch #33 now
16:43
<
gchristensen >
and go ahead with the dry activation phase when you're ready
16:44
<
LnL >
I see, is this because it's not ofborg but the infra repo?
16:45
<
gchristensen >
yeah
16:45
<
gchristensen >
maybe we can have deploys be triggered by updating commit refs in the infra repo
16:46
<
LnL >
yeah, I noticed the 66df103 instead of the commit I was expecting
16:46
<
gchristensen >
huh, very weird
16:47
<
LnL >
because it's the infra repo
16:47
<
gchristensen >
yeah
16:47
<
gchristensen >
the weird thing though is it isn't going to update ofborg
16:47
<
gchristensen >
looking ...
16:48
<
LnL >
Already on 'up/rem'?
16:48
<
gchristensen >
yeah, I tried to be cute about branch names :|
16:48
<
LnL >
I mean why it's not updating
16:49
<
LnL >
it fetched correctly but it looks like checkout didn't do anything
16:50
<
gchristensen >
yeah
16:50
<
gchristensen >
I'm making it less cute, one sec
16:57
<
gchristensen >
I might need to clean up the build directory -- I might have messed it up with my previous attempt :)
16:58
<
LnL >
remote is a commit not a branch
17:00
<
gchristensen >
remote is a md5sum of the remote URL:)
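Naming a git remote after the MD5 of its URL, as mentioned above, might look roughly like this. It is a sketch under assumptions: `remote_name` is a hypothetical helper, the URL is only an example, and the log does not show how the deploy script actually computes or uses the name.

```python
import hashlib


def remote_name(url: str) -> str:
    """Derive a stable remote name from a remote URL using its MD5 hex digest.

    The same URL always maps to the same 32-character name, so a script can
    re-add or look up the remote without tracking extra state.
    """
    return hashlib.md5(url.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Example URL only; the real remote used by the deploy is not in the log.
    print(remote_name("https://github.com/NixOS/ofborg.git"))
```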
17:02
<
gchristensen >
that all looks okay to me
17:02
<
gchristensen >
go ahead and deploy!
17:06
<
LnL >
sorry, had to throw some stuff in my rice before it's ready :)
17:06
<
gchristensen >
it's okay :)
17:06
<
LnL >
yeah looks good now
17:07
<
gchristensen >
hot dog!
17:08
<
LnL >
was already staring at that :p
17:08
<
LnL >
well look at that
17:08
<
gchristensen >
nice work!
17:09
<
gchristensen >
I hereby grant you permission to merge permission request PRs and deploy them
17:10
<
LnL >
so, can we use a test pr to verify the timeout stuff?
17:10
<
gchristensen >
sure
17:10
<
LnL >
or do we just keep an eye on nixpkgs
17:10
<
gchristensen >
oh, hmm it is tough to ensure an updated builder gets the request
17:11
<
LnL >
yeah, ideally only the test pr would go there
17:11
<
LnL >
or are most of the linux builders also updated by this
17:12
<
gchristensen >
most are not updated by this
17:13
<
{^_^} >
#44722 (by volth, 2 hours ago, open): processing: 3.3.7 -> 3.4
17:13
<
gchristensen >
oh wow
17:13
<
gchristensen >
nothing for 2 hours
17:13
<
LnL >
wait that's from 2h ago
17:13
* gchristensen
goes looking for logs
17:14
<
gchristensen >
oooohh damn
17:14
<
LnL >
did my fake deploy do something?
17:15
<
gchristensen >
apparently my deploy process isn't as good as I thought :)
17:15
<
gchristensen >
v/Aug 08 17:14:56 core-0.ewr1.nix.ci php-fpm[4509]: [WARNING] [pool main] child 19569 said into stderr: "NOTICE: PHP message: PHP Warning: require_once(/nix/store/qrkb73xmxkc54rj9sdcrf3c9ia174ns5-configured-webhook/vendor/autoload.php): failed to open stream: No such file or directory in /nix/store/qrkb73xmxkc54rj9sdcrf3c9ia174ns5-configured-webhook/config.php on line 3"
17:15
<
gchristensen >
fixing ...
17:16
<
LnL >
that's the webhook?
17:16
<
gchristensen >
yeah
17:32
<
LnL >
that was about the last thing I expected to break
17:32
<
gchristensen >
yeah.....
17:32
<
gchristensen >
it is exposing the lazy things I did to get stuff out the door :P
17:35
<
gchristensen >
ok so a test failed on the nix2 build but we're going to merge anyway since it is a flaky test and let's get this out.
17:38
<
LnL >
you have tests that fail sometimes?
17:39
<
gchristensen >
apparently :)
17:39
<
gchristensen >
okay I'm going to Deploy
17:43
<
LnL >
alright, I'll spam some people then
17:43
<
gchristensen >
one sec LnL :)
17:43
<
gchristensen >
I might have a secret way to do it ...
17:43
<
gchristensen >
looking now ...
17:46
<
gchristensen >
I'd like to have monitoring set up for PRs which are not yet evaluate
17:46
<
gchristensen >
I'd like to have monitoring set up for PRs which are not yet evaluated
17:46
<
gchristensen >
and page if it goes above ?
17:47
<
LnL >
yeah, was just going to say the same
17:48
<
LnL >
did you also setup an alertmanager
17:48
<
gchristensen >
I think so
17:48
<
gchristensen >
looking ...
17:49
<
gchristensen >
yeah
17:49
<
LnL >
oh, and I lost my access to grafana since the migration away from nix.gsc.io
17:50
<
gchristensen >
all the data is public now
17:50
<
gchristensen >
I think?
17:51
<
LnL >
yeah, there are no other dashboards?
17:52
<
gchristensen >
let me make you an account
17:53
<
gchristensen >
LnL: PM'd you an invite, you can even fix your name that I typo'd
17:56
<
LnL >
oh it is the only one
17:56
<
LnL >
am I misremembering there being more?
17:56
<
gchristensen >
there might have been more, I don't remember
17:58
<
LnL >
anyway, if I ever figure out how the rust library works I might add some extra stuff :)
17:59
<
gchristensen >
we can ditch it
17:59
<
gchristensen >
it isn't very good
17:59
<
LnL >
last time I looked at the metrics part I got super confused
18:01
<
gchristensen >
yeah :|
18:01
<
gchristensen >
ok let's spam PRs
18:03
<
LnL >
does that also trigger builds for valid commit messages?
18:04
<
gchristensen >
yeah
18:04
<
gchristensen >
only if the author is authorized of course
18:05
<
gchristensen >
nice
18:12
FRidh has quit [Ping timeout: 244 seconds]
18:15
<
gchristensen >
ok pushover is setup again to alert me
18:15
orivej has quit [Ping timeout: 240 seconds]
18:16
<
LnL >
I also have a pushover account
18:17
<
gchristensen >
want to be added? it isn't exactly fun :P
18:18
<
gchristensen >
ok, I think I need your user key and a token?
18:18
<
gchristensen >
or maybe I reuse the token
18:19
<
gchristensen >
probably just your user key
18:20
<
LnL >
so my user key, or do I need to create a new application/api token?
18:21
<
gchristensen >
just your user key
18:34
<
gchristensen >
LnL: send it over and I'll hook you up
18:54
NinjaTrappeur has left #nixos-borg ["WeeChat 2.2"]
18:57
<
LnL >
have you used vault before?
19:01
<
gchristensen >
I have, but not in a while
19:12
<
gchristensen >
ok triggering an error to see if you receive it
19:17
<
gchristensen >
ehhh I did it wrong :P
19:21
<
gchristensen >
I don't think I can "just" trigger an alert so I took down the aarch64 builders to cause that to alert
19:21
<
gchristensen >
but they came back up automatically and then interrupted the alert process moments before it triggered
19:22
<
gchristensen >
so we should be getting one here in about 30s
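The behaviour described above, where builders coming back up cancelled the alert just before it fired, matches how a Prometheus alert with a hold duration works: the condition must stay true for the whole duration, and any recovery resets the clock. A minimal sketch, assuming a 60 second hold duration and a hypothetical `alert_states` helper (the real alert rule is not shown in this log):

```python
from typing import Iterable, List, Optional, Tuple


def alert_states(samples: Iterable[Tuple[float, bool]], hold_for: float) -> List[str]:
    """Classify each (timestamp_seconds, condition_is_failing) sample.

    An alert is "pending" while the condition has been failing for less than
    `hold_for` seconds, "firing" once it has failed continuously for at least
    that long, and "inactive" otherwise. A single healthy sample resets it.
    """
    states: List[str] = []
    failing_since: Optional[float] = None
    for timestamp, failing in samples:
        if not failing:
            failing_since = None  # recovery resets the pending timer
            states.append("inactive")
            continue
        if failing_since is None:
            failing_since = timestamp
        elapsed = timestamp - failing_since
        states.append("firing" if elapsed >= hold_for else "pending")
    return states


if __name__ == "__main__":
    # Builders go down, briefly recover at t=60, then stay down.
    samples = [(0, True), (30, True), (60, False), (90, True), (120, True), (150, True)]
    print(alert_states(samples, hold_for=60))
```

In the sample run the brief recovery at t=60 means the alert only fires at t=150 instead of t=60.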
19:23
<
gchristensen >
you got one?
19:23
<
gchristensen >
great
19:23
<
gchristensen >
ok letting the aarch64 builders come back up
19:26
<
LnL >
oh neat, it snoozes until you acknowledge
19:27
<
gchristensen >
I'm not accidentally leaking secret stuff here am I?
19:28
<
LnL >
well, depends if there's anything sensitive in your metrics
19:29
<
gchristensen >
those are all public already due to grafana
19:30
<
LnL >
and there's stuff like the scraping targets that might be publicly accessible
19:31
<
gchristensen >
ehhh I feel fine about that, too. should I not be?
19:31
<
LnL >
don't think so, given that this is all public infra
19:32
<
gchristensen >
cool
19:36
<
gchristensen >
lol and now that it is supposed to be up, it stays down
19:39
<
samueldr >
sorry to butt-in with almost irrelevant thing, but I think it's a mis-step to put it at the naked root url, neither under a subdomain or a root folder... in case you want to eventually put something like an explanation of the CI things, or even new tools no-one thought about yet
19:39
<
gchristensen >
I agree
19:40
<
gchristensen >
do you think it is wrong to move it later?
19:40
<
samueldr >
depends if there are things using the URLs, people aren't things :)
19:40
<
samueldr >
though people could be surprised and that's just unfortunate
19:42
<
gchristensen >
until about 30min ago nix.ci was a 404 :D
19:44
<
samueldr >
yes I know, I was thinking that a page like cache.nixos.org would be a fine addition
19:44
<
gchristensen >
I'd love a PR adding that!
19:44
<
LnL >
yeah, that sounds great
19:45
<
samueldr >
good work pointing the exact location to start doing that :)
19:52
<
gchristensen >
I wish I could more easily trigger an alert from prometheus
19:54
<
LnL >
Can't start up: not enough memory
19:55
<
gchristensen >
huh?
19:55
<
gchristensen >
fuck :/
19:56
<
gchristensen >
I don't know what to do.
19:56
<
LnL >
you changed the heap size right, or was that only for the evaluator
19:56
<
LnL >
looks like 10G
19:57
<
gchristensen >
I think I didn't mean to do that
19:57
<
gchristensen >
so much mess here was cleaned up in the last couple days
19:57
<
LnL >
that makes me wonder
19:58
<
LnL >
what happens if I put a syntax error in one of the config.json files?
19:58
<
gchristensen >
that'll fail, nix parses it on deploy :)
19:58
<
gchristensen >
before deploy*
19:58
<
gchristensen >
can you send a PR fixing that value?
19:58
<
LnL >
ah it goes through nix
19:58
<
gchristensen >
nix merges several config files :)
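The idea of parsing and merging several config files before anything is deployed can be sketched as below. This is not the actual Nix expression: the recursive, later-file-wins merge is an assumption, and `merge`, `load_config`, and the keys in the example are hypothetical.

```python
import json
from typing import Any, Dict, List


def merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `override` into a copy of `base`; override wins."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def load_config(documents: List[str]) -> Dict[str, Any]:
    """Parse every JSON document first, then merge them in order.

    json.loads raises json.JSONDecodeError on a syntax error, so a broken
    file stops the process before any merged config exists to deploy.
    """
    parsed = [json.loads(text) for text in documents]
    merged: Dict[str, Any] = {}
    for document in parsed:
        merged = merge(merged, document)
    return merged


if __name__ == "__main__":
    # Hypothetical keys, for illustration only.
    public = '{"runner": {"identity": "example", "heap": "10G"}}'
    local = '{"runner": {"heap": "2G"}}'
    print(load_config([public, local]))
```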
19:59
<
LnL >
yeah, but wasn't sure how
20:01
<
gchristensen >
a new page is about to come through, testing an updated "source" URL
20:04
<
gchristensen >
feels good to be able to improve this again
20:05
orivej has joined #nixos-borg
20:06
<
gchristensen >
LnL: w00t the "Source" link in the alert you just got points to an actual graph now
20:09
<
gchristensen >
LnL (about the 10g patch), samueldr (about the homepage patch): ETAs on these? to help me schedule my next few minutes
20:09
<
gchristensen >
oh, I thought you were going to revert the 10g initial heap commit
20:10
<
LnL >
I can, thought you were already on it
20:10
<
gchristensen >
please do that for me
20:10
<
gchristensen >
and then deploy it :D
20:12
<
samueldr >
gchristensen: tonight, still working, I was peeping while gitlab was hanging the whole tab process
20:12
<
gchristensen >
cool :)
20:19
<
gchristensen >
I had half a thought to expose alertmanager's web UI but I think that is not a good idea
20:21
<
LnL >
so what's the correct workflow here, rebuild the last infra pr?
20:22
<
gchristensen >
yeah, a wonderful question :|
20:22
<
gchristensen >
for now, let's try:
20:25
<
gchristensen >
lgtm
20:28
<
gchristensen >
except for some hiccups along the way here, I'd say this is pretty nice
20:33
<
LnL >
yeah, something going wrong the first time is to be expected
20:33
<
LnL >
but it went pretty smooth otherwise
20:35
<
LnL >
could you disable the other darwin builder for a moment
20:36
<
gchristensen >
yeah, one moment
20:37
<
gchristensen >
stopped
20:45
<
LnL >
hrm, how is the queue order determined?
20:46
<
gchristensen >
I think it is pretty much FIFO?
20:46
<
LnL >
but my build is more important :p
20:48
<
gchristensen >
I thought about that being a paid perk
20:48
<
samueldr >
spot auction priorities?
20:49
<
gchristensen >
haha yeah
20:49
<
LnL >
ah there we go
20:53
<
LnL >
you can start it again
20:53
<
samueldr >
gchristensen: interested more into branding nix.ci or ofborg?
20:54
<
samueldr >
(it's your call after all)
20:54
<
samueldr >
ofborg has the issue of using the borg name and intellectual property of Paramount/CBS(?)
20:55
<
samueldr >
which probably is not an issue
20:55
<
samueldr >
but something to keep in mind
20:55
<
samueldr >
it also could cause contention with borg-backup and the other borg at google with its name
20:55
<
gchristensen >
right
20:56
<
gchristensen >
can I defer this decision to you?
20:58
<
samueldr >
uh, yeah, though it feels weird to decide that
20:59
<
gchristensen >
great :)
23:02
cransom has quit [*.net *.split]
23:40
<
gchristensen >
yay!
*updates my update*
23:42
<
LnL >
so all builds will report timeouts once the aarch64 + 2 remaining builders are upgraded
23:42
<
gchristensen >
building the aarch64 image now :)
23:43
<
LnL >
(I assume those are all of the instances still on 0.1.5)