<aminechikhaoui>
I can add a PR to support --repair for nix copy; I just don't see the original issue that made Eelco hard-code NoRepair in the first place. So if someone knows of an existing issue that prevents this, it would be helpful
<gchristensen>
aminechikhaoui: if I had to guess, it is AWS's 11-9s guarantee on durability
vdemeester` has quit [Changing host]
vdemeester` has joined #nixos-dev
<aminechikhaoui>
gchristensen: you mean as the root cause of the path being corrupted?
<gchristensen>
as being a reason to assume it isn't corrupted
<aminechikhaoui>
gchristensen: yeah, I really don't know how I ended up with this corrupted path :D
<gchristensen>
:D
<aminechikhaoui>
alright, I guess I'll implement the repair thingy and see if it will break my binary cache even further :p
<dtz>
:3 good luck
<aminechikhaoui>
hm also I can try a super evil thing and remove the objects from the bucket then copy the path again :D
<aminechikhaoui>
oh someone at work pointed out that it could have happened during the upload
<aminechikhaoui>
which makes more sense; I don't think we do a verify after the addToStore
<ikwildrpepper>
does anyone have an idea why nix-shell would pull in all 'outputs' for aws-sdk-cpp when I only add aws-sdk-cpp.dev to the buildInputs of my build?
<ikwildrpepper>
aws-sdk-cpp is about 783MB download, so that is not so nice :D
<ikwildrpepper>
(aws-sdk-cpp.debug, I mean)
<aminechikhaoui>
ikwildrpepper: do you have environment.enableDebugInfo set to true by any chance?
<ikwildrpepper>
nope
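For reference, the setup being described amounts to something like the following (a minimal sketch, assuming a plain shell.nix; pkgs.mkShell is illustrative and stdenv.mkDerivation would do equally well). Listing only the .dev output should, in principle, keep nix-shell from downloading the large .debug output; environment.enableDebugInfo is the NixOS option that pulls .debug outputs in system-wide, hence the question above:

    # shell.nix sketch: only the .dev output of aws-sdk-cpp is requested,
    # yet nix-shell reportedly fetches the other outputs (including the
    # ~783MB .debug) as well.
    { pkgs ? import <nixpkgs> { } }:
    pkgs.mkShell {
      buildInputs = [ pkgs.aws-sdk-cpp.dev ];
    }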
Sonarpulse has joined #nixos-dev
jtojnar has quit [Quit: jtojnar]
ma27 has quit [Ping timeout: 265 seconds]
tilpner has joined #nixos-dev
drakonis has joined #nixos-dev
ma27 has joined #nixos-dev
drakonis_ has joined #nixos-dev
<Dezgeg>
anyone capable of aborting staging jobs?
drakonis_ has quit [Quit: Leaving]
Sonarpulse has quit [Ping timeout: 276 seconds]
<LnL>
what eval?
Taneb has joined #nixos-dev
<Dezgeg>
all current ones I think
<Dezgeg>
or maybe one of the older ones is salvageable, hm
<Dezgeg>
they are all at 6k failures though, so dunno...
<LnL>
done, unless the current one is also bad
xeji has joined #nixos-dev
<xeji>
ofborg builder for x86_64 has been stalled for a while - please restart (whoever can do this)
<xeji>
*x86_64-linux
drakonis has quit [Remote host closed the connection]
<LnL>
aren't there multiple?
andi- has quit [Ping timeout: 264 seconds]
andi- has joined #nixos-dev
Sonarpulse has joined #nixos-dev
<xeji>
LnL: grafana dashboard shows 80 waiting and zero current jobs. Looks like nothing is getting done
<xeji>
and the queue just keeps growing
vcunat has joined #nixos-dev
<srhb>
Regarding chromium, it looks (hydra/src/hydra-eval-jobs/hydra-eval-jobs.cc) like hydra might actually query the derivation for timeout values. We don't seem to be using this anywhere in nixpkgs. Does anyone know if it actually works?
<xeji>
cool. looks like meta.timeout with a 10hr default
<vcunat>
oh!
<vcunat>
Then we just increase it for chromium and we're done?
<srhb>
That's what it looks like to me. Testing locally now.
<vcunat>
Testing lowering it, I hope.
<srhb>
yes. :P
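Concretely, the change under discussion would be a one-attribute tweak along these lines (a sketch, assuming hydra-eval-jobs really reads meta.timeout and interprets it in seconds, which the 10hr default mentioned above suggests; the 24h value is illustrative):

    # sketch.nix: raise chromium's Hydra timeout above the 10hr default.
    { pkgs ? import <nixpkgs> { } }:
    pkgs.chromium.overrideAttrs (old: {
      meta = (old.meta or { }) // {
        timeout = 24 * 60 * 60;  # 24 hours, if the unit is indeed seconds
      };
    })

In nixpkgs itself this would presumably just be a meta.timeout line in the chromium derivation rather than an override.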
<vcunat>
Clever :-)
<vcunat>
(ah, I didn't mean to ping that guy)
* clever
waves
Sonarpulse has quit [Ping timeout: 276 seconds]
contrapumpkin has joined #nixos-dev
__Sander__ has quit [Quit: Konversation terminated!]
<srhb>
It certainly sets the value in the builds table to meta.timeout
<vcunat>
You can try with something extremely short.
<dtz>
is that expected / known?
<vcunat>
(I suppose)
<srhb>
vcunat: Yes, I have two seconds set (for the build attribute of Nix)
<srhb>
Or well, two "arbitrary units"
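The local test srhb describes could look something like this overlay (a sketch; the package choice and the low value mirror what was said above, and whether "2" means two seconds is exactly what the test should reveal):

    # overlay sketch: give one package an absurdly low meta.timeout and
    # watch whether Hydra kills its build.
    self: super: {
      nix = super.nix.overrideAttrs (old: {
        meta = (old.meta or { }) // { timeout = 2; };
      });
    }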
<vcunat>
dtz: I do know it
<vcunat>
it's been happening commonly for months
<vcunat>
srhb: so you should see if the build times out or not, right?
<dtz>
okay no worries
<srhb>
vcunat: Depends on how often Hydra checks whether something has timed out.
<srhb>
Up to five minutes now.
<xeji>
dtz: some days, only about 1 in 3 or 4 evals works. I've also frequently seen sqlite db errors
<vcunat>
hmm, five minutes seems suspicious, but we'll see
<vcunat>
(I've never really looked into hydra codebase)
Sonarpulse has joined #nixos-dev
MichaelRaskin has joined #nixos-dev
<xeji>
ofborg finally started building x86_64-linux again - thanks to whoever helped!
<srhb>
vcunat: It looks right in the database but I can't see Hydra killing anything that exceeds my (low) limits. Meh.
<xeji>
srhb: there seems to be a retry mechanism after failed build steps in the hydra code
<dtz>
haha loving how borg is catching back up with the PRs
<xeji>
after they're merged...
<vcunat>
shlevy: well, we can just try it, there's nothing to lose really
<vcunat>
srhb: ^^ (I'm sorry, Shea)
<srhb>
vcunat: Agreed.
<vcunat>
So, you push it?
<MichaelRaskin>
Wait, is my brix-on-the-table (which is _expected_ to be down a lot of time — it is literally on a table in the apartment!) the only build machine for ofborg amd64??
<srhb>
vcunat: Can't click any buttons right now. Can do later :)
<vcunat>
:-)
<xeji>
MichaelRaskin: well, no builder was active for about a day
<MichaelRaskin>
Sounds about how long my builder was cleanly down
<vcunat>
I plan to get some boxes usable around nixos.org
<vcunat>
(older office desktops)
<MichaelRaskin>
(I did intentionally shut down the machine, thinking that there is _some_ constant build capacity; when I requested credentials I asked whether an often-down builder is useful, and I asked for a reason)
zybell_ has quit [Ping timeout: 268 seconds]
<MichaelRaskin>
Well, 4GiB and any i5 is already quite a bit of power. A couple of such machines should be able to clear the queue to zero from time to time.
zybell has joined #nixos-dev
<vcunat>
Yeah, i5 with 4-8 GiB RAM and 0.5TB HDD.
<vcunat>
(that's what I'm considering)
<vcunat>
Only four threads per box; that's the biggest downside.
ma27 has quit [Quit: WeeChat 2.0]
ma27 has joined #nixos-dev
ma27 has quit [Client Quit]
<vcunat>
(If I had to buy a new one, it would be a 16-threaded Ryzen instead of a couple of these.)
<xeji>
MichaelRaskin: looking at how quickly the queue goes down now, something like your box is more than enough
<LnL>
most builds that currently run on ofborg are pretty short
<MichaelRaskin>
If we ever want to have enough capacity for Chromium checks (hides)
<xeji>
I wonder where the 31 aarch64 builders come from
<vcunat>
I know that.
<LnL>
the community box
<vcunat>
yeah, probably nothing else
<MichaelRaskin>
vcunat: 2 quad-thread boxes with 8GiB RAM each should fully replace my builder.
<MichaelRaskin>
In the bright case, we have extra capacity and can request more builders
<MichaelRaskin>
In the bad case, at least something is still up
<xeji>
hey, let's build chromium on aarch64
<MichaelRaskin>
Looking at the queue quickly going down, I would say that even the weakest builder could keep stuff from building up, given the current kind of things that get to ofborg
<MichaelRaskin>
Oh well, except Chromium _can_ get auto-sent to builders…
<vcunat>
I'm thinking of making them hydra.nixos.org slaves, too. It seems any amount gets utilized.
<vcunat>
I'll see in time.
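For reference, registering such a box as a remote builder comes down to an entry like this (a sketch using the NixOS nix.buildMachines option; Hydra's machines file carries the same fields, and the host name and key path are placeholders):

    {
      nix.buildMachines = [{
        hostName = "builder.example.org";  # placeholder
        system = "x86_64-linux";
        maxJobs = 4;                       # the four threads per box mentioned above
        sshUser = "nix";
        sshKey = "/etc/nix/builder_key";   # placeholder
      }];
    }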
<xeji>
I opened a qemu PR last night, that's still in the queue
pie_ has joined #nixos-dev
<MichaelRaskin>
Qemu is quite big, I think
<vcunat>
you mean, build x86 chromium on that aarch64 box?
<vcunat>
That won't be efficient.
<vcunat>
(I suspect.)
<xeji>
just kidding
<xeji>
MichaelRaskin: qemu takes about 30min on my local i5 box
<vcunat>
:-)
<MichaelRaskin>
That might even time out with my aggressive grabbing everything to get rid of small stuff in the backlog…
<xeji>
yeah, just kill it.
<MichaelRaskin>
Well, might get lucky
<MichaelRaskin>
Killing it is also work
<xeji>
lol
<MichaelRaskin>
Ah, found it…
<MichaelRaskin>
Killed it, got it back after restarting the thread
<xeji>
... well I pushed 3 commits, so there may be 3 jobs
<MichaelRaskin>
Nope, that's the restart of the lost builds
<xeji>
(I mean pushed separately)
<MichaelRaskin>
OK, _failed_ the job by chmod 0 -R
<vcunat>
srhb: push what seemed to work to master and 18.03 when you can, please :-)
<vcunat>
Otherwise I'll try to come with something over the weekend.
<vcunat>
(if this doesn't work)
<srhb>
vcunat: Will do. :)
ma27 has joined #nixos-dev
zybell has quit [Ping timeout: 256 seconds]
<srhb>
vcunat: Done.
<vcunat>
Triggered 18.03 evaluation to see the result in the morning.
<srhb>
vcunat: Cool! :0
<vcunat>
we need to unblock the uefi installer problem
<samueldr>
and nix 2.0.1 (memory issues)
<srhb>
when it's cooking, it would be nice to select timeout from builds where id = $buildid
<srhb>
And yes, we definitely need to unblock. Sad timing for this issue to crop up. :(
<srhb>
Oh no, does meta not count towards a new evaluation...