#nixos-dev on 2018-06-20

2018-04-19 20:31 gchristensen changed the topic of #nixos-dev to: NixOS Development (#nixos for questions) | https://hydra.nixos.org/jobset/nixos/trunk-combined https://channels.nix.gsc.io/graph.html | 18.03 release managers: fpletz and vcunat | https://logs.nix.samueldr.com/nixos-dev

00:11 Sonarpulse has quit [Ping timeout: 245 seconds]

01:17 phreedom has quit [Remote host closed the connection]

01:17 phreedom has joined #nixos-dev

02:42 layus has quit [Quit: ZNC 1.6.5 - http://znc.in]

02:46 layus has joined #nixos-dev

03:25 lassulus_ has joined #nixos-dev

03:27 lassulus has quit [Ping timeout: 240 seconds]

03:27 lassulus_ is now known as lassulus

03:48 vdemeester` has quit [Ping timeout: 260 seconds]

03:48 sorear has quit [Ping timeout: 256 seconds]

03:49 gleber_ has quit [Ping timeout: 260 seconds]

03:49 cbarrett has quit [Ping timeout: 268 seconds]

03:49 elvishjerricco has quit [Ping timeout: 256 seconds]

03:49 cstrahan_ has quit [Ping timeout: 240 seconds]

03:50 terrorjack has quit [Ping timeout: 276 seconds]

03:50 mbrock_ has quit [Ping timeout: 276 seconds]

03:51 pauldub has quit [Ping timeout: 256 seconds]

03:51 taktoa[c] has quit [Ping timeout: 256 seconds]

03:51 ocharles_ has quit [Ping timeout: 256 seconds]

03:51 angerman has quit [Ping timeout: 256 seconds]

03:51 thoughtpolice has quit [Ping timeout: 276 seconds]

03:52 zimbatm has quit [Ping timeout: 256 seconds]

03:52 ghuntley has quit [Ping timeout: 276 seconds]

03:52 manveru has quit [Ping timeout: 276 seconds]

05:23 pauldub has joined #nixos-dev

05:25 ghuntley has joined #nixos-dev

05:25 sorear has joined #nixos-dev

05:26 mbrock_ has joined #nixos-dev

05:26 ocharles_ has joined #nixos-dev

05:26 elvishjerricco has joined #nixos-dev

05:28 thoughtpolice has joined #nixos-dev

05:30 cbarrett has joined #nixos-dev

05:31 gleber_ has joined #nixos-dev

05:31 cstrahan_ has joined #nixos-dev

05:33 manveru has joined #nixos-dev

05:35 angerman has joined #nixos-dev

05:37 zimbatm has joined #nixos-dev

05:39 taktoa[c] has joined #nixos-dev

05:56 elvishjerricco has quit [Ping timeout: 256 seconds]

05:56 gleber_ has quit [Ping timeout: 256 seconds]

05:57 ocharles_ has quit [Ping timeout: 256 seconds]

05:57 cbarrett has quit [Ping timeout: 276 seconds]

05:57 pauldub has quit [Ping timeout: 256 seconds]

05:57 angerman has quit [Ping timeout: 265 seconds]

05:57 sorear has quit [Ping timeout: 256 seconds]

05:57 ghuntley has quit [Ping timeout: 256 seconds]

05:58 taktoa[c] has quit [Ping timeout: 265 seconds]

05:58 zimbatm has quit [Ping timeout: 276 seconds]

05:58 cstrahan_ has quit [Ping timeout: 256 seconds]

05:58 thoughtpolice has quit [Ping timeout: 256 seconds]

05:58 mbrock_ has quit [Ping timeout: 256 seconds]

05:58 manveru has quit [Ping timeout: 265 seconds]

06:14 terrorjack has joined #nixos-dev

06:18 pie_ has quit [Read error: Connection reset by peer]

06:19 pie_ has joined #nixos-dev

06:20 thoughtpolice has joined #nixos-dev

06:20 mbrock_ has joined #nixos-dev

06:21 MichaelRaskin has quit [Quit: MichaelRaskin]

06:21 zimbatm has joined #nixos-dev

06:21 manveru has joined #nixos-dev

06:21 ghuntley has joined #nixos-dev

06:21 vdemeester` has joined #nixos-dev

06:21 taktoa[c] has joined #nixos-dev

06:21 ocharles_ has joined #nixos-dev

06:22 angerman has joined #nixos-dev

06:22 ocharles_ has quit [Max SendQ exceeded]

06:22 sorear has joined #nixos-dev

06:23 cstrahan_ has joined #nixos-dev

06:23 cbarrett has joined #nixos-dev

06:24 ocharles_ has joined #nixos-dev

06:24 elvishjerricco has joined #nixos-dev

06:24 pauldub has joined #nixos-dev

06:24 cstrahan_ has quit [Max SendQ exceeded]

06:25 gleber_ has joined #nixos-dev

06:25 cstrahan_ has joined #nixos-dev

06:39 vdemeester` has quit [Changing host]

06:39 vdemeester` has joined #nixos-dev

07:22 jtojnar has quit [Remote host closed the connection]

07:35 <peti> It feels like "master" is in a really bad state these days. Libreoffice has been fixed and immediately broken again at least 3 times in the last month or so. I haven't been able to compile it for well over a month. Channel updates for nixos-unstable also come around only very rarely. We had to wait 2+ weeks on the last update, and that one is now 9 old again, too, with no end in sight as plenty of Hydra

07:35 <peti> tests are still failing.

07:35 Synthetica has joined #nixos-dev

07:38 * peti wonders whether there are changes we can make to improve that situation.

07:46 jtojnar has joined #nixos-dev

07:58 <steveeJ> peti: does every PR trigger a rebuild of *all* derivations which depend on the changed code?

08:12 <peti> steveeJ: No. That's too expensive, I suppose.

08:12 <steveeJ> peti: how expensive is it to git-bisect all the time? :D

08:13 <steveeJ> peti: something in between would be to define a list of derivations which are required to be in a working state

08:14 <steveeJ> of course such a list is highly subjective but it'll be worth the effort

08:15 <steveeJ> the effort of negotiating such a list I mean

08:17 <peti> We have such a list: https://hydra.nixos.org/job/nixos/trunk-combined/tested#tabs-constituents. But it's not verified before people push their updates.

08:22 __Sander__ has joined #nixos-dev

08:28 orivej_ has joined #nixos-dev

08:29 orivej has quit [Ping timeout: 268 seconds]

08:33 <steveeJ> peti: I see, then requiring these to build and test successfully before every merge to master would do it

08:36 <srhb> steveeJ: That's not a realistic solution though.

08:37 <srhb> Not without building the actual PRs on hydra itself...

08:38 <Mic92> Sometimes I had builds that run on my machine, but broke on hydra.

08:39 <steveeJ> srhb: is building them on hydra not an option?

08:41 <Mic92> Hydra can build pull requests as far as I know, but this is not enabled for nixpkgs for some reason.

08:42 <Mic92> We could probably increase the package quality of some core packages, by allowing less other packages, that add up to much time on hydra.

08:43 <steveeJ> I don't know too much about the available infrastructure but I've seen people trigger a bot to run tests. to prevent too much load on hydra or the bot who would run the above mentioned tests, it could be triggered only on PR approval

08:44 <Mic92> The bots usually only tests the packages itself, not all its dependencies.

08:45 <steveeJ> github has an event for this: https://developer.github.com/v3/activity/events/types/#pullrequestreviewevent

08:46 <steveeJ> Mic92: which is why master is broken so often which peti would like to change :D

08:46 <Mic92> mind writing perl?

08:46 <steveeJ> very much so..

08:46 <Mic92> me neither

08:47 <steveeJ> dang language barriers

08:47 <Mic92> also bots timeout after an hour for good reason: https://github.com/NixOS/nixpkgs/pull/42288#issuecomment-398597400

08:48 <Mic92> I usually allways build all dependencies on my machine for an update.

08:48 <steveeJ> Mic92: a per-job timeout is somewhat primitive. the timeout could be dynamic and measure pkg build times

08:48 <Mic92> But for mass-rebuilds this is not feasiable

08:49 <Mic92> steveeJ: https://github.com/NixOS/ofborg feel free

08:50 <steveeJ> Mic92: btw, why did you mention perl?

08:50 <Mic92> steveeJ: hydra is written in it, ofborg is mostly rust

08:51 <steveeJ> let's rewrite hydra in Rust then :D

08:51 <Synthetica> Rewrite all the things in rust!

08:52 <steveeJ> is the a relation between ofborg and hydra?

08:52 <Mic92> steveeJ: I think there was a budget to rewrite hydra and this one: https://github.com/hercules-ci/hercules

08:53 <Mic92> ofborg was meant to replace our travis infrastructure that we had to test pull requests before

08:53 <Synthetica> Budget from whom?

08:53 <Mic92> I don't remember

09:14 jtojnar has quit [Ping timeout: 268 seconds]

09:28 vcunat has joined #nixos-dev

09:28 orivej_ has quit [Ping timeout: 240 seconds]

09:41 vcunat has quit [Ping timeout: 256 seconds]

09:44 orivej has joined #nixos-dev

09:46 jtojnar has joined #nixos-dev

09:59 vcunat has joined #nixos-dev

10:01 jtojnar has quit [Ping timeout: 260 seconds]

10:04 orivej has quit [Ping timeout: 245 seconds]

10:04 orivej has joined #nixos-dev

10:17 orivej has quit [Ping timeout: 264 seconds]

10:29 taktoa has joined #nixos-dev

10:49 orivej has joined #nixos-dev

10:50 Sonarpulse has joined #nixos-dev

11:10 <gchristensen> peti: I have an idea, let's turn off r-ryantm for a while.

11:11 <gchristensen> reduce the churn and let people who are focusing on build problems actually get something done

11:14 <peti> Yes, that might help indeed. We could also batch a few dozen r-ryantm updates in a separate branch and merge them to master only *after* Hydra says they don't cause any harm. That is what I've been doing via 'haskell-updates' for a long time.

11:14 <gchristensen> yeah, ok I'll reach out to ryan

11:15 <Sonarpulse> 1 step closer to build all PRs first :D

11:16 <LnL> l like that idea, those are all 'trivial' updates

11:17 <LnL> so we can branch off the last trunk eval into package-updates and put a bunch of stuff in there, then compare to see if there are problems

11:17 <Sonarpulse> LnL: in general I'd like to separate pkg updates from refactors

11:17 <Sonarpulse> I mentioned this from staging

11:18 <Sonarpulse> but I guess it applies to more things too

11:18 <Sonarpulse> *about staging

11:18 * peti can set up a Hydra jobset for ryantm. The name "r-ryantm-updates" comes to mind. :-) We can point it to his personal fork, too, in case he wants to do some crazy commit re-ordering and history editing, like I routinely do.

11:18 <LnL> ack

11:20 <peti> Generally speaking, though, we should consider the role of our master branch, though. We've had only 1 channel update in the last 8 weeks or so. I don't think that's a good idea. People who follow "nixos-unstable" assume that they'll get security updates quicker than aynone else, but in fact it might be *weeks* until a security-related git commit actually reaches the channel.

11:21 <gchristensen> I have told everyone who will listen for the past year that nixos-unstable is slower to receive security updates than stable

11:22 <gchristensen> https://gist.github.com/grahamc/c60578c6e6928043d29a427361634df6#which-channel-is-right-for-me

11:22 <LnL> sure, that will always be the case

11:22 <peti> That's the reality, no doubt.

11:22 <peti> But it doesn't have to be.

11:24 <peti> We can create a protected "nixos-unstable" branch (or whatever name) and let Hydry build & publish that. Then we have a second jobset that builds "master". When "mater" passes all Hydra tests, then we merge (automatically) to the release branch. Everything is already built, so it could go out, basically, immediately.

11:24 orivej has quit [Ping timeout: 256 seconds]

11:25 <peti> When an important security update comes up, however, then we commit it directly into the "nixos-unstable" branch and by-pass master.

11:25 <gchristensen> interesting!

11:25 <gchristensen> though, we then get the same problem with -- staging -- poor coordination. we may require more than just this, but maybe a branch-for-the-week or something

11:26 <peti> This gives us 3 kinds of releases: "nixos-x.y" is quick and stable, "nixos-unstable" is quick and unstable, and "master" is the wild west.

11:27 <gchristensen> before this, we should probably try and figure out a better coordination scheme for staging

11:27 <peti> gchristensen: Well, the acceptance criteria for "nixos-unstable" is well defined. You need "https://hydra.nixos.org/job/nixos/trunk-combined/tested#tabs-constituents" to succeed, then it can be merge. So there's no manual decision making involved.

11:28 <peti> gchristensen: The problem with staging is a different one. The problem there is that you have to merge to master just to run the tests! We don't run them for staging.

11:28 <gchristensen> let's fix it!

11:28 <ben> Is it prohibitive to block merges to master that don't pass tests?

11:29 <gchristensen> with master -> whatever -> release-nixos-unstable, the jobs will pile up on master just like they are already

11:29 <LnL> in practice that might run into merge conflicts, but it's probably fine most of the time

11:29 <gchristensen> changing faster than we can properly test everything

11:29 <vcunat> ATM it might probably prohibitive to run the tested job on every commit

11:31 <peti> The problem remains that "master" can move so fast that more tests break every day than we can fix.

11:32 <gchristensen> yes

11:32 <peti> I am looking at you, nixos.tests.gnome3.

11:32 <ben> If it's not automatically the responsibility of the person proposing to merge something to master to ensure that the change doesn't break master, it seems hard to ensure that anyone cares enough to fix it again

11:32 <vcunat> that's the main problem with staging as well

11:32 <gchristensen> yeah

11:33 <vcunat> (but there it's made worse by the fact that builds need more time)

11:33 <gchristensen> ben: it is very very hard to ask someone to perform possibly hundreds of hours of builds locally before pressing merge on a change that by all means should be perfectly fine

11:33 <ben> I'm gonna get my drive-by PR merged, it fixes my use case but breaks the rest o the world, then I'm gonna walk away and never look at the issue tracker again

11:33 <gchristensen> actually, probably thousands of hours of CPU time

11:34 <ben> I'm definitely used to smaller integration test cycles, yeah

11:34 <vcunat> When thinking of staging workflows, if we have *one* that works nice for both of them, it will be good, for simplicity.

11:34 <peti> openSUSE handles this issue as follows. They have many separate staging branches, and all of them are built and tested by their build server. Now, contributors submit patches and the "review team" distributes those patches into separate staging areas for testing. Only after the tests of some staging-X branch succeed, the changes get into "master".

11:35 <peti> Now, sometimes changes are submitted separately but they belong together logically. In that case, they get merged into the same staging branch so that they are tested (and merged) together.

11:35 <vcunat> peti: and that's run on openbuildservice?

11:36 <gchristensen> ben: ultimately we need to have tooling and process on the receiving side to handle it

11:36 <peti> On other occassions, staging branches are split in two, because a submit request contains multiple modifications where only some of them cause trouble but others don't -- so you test them separately to speed things up.

11:36 <ben> The rust project tags PRs as "reviewed, could be merged" and automation runs tests on the merge commit and pushes to master once it passes. The author does not have to expend their own CPU time but they'll have to keep working on the PR until the automation finds a merge commit against the ever-moving master that is acceptable. afaik there is a manual process to batch changes that are unlikely to cause an

11:36 <ben> issue into a single run of the integration tests, but PRs still remain unmerged until the merge commit is confirmed to work.

11:37 <peti> vcunat: Yes. But we can do the exact same thing with Hydra. It's more about a logical structure to the branches and the build service.

11:37 <gchristensen> peti: this sounds like a good avenue to explore

11:37 <vcunat> we would need some (half-)automatic way of creating Hydra jobsets, e.g. based on branch naming

11:37 <ben> The rust thing sounds vaguely isomorphic to manually maintained staging branches...

11:37 <vcunat> Though we could start by cloning some "prototype" jobset.

11:38 <gchristensen> peti: are all changes staged like this?

11:38 <LnL> vcunat: automating hydra jobsets isn't a problem

11:38 <LnL> it just doesn't scale for every pr

11:38 <peti> vcunat: We could have build jobs calles "staging-01", "staging-02", and so on that build coresponding branches. The rest is just a matter of appropriate cherry picking and merging in git.

11:38 <vcunat> peti: I assume suse use a similar process even for the enterprise versions?

11:38 <gchristensen> peti: and policies around *who* can do *what* and *when

11:39 <vcunat> branches named by people might also work

11:40 <vcunat> (the person responsible for that branch)

11:40 <ben> Without backpressure against merges to master based on the brokenness of master I don't see how you can hope to keep up with the rate of changes without burning out the people who feel obliged to actually return things to a working state

11:40 <gchristensen> how is this different from putting every PR in to a jobset?

11:40 <gchristensen> ben +1 totally agreed

11:40 <ben> (and subsequent contributors branching off a likely-broken master rather than a known-good state is probably also not ideal)

11:41 <peti> vcunat, gchristensen: In the commercial distribution, every change is verified *manually* by the QA team. This puts a tight upper limit on how many things can change in any given week. :-) There is a strict procedure, though, on what kind of changes can go in and which ones cannot. Generally speaking, it's a different setting because the code base is not supposed to change a lot. That's more comparable to

11:41 <peti> our release branches than to "master".

11:41 <gchristensen> ofborg has helped a lot with that, I think, and has exposed more issues :P

11:41 <ben> sorry for the drive-by opinions, I'll shush now :)

11:41 <gchristensen> ben: yeah. its good stuff, you're not wrong

11:42 <gchristensen> like, our current process did pretty well when we were merging ~700prs/mo

11:42 <gchristensen> now that we're consistently over 1k it is no longer serving us well

11:42 <peti> gchristensen: Yes, every single change is staged like that in openSUSE.

11:42 <vcunat> Right, with stable branches the situation is easier. I think our workflow for those is OK for now.

11:43 * peti thinks that merging semi-automatic updates only after some kind of testing would already go a long way to improve matters.

11:43 <gchristensen> I agree

11:44 <gchristensen> something Ryan may even be able to add to his existing testing

11:45 <vcunat> Well we might better move his testing ideas to us and apply them to all PRs (and perhaps all changes).

11:45 <gchristensen> of coursse

11:45 <vcunat> Ryan could be just a non-testing bot "filing PRs".

11:45 <gchristensen> but we can start with one place and then do both

11:46 <vcunat> +1

11:46 <gchristensen> a much cheaper experiment to say to Ryan, make sure X test passes before you open a PR... if the bleeding slows down, we move that to ofborg or policy or whatever

11:46 <peti> Also, our life would be *much easier* if Hydra would have more resources available. Right now, the feedback loop is too slow. When I commit a fix for issue A to master, then it takes easily 2-3 days until Hydry knows whether that fix actually worked. By then, chances are that issue B has popped up meanwhile. This would not happen if we'd had a cycle time of, say 4 hours.

11:47 <gchristensen> yes

11:48 * peti is sure that funding could be secured to improve the build farm.

11:49 <gchristensen> peti: maybe we should chat in private? :)

11:50 <vcunat> I added two boxes to Hydra this week :-)

11:50 <peti> I sent you messages on Wire anyway because of that other issue. Let's take it there.

11:50 <peti> gchristensen: ^^^

11:50 <gchristensen> oh cool

11:50 <gchristensen> I closed Wire. *goes to open it again*

11:51 <vcunat> But overall power on Hydra certainly could improve things significantly.

11:51 <ikwildrpepper> if you think we are in need of extra build machines, we can also add some hetzner machines (using foundation money)

11:51 <vcunat> And coverting some VM tests to containers instead.

11:51 * peti recently stumbled across https://cachix.org/. This is also a nice idea to reduce compile load. It's a shame that it does not quite address the issue of trust. That needs to be figured out on top.

11:51 <ikwildrpepper> (or increase the number of spot instances)

11:52 <vcunat> ikwildrpepper: either is relatively expensive over long term, isn't it?

11:52 <gchristensen> I'd rather find funding instead of spending foundation money

11:53 <ikwildrpepper> gchristensen: well, that's basically what the money of the foundation is meant for

11:53 <gchristensen> right

11:53 <ikwildrpepper> and we have at least a significant buffer atm

11:55 * peti thinks it would be great of we could run the "nixos-unstable" tests more quickly. This is our central means of quality assurance, but in the last couple of days feedback there has been slow'ish. I suppose converting (some) tests from VM to containers could go a long way to mitigate that issue, too.

11:56 <gchristensen> +1

11:56 jtojnar has joined #nixos-dev

12:01 <gchristensen> niksnut: what is the load like on the hydra master?

12:01 <niksnut> 14:01:27 up 2 days, 2:17, 2 users, load average: 4.49, 3.68, 3.60

12:02 <gchristensen> what does top report for `wa` / io wait?

12:03 <niksnut> 0.0

12:03 <gchristensen> are those surprise eyes or no wait?

12:03 <gchristensen> ^.^

12:03 <ikwildrpepper> don't niksnut does emojis

12:03 <ikwildrpepper> +think

12:03 <gchristensen> I know :D

12:04 <niksnut> I do twitch emotes though :p

12:05 <niksnut> but yeah hydra doesn't scale

12:06 <niksnut> maybe we can replace it by AwsStore

12:06 {^_^} has joined #nixos-dev

12:06 <niksnut> so you would do nix-build --store aws://... release.nix

12:06 <gchristensen> O.o

12:06 <niksnut> which would fire off some AWS Batch job to build the missing derivations

12:07 <gchristensen> my god

12:07 <niksnut> so Amazon would spin up / shut down the necessary build machines

12:08 <ikwildrpepper> niksnut: yeah, but you'd need some central process to prevent building derivations from building multiple times, right?

12:08 <ikwildrpepper> any idea how that could work nicely?

12:08 <niksnut> some locking via dynamodb probably

12:08 <ikwildrpepper> also, it'll be hard to make it work with requiredSystemFeatures

12:08 <gchristensen> what exactly about hydra doesn't scale?

12:09 <niksnut> also, optimistic concurrency, since we don't really care if two machines build the same derivation occasionally

12:09 <gchristensen> b/c I've ideas that don't involve committing to AWS like that

12:09 <niksnut> also, we'd create 1 aws batch job per derivation

12:09 <ikwildrpepper> niksnut: but in case of stdenv change, wouldn't that likely cause gcc etc to be built N times (where N is the maximum concurrency we use)

12:09 <niksnut> yeah, it would be limited to x86-linux

12:10 <niksnut> ikwildrpepper: no, each job would correspond to one derivation

12:10 <ikwildrpepper> one job per derivation would work

12:10 <ikwildrpepper> but you would only need to post job once dependencies have built

12:10 <ikwildrpepper> so you still need a separate process to coordinate ?

12:11 <niksnut> aws batch jobs can have dependencies

12:11 <niksnut> unfortunately, last time I looked at it, you could only have 20 dependencies per job

12:11 <ikwildrpepper> niksnut: not sure if aws can handle such big graphs :)

12:12 <niksnut> anyway, it doesn't have to be using aws batch

12:12 <niksnut> you could also have a process that pulls jobs from dynamodb or something

12:12 <gchristensen> can we go back to "<gchristensen> what exactly about hydra doesn't scale?" ? :)

12:12 <niksnut> that could also support non-aws machine

12:12 <niksnut> +s

12:13 <niksnut> gchristensen: it has a single central server

12:13 <gchristensen> agreed. now, let's not throw the baby out with the bathwater here. hydra does a lot of stuff that isn't just replaced by a new fancy backend store

12:13 ciil has quit [Quit: Lost terminal]

12:15 ciil has joined #nixos-dev

12:16 <gchristensen> the interface, while not especially humane as it is now, is really important. all the accoutrement that makes it a usable service -- restarting jobs, viewing logs, etc.

12:20 <gchristensen> I think a major falling over in efforts to replace hydra is thinking too grandly about initial plans, and not thinking more incrementally

12:24 <gchristensen> "Come to my talk, on 28th of June in London, nix-build -j 296: ofborg, and what even is Hydra?" https://www.meetup.com/NixOS-London/events/251792988/

12:24 * gchristensen says, to the author of Hydra

12:27 taktoa has quit [Remote host closed the connection]

13:06 <vcunat> For starters I expect it might help to upload from builders to S3 directly, instead of through the central machine. That's slightly harder on permissions/security, though.

13:08 <vcunat> (I'm partially guessing at the major bottleneck. Eelco will probably know more/better.)

13:09 <vcunat> I would hope that with that change we could scale to a multiple.

13:09 <gchristensen> that would require exposing the signing key to the builders, which is understandably a thing to consider given that would mean you and I would have to have it. however, practically speaking, by virtue of being able to build things and control the builder, it doesn't mean a whole lot.

13:10 <vcunat> The builder controls what gets signed.

13:10 <gchristensen> right

13:10 <vcunat> Each builder (or location) could have its own key, too.

13:11 <vcunat> (theoretically scalable to a PKI-like scheme)

13:11 <gchristensen> I have experimentally used rabbitmq as a bus to distribute one-message-per-drv to builders, which each built and then uploaded to the cache

13:13 <gchristensen> that experiment used a central process to control when jobs were sent to rabbitmq based on when its dependencies were satisfied, but this is not a fundamental requirement: builders could have sent a message to a "completed" queue, and all builders could watch that queue and keep track of which dependencies it is still waiting for

13:21 <vcunat> In general I would prefer to avoid significant vendor lock-in (like Amazon, if practicable).

13:22 <gchristensen> me too

13:24 <vcunat> And hosted/cloud builders probably won't be money-efficient over the long term. I got the 4-cores for roughly half a year's price of hetzner's "cloud" 4-cores...

13:25 <vcunat> (on the other hand a comparably performant CDN will be harder to build on our own)

13:26 <gchristensen> we'll want to be able to easily work in hardware provided by generous companies, too

13:33 <vcunat> Yes. I actually don't know if trusted compuation is feasible unless you "trust the physical location", but it's similar to trusting hetzner.

13:34 <gchristensen> well yes you can't trust the computer if you can't trust the location

13:34 <gchristensen> it is why DCs have policies and certifications asserting they have and follow policies

13:35 <gchristensen> I wouldn't trust WidgetCorp to provide hw, but I would trust DatacenterAndServerManagementCorp since it is their business

13:46 <vcunat> I can imagine a company physically donating an older server that they replace by a new one. My employer (a non-profit) got some HW this way, I think.

13:56 <ikwildrpepper> one problem with buying hardware is maintaining it, and it needs a datacenter

13:56 <gchristensen> and >1 person who can drive to it and replace the hard drive or whahtever

13:56 <ikwildrpepper> yeah

13:57 <ikwildrpepper> basically, hardware is painful

13:58 <gchristensen> I hardly like going down to my basement to fix my hw :P

13:58 <ikwildrpepper> hehe

14:10 <vcunat> yes, that is a problem

14:11 <vcunat> but for builders we don't really need >99% uptime, etc.

14:11 <vcunat> quantity/price seems to matter more than the usual datacenter tradeoff

14:12 <copumpkin> congrats niksnut :)

14:33 <Profpatsch> niksnut: Congratulations! Meanwhile, /me fails at the programming challenge.

14:33 ciil has quit [Quit: Lost terminal]

14:34 ciil has joined #nixos-dev

14:34 <gchristensen> Profpatsch: :( :(

14:35 <Profpatsch> gchristensen: Let’s see, maybe you can bribe Mathieu. :P

14:36 <Profpatsch> tbh I’m implementing Dijkstra with mutable vectors and am not sure whether that was intended.

14:38 <gchristensen> not sure we should talk about it publicly tbh

14:38 Sonarpulse has quit [Ping timeout: 245 seconds]

14:38 <Profpatsch> hehe, I didn’t want to go into it, yes.

14:39 <__Sander__> hehe

14:39 <Profpatsch> But there is a light at the end of the tunnel (and it’s not a train)

14:39 <__Sander__> insight information in the hiring process :D

14:39 <__Sander__> anyway

14:39 * __Sander__ has a new job since two months

14:41 <Profpatsch> Oh, nice, where do you works?

14:41 <Profpatsch> -s

14:42 <__Sander__> http://mendix.com

14:52 * ikwildrpepper also has a new job, since october :|

14:52 <ikwildrpepper> sorry, trolling again

14:52 <__Sander__> hehe

15:15 Synthetica has quit [Quit: Connection closed for inactivity]

15:26 <niksnut> peti: I got an approval request for the docker registry to access our github repo, what is that for?

15:32 <vcunat> I'd expect it to say what permissions they require.

15:32 <vcunat> I hope I'm not missing anything important, around those congratulations. (no idea what they're about)

15:32 <gchristensen> vcunat: https://twitter.com/n1ksnut/status/1009431406282530816

15:33 <ikwildrpepper> gchristensen: I thought the congratulations were because niksnut is finally active on twittag?

15:33 <ikwildrpepper> -g+h

15:34 <vcunat> Ah, thanks :-)

15:34 <vcunat> A single post probably doesn't yet count as "active".

15:35 <peti> niksnut: hub.docker.io can automatically re-build this nixos/nix docker image every time the repo changes on github. To do that, docker.io needs read access to the repo, though.

15:35 <gchristensen> peti: afai remember GitHub will not create a read-only access integration, but a minimum of a read-write access

15:37 <niksnut> hm, wouldn't it be better to update the image as part of the release script?

15:39 <vcunat> It seems better to rebuild the image on every commit, so problems are located sooner.

15:39 <peti> niksnut: That would work, too, no doubt. I don't think that it would be better though. At least I don't see any concrete advantages.

15:40 <vcunat> (that's from CI point of view - not for actual publicly used images)

15:41 <niksnut> peti: presumably we want the image to provide the latest released version, not master

15:43 <vcunat> I'd think that doing both is best.

15:43 <niksnut> I don't really see a reason for providing master

15:44 <peti> niksnut: The latest release version has a tag that people can use if they want to. The latest build from master is tagged "latest".

15:44 <gchristensen> we could _build_ the image in Hydra automatically

15:44 <gchristensen> as part of regular builds

15:45 <niksnut> we're not really eating our own dogfood if we're not building the image in Nix

15:45 <LnL> my image doesn't depend on the docker daemon except for the last part that initialises the db

15:46 <LnL> and that could use runAsRoot from dockerTools

15:49 Sonarpulse has joined #nixos-dev

15:49 <peti> niksnut: We are re-inventing the wheel for no good reason if we don't use that service. hub.docker.io exists precisely so that you have a central registry for images. What is the point of building the image in some otherway? The end result is going to be exactly the same.

15:50 <gchristensen> because we're a build system ^.^

15:50 <peti> The docker image that comes out of calling "docker build ." is no shinier just because Nix executed that command.

15:51 <gchristensen> Nix doesn't execute it

15:52 <gchristensen> Nix makes the tarballs `docker build .` would have made, and it is a major feature that a lot of users like a lot

15:52 orivej has joined #nixos-dev

15:52 __Sander__ has quit [Quit: Konversation terminated!]

15:53 <peti> gchristensen: "docker build ." doesn't make tarballs.

15:53 <gchristensen> it does, you just don't see it :) each layer is a tarball and inside is a metadata file, and a filesystem tarball

15:54 <LnL> we're talking about the docker image layers here, not the nix tarball

15:55 <LnL> those can be loaded into the docker daemon with docker load

15:57 * peti thinks that *not* re-using this existing service is just plain stupid. We can set up an automated build on hub.docker.io in 30 seconds and provide reliable, reusable docker images with Nix inside to everyone.

15:57 orivej has quit [Ping timeout: 265 seconds]

15:58 <simpson> peti: FWIW I think it's remarkably stupid that docker.io can't *read* a *public* repo without bothering people.

15:58 <peti> simpson: I suppose they don't want to busy poll.

15:58 <gchristensen> the issue is the commit hooks to be notified of builds

15:58 <gchristensen> which is dumb, because they could just provide the URL to post to

15:59 <peti> gchristensen: They do. You can also configure a manual web-hook POST trigger, if you want.

15:59 <gchristensen> nice!

15:59 <gchristensen> that seems much better

15:59 <vcunat> I certainly wouldn't be afraid of the github integration for docker-hub

16:09 matthewbauer has joined #nixos-dev

16:12 matthewbauer has quit [Write error: Connection reset by peer]

16:14 matthewbauer has joined #nixos-dev

16:25 matthewbauer has quit [Ping timeout: 240 seconds]

16:27 <peti> niksnut: What do you gain by stopping me from setting up this service? It's not like having A prevents you somehow from doing B anyway. If you want a Nix-built docker image too, then just have it! Where is the downside to using the automation now that already exists?

16:28 matthewbauer has joined #nixos-dev

16:30 <gchristensen> with the "Integration" style, does it require write access to repositories? I believe it does

16:30 <gchristensen> on that scenario, I'm -1

16:31 matthewbauer has quit [Read error: Connection reset by peer]

16:32 matthewbauer has joined #nixos-dev

16:33 <vcunat> Such issues might be worked around by creating a mirror repo, but that's probably cumbersome.

16:37 matthewbauer has quit [Ping timeout: 256 seconds]

16:38 <domenkozar> afaik it only needs read access

16:38 <domenkozar> which seems fine to me

16:39 matthewbauer has joined #nixos-dev

16:40 <gchristensen> it depensd on how you connected your github account to docker hub, and the nixos org isn't able to ensure you selected one or the other

16:41 <gchristensen> https://screenshotscdn.firefoxusercontent.com/images/4c1185c9-d08f-4eed-9d16-2f8293f093e4.png

16:41 <gchristensen> https://screenshotscdn.firefoxusercontent.com/images/f0ac4ab3-e04c-4204-a313-3e04b08d4f42.png vs https://screenshotscdn.firefoxusercontent.com/images/3f273f95-86c3-4654-bd12-17c8691177de.png

16:41 <gchristensen> and actually, I guess since it is asking Eelco for permission, it implies the more liberal permission grant

16:42 <gchristensen> indeed, yes, that is what is happening

16:47 matthewbauer has quit [Ping timeout: 240 seconds]

16:48 matthewbauer has joined #nixos-dev

17:13 <domenkozar> yeah apps need to opt-in for github apps now

17:13 matthewbauer has quit [Read error: Connection reset by peer]

17:13 <domenkozar> that takes 30% for the fact that apps can do fine grained permissions

17:14 matthewbauer has joined #nixos-dev

17:15 <domenkozar> niksnut: does hydra use abstract store for pushing new builds? I'm asking if the http binary cache in Nix 2.0 would work with Hydra

17:16 <niksnut> domenkozar: yes, should work

17:16 <domenkozar> :O

17:16 <domenkozar> that means hydra could support cachix soon

17:17 <domenkozar> m'kay

17:18 <domenkozar> niksnut: thanks!

17:23 <vcunat> I guess you don't mean Hydra.nixos.org

17:24 <domenkozar> hydra as software

17:24 <vcunat> (cache.nixos.org seems enough for that)

17:24 <vcunat> +1

17:25 matthewbauer has quit [Ping timeout: 268 seconds]

17:48 MichaelRaskin has joined #nixos-dev

17:57 matthewbauer has joined #nixos-dev

18:06 <LnL> domenkozar: I've been wondering it would be useful for ofborg, enabling sharing builds across machines

18:07 <domenkozar> if we got cache.nixos.org and ofborg on cachix, they'd reuse nar uploads :-)

18:09 matthewbauer has quit [Read error: Connection reset by peer]

18:10 matthewbauer has joined #nixos-dev

18:19 <vcunat> good for ofborg

18:20 <vcunat> at least in future when there are really multiple slaves per platform

18:21 <domenkozar> yeah :)

18:22 <LnL> vcunat: hmm?

18:34 <vcunat> LnL: right now it reports 32 slaves for aarch64 (single machine AFAIK), 2 slaves for darwin and one for x86

18:35 <LnL> aarch is the only platform with a single physical builder

18:35 <LnL> even the evaluator runs on multiple machines now IIRC

18:35 <vcunat> this isn't really relevant to evaluators

18:36 <vcunat> (caching of builds)

18:36 <LnL> sure, my point is that the other platforms are distributed

18:37 <vcunat> not much ATM apparently

18:37 <vcunat> if I read https://monitoring.nix.ci/dashboard/db/ofborg?refresh=10s&orgId=1 right

18:37 <LnL> it's usually 3 linux 2 darwin

18:39 <LnL> but yes, a cache only becomes important when we start building more stuff

18:40 matthewbauer has quit [Read error: Connection reset by peer]

18:41 <MichaelRaskin> 3 Linux meaning 2 physical boxes, though. Even 4 Linux might be just 3+1

18:41 <LnL> oh?

18:42 <vcunat> one aarch64 machine is counted as 32 ATM

18:42 matthewbauer has joined #nixos-dev

18:42 <vcunat> (as an example)

18:42 <MichaelRaskin> Yes. When my builder is up (not right now), I usually run 2 or 3 builders.

18:42 <LnL> why? it doesn't start builds with -j1

18:43 <vcunat> the point is to run multiple separate PRs

18:43 <vcunat> (i.e. separate nix invocations)

18:43 <MichaelRaskin> Yes, so that a single slow one doesn't block a ton of small ones

18:43 <MichaelRaskin> Also, tell me more about configure -j8

18:44 <LnL> sure for the aarch machine it makes sense, but unless you have a crazy linux box it won't really do much

18:45 <MichaelRaskin> Parallel configure does make sense with 16GiB of RAM and most requests being small

18:45 <LnL> I'm talking about --option max-jobs not cores here

18:46 <MichaelRaskin> ofborg doesn't do parallel requests

18:47 <MichaelRaskin> A lot of requests build just one path, or a linear sequence of paths

18:50 <LnL> not so sure about that, but maybe

18:52 <LnL> anyway, I kind of expected there to be a bit more builders by now

18:54 <vcunat> they don't seem to be overloaded, at least ATM

18:54 <vcunat> (so now I was adding slaves to hydra.nixos.org instead)

18:56 <MichaelRaskin> I guess we have an equilibrium: even a single amd64 box from vcunat would keep up with the builds at ~80% duty cycle (as it does on Thursdays), and 3 builders on two machines eat everything quickly; but doing expensive tests like LibreOffice all the time is not a good idea anyway.

18:56 matthewbauer has quit [Read error: Connection reset by peer]

18:56 <MichaelRaskin> So there is no immediate payoff from adding more builders, and it doesn't seem a good idea to add more builds, and there we go

18:56 <LnL> why not? if there was more capacity

18:57 matthewbauer has joined #nixos-dev

18:57 <MichaelRaskin> You need to change both at once. Coordination. Coordination in a Nix* project.

18:57 <MichaelRaskin> Doable, might be nice, but requires a planned and tracked effort.

18:59 <LnL> if we get more people to contribute their idle desktop we can get more capacity first, then make some changes to start using it

18:59 <LnL> instead of the current stalemate

19:01 <MichaelRaskin> Well, using idle desktop resources without impacting other uses of the same computer is more complicated

19:06 <MichaelRaskin> I guess we can assume that ofborg jobs have approximately the same trust level in the sense of non-maliciousness as Nixpkgs master commits, otherwise there is this question of fixed-output risks

19:06 <vcunat> ofborg assumes almost no trust

19:07 <vcunat> It's strictly separated from what goes to cache.nixos.org.

19:07 <vcunat> (in the relevant direction)

19:07 <domenkozar> it assumes git trust

19:07 <domenkozar> afaik github doesn't provide history for force-pushes

19:08 <vcunat> you can see a short history of force-pushes

19:08 <LnL> that's not entirely true, only maintainers and a trusted set of users can trigger it

19:08 <domenkozar> well yes, it trusts maintainers :)

19:08 matthewbauer has quit [Read error: Connection reset by peer]

19:09 <vcunat> I meant: we don't need to trust the builders.

19:09 <MichaelRaskin> Well, extra-known-users too

19:09 <MichaelRaskin> That is true

19:09 <LnL> ah yeah

19:09 <vcunat> The builders still need to trust us (and git) a bit.

19:09 matthewbauer has joined #nixos-dev

19:10 <LnL> as for force pushes, I think that's handled

19:15 matthewbauer has quit [Read error: Connection reset by peer]

19:16 matthewbauer has joined #nixos-dev

19:19 matthewbauer has quit [Read error: Connection reset by peer]

19:19 matthewbauer has joined #nixos-dev

19:24 matthewbauer has quit [Read error: Connection reset by peer]

19:26 matthewbauer has joined #nixos-dev

19:29 matthewbauer has quit [Read error: Connection reset by peer]

19:37 matthewbauer has joined #nixos-dev

19:38 obadz- has joined #nixos-dev

19:40 obadz has quit [Ping timeout: 256 seconds]

19:40 obadz- is now known as obadz

19:48 vcunat has quit [Quit: Leaving.]

19:59 matthewbauer has quit [Read error: Connection reset by peer]

19:59 matthewbauer has joined #nixos-dev

20:08 matthewbauer has quit [Read error: Connection reset by peer]

20:13 matthewbauer has joined #nixos-dev

20:37 matthewbauer has quit [Ping timeout: 256 seconds]

20:38 matthewbauer has joined #nixos-dev

20:46 matthewbauer has quit [Ping timeout: 256 seconds]

20:56 orivej has joined #nixos-dev

21:22 matthewbauer has joined #nixos-dev

21:24 matthewbauer has quit [Remote host closed the connection]

21:25 matthewbauer has joined #nixos-dev

21:30 matthewbauer has quit [Ping timeout: 260 seconds]

21:32 matthewbauer has joined #nixos-dev

21:33 orivej has quit [Ping timeout: 245 seconds]

21:37 <gchristensen> ofborg builds the approved commit, not just the current version of the PR

21:38 <gchristensen> MichaelRaskin: I think your view of the project's equilibrium is a bit too pessimistic

21:38 matthewbauer has quit [Ping timeout: 240 seconds]

21:39 <gchristensen> I haven't been working on expanding its build capacity and use because of my medical issues, not apathy

21:41 <gchristensen> afaik we could probably start pushing builds from nixos-org-maintained builders to the nixos cache, but I am not excited to make a cache from other builders

21:41 <MichaelRaskin> gchristensen: I didn't mean that the equilibrium is about apathy by any specific person

21:41 <gchristensen> then you will be glad to know there is indeed a planned and tracked effort

21:42 <MichaelRaskin> I meant that it is an equilibrium w.r.t. random people (not) actively asking you for permission to run a builder

21:42 <gchristensen> right, got it

21:43 <MichaelRaskin> I didn't really mean that planned effort definitely doesn't exist, it is just a level of effort that participates in prioritisation and can lose the competition for priority for some time.

21:43 <gchristensen> that is of course true

21:43 <Sonarpulse> gchristensen: know if your talk will be recorded>

21:43 <Sonarpulse> *?

21:43 <Sonarpulse> i asked before but then lost irc

21:43 <Sonarpulse> sorry

21:43 <gchristensen> no idea :) zimbatm?

21:49 matthewbauer has joined #nixos-dev

21:59 ciil has quit [Ping timeout: 256 seconds]

22:06 ciil has joined #nixos-dev

22:19 matthewbauer has quit [Ping timeout: 264 seconds]

22:37 <zimbatm> Sonarpulse: not this time, our regular venue wasn't available

22:37 <Sonarpulse> zimbatm: ah man!

22:38 <Sonarpulse> oh well

22:38 <Sonarpulse> fingers cross for a bootleg :D

22:38 ciil has quit [Quit: Lost terminal]

22:38 <zimbatm> No choice, you've got to come to London :)

22:39 <gchristensen> +1

22:40 ciil has joined #nixos-dev

22:47 orivej has joined #nixos-dev

23:13 <jtojnar> Sonarpulse: 👍 on the meson patch, Jussi will be convinced

23:13 <Sonarpulse> jtojnar: as in you are optimistic?

23:14 <Sonarpulse> jtojnar: i was wondering whether you had any opinions on how to approach this / do the convincing

23:15 <jtojnar> Sonarpulse: not sure, meson development seems to be very vision-oriented, but nirbheek also supports it

23:16 <Sonarpulse> yeah I don't know the community at all

23:16 <Sonarpulse> but I was looking for some cmake-like thing to spread the cross Gospel, haha, and I had heard very good things about meson

23:17 <jtojnar> Sonarpulse: personally, I am not very familiar with the cross-compilation requirements so I probably will not be able to contribute to the advokacy

23:18 <Sonarpulse> jtojnar: well the issue at that specific thing is less cross, then trying to do as much "eval" time as possible

23:18 <Sonarpulse> more purity

23:18 <Sonarpulse> the follow up stuff for cross is just collapsing duplicate code paths internally

23:18 <Sonarpulse> and hopefully will be less controversial

23:22 <jtojnar> generally, it seems to me like Jussi knows what he is doing, though meson is still pretty young and some rough edges are showing around the less standard workflows (my beef is mostly with splitting packages)

23:30 <Sonarpulse> jtojnar: yeah in this case I feel like forcing faith in autodetection on the end user is a bit much

23:30 <Sonarpulse> also in nixpkgs we have everything in localSystem and crossSystem

23:30 <Sonarpulse> so we just want to force that

23:30 <Sonarpulse> if reality doesn't match the spec, reality is wrong, not the spec

23:31 <Sonarpulse> it's sort of hard to convey this without just being like "meson's great but I don't want to trust it"