#nixos-dev on 2018-12-01

2018-08-16 20:49 gchristensen changed the topic of #nixos-dev to: NixOS Development (#nixos for questions) | https://hydra.nixos.org/jobset/nixos/trunk-combined https://channels.nix.gsc.io/graph.html | 18.09 release managers: vcunat and samueldr | https://logs.nix.samueldr.com/nixos-dev

00:15 <thoughtpolice> Open question: how much memory is 'too much' for a NixOS test? My exact number here is ~3GB at the moment and I don't plan on going much beyond that, but it'd be nice to know

00:21 <worldofpeace> thoughtpolice: The highest i'm seeing is `4096mb` and that's the gitlab test I think

00:24 <samueldr> oh, since there are brilliant minds here not on #nixos-aarch64, I have this PR #51207 which reduces the output size of the sd-image, which would allow hydra builds to succeed, allowing up-to-date images to be distributed

00:24 <{^_^}> https://github.com/NixOS/nixpkgs/pull/51207 (by samueldr, 1 day ago, open): sd-image: Slims the ext4 filesystem even more.

00:26 <worldofpeace> You know, just a minor change, nothing extreamly helpful :P

00:26 <samueldr> the way I see it: a huge change in the generation of the installer image; which could have unintended bad consequences!

00:27 <samueldr> my initial naïve approach was to leave zero blocks free

00:27 <samueldr> which was causing some issues at boot :)

00:57 <hedning> Could someone abort the jobs in https://hydra.nixos.org/jobset/nixpkgs/gnome we've merged to staging so it's probably just a waste to finish these builds in case more mass rebuilds is coming to staging, thanks :)

00:58 <gchristensen> sure, hedning

01:00 jtojnar has joined #nixos-dev

01:02 <gchristensen> ok all canceled hedning

01:02 <hedning> cheers :)

01:15 <gchristensen> https://discourse.nixos.org/t/ofborg-now-uses-checks/1558

01:26 <thoughtpolice> gchristensen: Are ofborg 'test' commands still only run under x86_64-linux, no aarch64-linux?

01:27 <gchristensen> I think they run on aarch64

01:27 <gchristensen> let's find out

01:27 <thoughtpolice> gchristensen: Also, the new Checks screen/tab is super fancy, too!

01:28 <gchristensen> =)

01:30 <gchristensen> seems I need to be less cute with the name of the check

01:55 pie___ has joined #nixos-dev

01:56 pie__ has quit [Remote host closed the connection]

02:06 <thoughtpolice> gchristensen: Looks like it is using aarch64-linux. Fingers crossed.

02:13 srk has quit [Ping timeout: 246 seconds]

02:22 coconnor has joined #nixos-dev

02:29 srk has joined #nixos-dev

03:27 jtojnar has quit [Quit: jtojnar]

03:29 srk has quit [Ping timeout: 250 seconds]

03:32 srk has joined #nixos-dev

03:40 pie__ has joined #nixos-dev

03:44 pie___ has quit [Ping timeout: 268 seconds]

04:03 <gchristensen> hrm, nix1 doesn't compile in nixpkgs master anymore

04:14 <thoughtpolice> gchristensen: Hmmm. Apparently https://github.com/NixOS/nixpkgs/pull/51306 is a bit tricky! It seems it fails on rbvermaa-spot (build 92b49fd1-b310-4573-92b4-7c55d43f5cdf), but is ok with builder-0-gustav.ewr1.nix.ci (build c8e75306-41da-461d-a9cf-2f6afcb7ab58)

04:14 <{^_^}> #51306 (by thoughtpolice, 2 hours ago, open): nixos/cockroachdb: create new service

04:15 <thoughtpolice> gchristensen: Also, ofborg's "test" command seemed to do something funky, so I went for nixosTests instead right now :) I think 'test' no longer invokes the build on nixos/release.nix?

04:15 <gchristensen> a PR to nixpkgs broke the test command, and hasn't been fixed :(

04:15 <thoughtpolice> :(

04:16 <gchristensen> there is a fix PR in to nixpkgs, but it is maybe not the right fix, and the fix might should be for ofborg... not super clear

04:17 <thoughtpolice> gchristensen: Anyway, the pr for cockroachdb up there uses a trick I haven't seen before. It uses the 'ptp_kvm' module in a nixos-test-vm guest to punch through the system time of the host into the VM as a hardware clock. You can then configure an NTP server to use this hardware clock as its time source, to provide NTP time inside NixOS tests

04:17 <gchristensen> whoa

04:17 <thoughtpolice> Does ofborg have anything special about the configs it deploys? rbvermaa-spot seems to fail to load the ptp_kvm device which seems like it might be some kind of host interaction with QEMU

04:17 <gchristensen> https://travis-ci.org/NixOS/ofborg/jobs/462074638#L735 :( cargo test is segfaulting

04:18 <gchristensen> rbveerma is a ec2 instance

04:18 <thoughtpolice> (I saw cargo segfault for me too, recently. Couldn't figure that one out)

04:19 <gchristensen> builder-0-gustav is bare metal hw from Packet

04:20 <thoughtpolice> I suppose I might have to dig into the driver to see what 'Failed to insert 'ptp_kvm': No such device' means. "No such device"

04:20 <gchristensen> maybe ec2 doesn't allow ptp kvm

04:20 <thoughtpolice> Possibly, I guess it depends on how qemu sources the time.

04:22 <gchristensen> https://twitter.com/grhmc/status/1068719241996644352

04:28 <gchristensen> ok plenty of fights for the morning. good night

05:18 Synthetica has quit [Quit: Connection closed for inactivity]

05:34 orivej has joined #nixos-dev

05:46 worldofpeace has quit [Ping timeout: 246 seconds]

05:56 <ekleog> gchristensen globin: does my comment at https://github.com/NixOS/nixpkgs/pull/50342#discussion_r236104957 make sense? I'd like to try and get this in a ready-for-production state :)

06:04 sir_guy_carleton has quit [Quit: WeeChat 2.2]

06:40 worldofpeace has joined #nixos-dev

07:09 phreedom has joined #nixos-dev

07:14 worldofpeace has quit [Remote host closed the connection]

07:51 FRidh has joined #nixos-dev

08:02 Cale has quit [Ping timeout: 250 seconds]

08:15 Cale has joined #nixos-dev

09:11 orivej has quit [Ping timeout: 268 seconds]

10:09 jtojnar has joined #nixos-dev

10:16 <tilpner> samueldr - Did you compare your custom logic against resize2fs -M?

10:28 <tilpner> Oh, I was only looking at the diff, nevermind

10:55 Taneb has joined #nixos-dev

10:58 orivej has joined #nixos-dev

11:21 orivej has quit [Ping timeout: 250 seconds]

11:49 <LnL> urgh, /. + builtins.unsafeDiscardStringContext (builtins.toFile "foo" "bar")

11:50 <LnL> I want a builtins.notFakeToPath

11:51 <infinisil> LnL: I think there was a PR for a proper toPath

11:52 jtojnar has left #nixos-dev [#nixos-dev]

11:54 jtojnar has joined #nixos-dev

11:55 <LnL> no, it's deprecated AFAIK

11:56 <infinisil> I thought I saw such a PR in nixpkgs, but I guess not, can't find anything

11:57 <LnL> error: a string that refers to a store path cannot be appended to a path

11:57 <LnL> but... I'm not appending anything

12:04 <LnL> hrm and now I'm running into this again https://github.com/NixOS/nix/issues/1728

12:04 <{^_^}> nix#1728 (by LnL7, 50 weeks ago, closed): store path literals are copied to the store again

12:05 * LnL will just use strings

12:05 <infinisil> Good choice

14:32 jtojnar has quit [Quit: jtojnar]

14:35 jtojnar has joined #nixos-dev

14:40 <ekleog> anyone has an idea why hydra darwin appears to still not have scheduled a build triggered ~72hrs ago? that's getting pretty long, and I can't find any numbers for the number of jobs pending for darwin, to check whether hydra is not just stuck there

14:40 <gchristensen> https://hydra.nixos.org/queue-summary shows darwin jobs

14:41 <ekleog> nice, thanks :) will come back in a few hours to check if it has gone down then

14:41 <gchristensen> https://status.nixos.org/prometheus/graph?g0.range_input=2d&g0.expr=hydra_machine_last_failure%7Bhost%3D~%22.*mac.*%22%7D&g0.tab=0

14:43 <gchristensen> that graph shouldn't look so linear :P

14:44 <ekleog> hmmm ok :D I was wondering why mac9 appeared to be like always failing

14:44 <gchristensen> mac9 is dead

14:44 <ekleog> oh

14:44 <gchristensen> hrm, hydra has been building many jobs on macs

14:45 <ekleog> so it looks like the number of queued jobs has gone down by ~36 in the last ~5min

14:45 <ekleog> (after a refresh of queue-summary)

14:45 Jackneill has quit [Ping timeout: 245 seconds]

14:45 <ekleog> so I guess it's just way far back in the build backlog

14:45 Jackneill has joined #nixos-dev

14:46 <gchristensen> https://status.nixos.org/grafana/d/MJw9PcAiz/hydra-jobs?refresh=30s&orgId=1&from=now-24h&to=now indeed -- Jobsper Machine Type

14:47 <ekleog> mac5 appears to be failing regularly too on the prometheus, others are at least 20k sec since last failure, so abot 5hrs

14:47 <ekleog> so apart from mac5 / mac9, others appear relatively healthy

14:48 <gchristensen> good

14:48 <gchristensen> :)

14:48 <ekleog> oh nice for your last link :)

14:49 <ekleog> only sad thing: looks like we can filter by aarch64/x86_64 only, not by linux/darwin :°

14:50 <gchristensen> eh?

14:51 <ekleog> actually it looks like even the aarch64/x86_64 selector top left (“machine”) doesn't change anything in the display, so nvm

14:51 <gchristensen> ah, I meant to delete that

14:52 <ekleog> ah ^^

14:52 <ekleog> so mostly x86_64-darwin and a bit aarch64-linux are missing build power… well, I guess that was to be expected

14:53 <ekleog> (btw, if I can hijack that discussion, dunno whether you saw my hl, but if you can comment on https://github.com/NixOS/nixpkgs/pull/50342#discussion_r236104957 when you find a bit of time it'd be great to know how to move that forward 😇)

15:06 <samueldr> tilpner: yeah, and the issue with `-M` is known by the developers, documented in the manual, and annoying :) though I can see how much better it is to *not* actually leave zero blocks free :)

15:06 <samueldr> though there's still a bunch of space saving which can be done

15:06 <tilpner> samueldr - I read your PR description afterwards

15:07 <tilpner> Didn't see you had explained it already

15:07 <samueldr> :) just re-stating facts for irc-only dwellers

15:14 init_6 has quit []

15:16 <samueldr> AFAIUI, aborted jobs are never restarted unless asked explicitely in hydra, right?

15:18 <gchristensen> ye

15:18 <samueldr> we're having up to ~6000 aborted jobs in release-18.09-aarch64 that are annoyingly subbornly kept aborted

15:18 <gchristensen> can you restart jobs?

15:19 <samueldr> I have no special rights to anything

15:19 <gchristensen> do you have an accont?

15:19 <samueldr> yes

15:19 <samueldr> useful to make a dashboard!

15:19 <gchristensen> ok you can restart jobs

15:20 <samueldr> thanks

15:21 <samueldr> let's hope this fills up the -stable aarch64 release with good builds :)

15:21 <ekleog> what do you mean by “make a dashboard”?

15:21 <gchristensen> users can make dashboards https://hydra.nixos.org/dashboard/graham@grahamc.com

15:21 <samueldr> https://hydra.nixos.org/dashboard/samuel@dionne-riel.com

15:21 <ekleog> oh nice :)

15:21 <samueldr> there's a star left of the title https://hydra.nixos.org/job/nixos/release-18.03-aarch64/nixos.sd_image.aarch64-linux

15:22 <ekleog> samueldr: yours is private, though :°

15:22 * samueldr is still a hydra noob

15:22 <samueldr> it shouldn't starting now

15:22 <ekleog> can confirm

15:22 <ekleog> much red

15:23 <samueldr> a good chunk is timed out and aborts

15:23 <samueldr> and a good chunk is aarch64 :)

15:25 * ekleog should write that cron job that mails him when a watched build fails

15:25 <gchristensen> :/

15:28 <samueldr> if you are using something like stylus, to add custom CSS to pages, https://gist.github.com/samueldr/71845e0c8a1996ca7994629b17e6a9bb

15:28 <samueldr> https://stuff.samueldr.com/screenshots/2018/12/20181201102821.png

15:28 <samueldr> shows time-outs differently

15:29 <gchristensen> neat

15:29 <gchristensen> want to PR that?

15:30 <samueldr> this in itself is a hack; a better solution could be PR'd

15:30 <gchristensen> ah

15:30 <samueldr> I was thinking about other statuses too

15:45 orivej has joined #nixos-dev

15:45 <samueldr> hydra's configurePhase (when ran in the nix-shell as per appendix A) opened what seemingly was `less`

15:45 <gchristensen> hah

15:48 <samueldr> not only seemingly: https://gist.github.com/samueldr/c2e305d600efb1dd3c874975bb7b3414

15:59 sir_guy_carleton has joined #nixos-dev

17:33 orivej has quit [Ping timeout: 246 seconds]

18:06 sir_guy_carleton has quit [Quit: WeeChat 2.2]

18:12 <thoughtpolice> Okay, here's a sort of open question I was thinking about while implementing #51306 -- what are the semantics of the systemd target "time-sync.target"?

18:12 <{^_^}> https://github.com/NixOS/nixpkgs/pull/51306 (by thoughtpolice, 16 hours ago, open): nixos/cockroachdb: create new service

18:13 <thoughtpolice> The systemd.special(7) man page says: "Services responsible for synchronizing the system clock from a remote source (such as NTP client implementations) should pull in this target and order themselves before it. All services where correct time is essential should be ordered after this unit, but not pull it in..."

18:13 <thoughtpolice> But there's a catch: I don't think any of our NTP daemons block startup to do an immediate synchronization. Most of them will wait for a while before doing the first adjustment.

18:14 <thoughtpolice> But waiting for the first adjustment is important for many services, because if the measured delta is too big, you often want to skip, not slew, the system time. This often happens after boot.

18:15 <thoughtpolice> If the measured difference is like 200ms then a slew is fine: you can slowly adjust the system clock and step it forward little by little, or stretch it, to get it to align. But if you have like a 1.5s delta on boot, you don't want to slew for the next several hours. You're much better off just doing an immediate skip. But an immediate skip is also dangerous, unless every service that depends on synchronized time happens *after*

18:15 <thoughtpolice> the skip

18:17 <thoughtpolice> CockroachDB is an example of this. Especially with the test I wrote, I'd say 60% of the time, the clock is over 1s off on boot on the NixOS test virtual machines. This violates some internal consistency constraints and causes the cluster to fail booting/joining nodes.

18:17 <thoughtpolice> And if the clock skips immediately during runtime, things also will explode. So you must absolutely, always wait for the initial measurement/synchronization.

18:18 <thoughtpolice> Currently I do this by blocking the ExecStartPre on chrony, in my tests. You can force Chrony to do rapid initial measurements but I don't think doing that by default is correct or very "nice", so instead I just use a command to wait for the first full time synchronization.

18:19 <thoughtpolice> But IMO the systemd man page seems to indicate that maybe time-sync.target should work this way: it should not wait for chrony to *start*, but for it to actually do the first adjustment, after some measurements, which may take some time.

18:21 <thoughtpolice> Does anyone have opinions/thoughts on this -- whether it should block until adjustment? Or has anyone ever had to do something like this themselves?

18:24 <gchristensen> I think your interpretation is correct

18:28 <thoughtpolice> It also makes actually writing services that block on synchronized time non-modular -- the CockroachDB module has no idea what NTP daemon you chose, so now in the test I have to specifically inject knowledge of chrony into ExecStartPre

18:29 <thoughtpolice> by merging in my own systemd.services.cockroachdb.preStart, urgh.

18:29 <thoughtpolice> Ceph is apparently the only other target that uses time-sync.service. I imagine it's very important for Ceph, too.

18:30 <jtojnar> anyone want to take a look at maintainers/scripts/update.nix parallelization? https://github.com/NixOS/nixpkgs/pull/50977

18:30 <{^_^}> #50977 (by jtojnar, 1 week ago, open): update.nix: Run update scripts in parallel

18:33 Haskellfant has joined #nixos-dev

18:33 cocreature has quit [Ping timeout: 250 seconds]

18:33 Haskellfant is now known as cocreature

18:40 JosW has joined #nixos-dev

18:52 <thoughtpolice> Is there a way to do global assertions outside of any particular module? Like, I have 3 modules that are mutually exclusive, having them mkForce others off is weird and requires a lot of duplication.

18:52 <thoughtpolice> I'd rather just have one place that simply ensures only one of them is 'true'

19:01 JosW has quit [Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/]

19:19 orivej has joined #nixos-dev

19:21 ixxie has joined #nixos-dev

19:37 ixxie has quit [Quit: leaving]

19:49 <thoughtpolice> https://github.com/NixOS/nixpkgs/pull/51338 hot off the grill and untested

19:49 <{^_^}> #51338 (by thoughtpolice, 1 minute ago, open): WIP: nixos: make time-sync.target block until initial adjustment with all NTP daemons

20:12 orivej has quit [Ping timeout: 246 seconds]

20:34 JosW has joined #nixos-dev

21:14 FRidh has quit [Quit: Konversation terminated!]

21:32 orivej has joined #nixos-dev

21:36 JosW has quit [Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/]

22:03 <samueldr> I'll re-ask here: any pro-tips for hacking on hydra?

22:03 <samueldr> mainly looking for ways to reload the controllers without restarting hydra-server

22:04 <samueldr> the hydra-server script documents itself as being able to do it; but I think *something* in the code replaces the server with starman

22:09 <domenkozar> I'll just say: may the force be with you

22:12 <gchristensen> you can do this

22:14 <samueldr> is there any sentence in english that cannot be misinterpreted? I thought you were going to follow with "this" (something to do), not that it was words of encouragement :)

22:15 <domenkozar> heh

22:20 <gchristensen> oh, what a let down! :)

22:22 <LnL> :D

22:23 <samueldr> got it: https://metacpan.org/pod/Catalyst::ScriptRunner -> this will run `src/lib/Hydra/Script/Server.pm` when it exists, otherwise it will default to its internal (Catalyst) runner

22:29 <domenkozar> samueldr: what do you want to fix in hydra btw? :)

22:29 <samueldr> everything?

22:29 <samueldr> :)

22:29 <gchristensen> samueldr++

22:29 <{^_^}> samueldr's karma got increased to 40

22:29 <domenkozar> sounds like me in 2016 :)

22:29 <samueldr> nah, both wanted to try to learn hydra a bit more, and also fix niggling issues

22:30 <samueldr> like the logo not being centered :)

22:31 <samueldr> right now it's more about getting my hands on hydra to better understand how it works

22:33 <samueldr> oh, one thing I'd like to add is an endpoint like "last-finished-eval", but "latest-finished-eval" which instead of redirecting to the last finished, redirects to the eval which is newest, which is finished

22:33 <samueldr> to finish fixing the channels going back in time

22:34 <domenkozar> samueldr++

22:34 <{^_^}> samueldr's karma got increased to 41

22:35 <gchristensen> my stickers better get here quick, before you get to 50.

22:35 <timokau[m]> samueldr++ taking one for the team

22:35 <{^_^}> samueldr's karma got increased to 42

22:35 <samueldr> I should also take notes

22:36 <samueldr> to better on-board users hacking on hydra

22:38 <timokau[m]> My biggest issue is still that maintainers are not notified on failure, I'm not sure if that is still rooted in some hydra issue anymore or just a case of "nobody wants to flip the switch"

22:39 <makefu> samueldr: that is a fantastic thing to do. the manual for hydra is not what it could be, especially for people new to hydra. it talks about enabling hydra channels, using nix-env to install and exporting variables

22:39 <samueldr> yeah :/

22:40 <makefu> i started https://nixos.wiki/wiki/Hydra , and it seems that someone added some more infos to the page

22:40 <samueldr> though the first draft probably is more about getting the thing up and running more than explaining what's a hydra

22:40 <samueldr> e.g. I'm looking at using the tests/jobs jobs as a good first step :)

22:41 <samueldr> and running it in a development context

22:41 <samueldr> so no services, nothing system-level

22:42 <makefu> this is why i included this subsection https://nixos.wiki/wiki/Hydra#Definitions

23:40 worldofpeace has joined #nixos-dev