gchristensen changed the topic of #nixos-dev to: NixOS Development (#nixos for questions) | https://hydra.nixos.org/jobset/nixos/trunk-combined https://channels.nix.gsc.io/graph.html | 18.09 release managers: vcunat and samueldr | https://logs.nix.samueldr.com/nixos-dev
<thoughtpolice> Open question: how much memory is 'too much' for a NixOS test? My exact number here is ~3GB at the moment and I don't plan on going much beyond that, but it'd be nice to know
<worldofpeace> thoughtpolice: The highest i'm seeing is `4096mb` and that's the gitlab test I think
<samueldr> oh, since there are brilliant minds here not on #nixos-aarch64, I have this PR #51207 which reduces the output size of the sd-image, which would allow hydra builds to succeed, allowing up-to-date images to be distributed
<{^_^}> https://github.com/NixOS/nixpkgs/pull/51207 (by samueldr, 1 day ago, open): sd-image: Slims the ext4 filesystem even more.
<worldofpeace> You know, just a minor change, nothing extremely helpful :P
<samueldr> the way I see it: a huge change in the generation of the installer image; which could have unintended bad consequences!
<samueldr> my initial naïve approach was to leave zero blocks free
<samueldr> which was causing some issues at boot :)
<hedning> Could someone abort the jobs in https://hydra.nixos.org/jobset/nixpkgs/gnome? We've merged to staging, so it's probably just a waste to finish these builds in case more mass rebuilds are coming to staging. Thanks :)
<gchristensen> sure, hedning
jtojnar has joined #nixos-dev
<gchristensen> ok all canceled hedning
<hedning> cheers :)
<thoughtpolice> gchristensen: Are ofborg 'test' commands still only run under x86_64-linux, no aarch64-linux?
<gchristensen> I think they run on aarch64
<gchristensen> let's find out
<thoughtpolice> gchristensen: Also, the new Checks screen/tab is super fancy, too!
<gchristensen> =)
<gchristensen> seems I need to be less cute with the name of the check
pie___ has joined #nixos-dev
pie__ has quit [Remote host closed the connection]
<thoughtpolice> gchristensen: Looks like it is using aarch64-linux. Fingers crossed.
srk has quit [Ping timeout: 246 seconds]
coconnor has joined #nixos-dev
srk has joined #nixos-dev
jtojnar has quit [Quit: jtojnar]
srk has quit [Ping timeout: 250 seconds]
srk has joined #nixos-dev
pie__ has joined #nixos-dev
pie___ has quit [Ping timeout: 268 seconds]
<gchristensen> hrm, nix1 doesn't compile in nixpkgs master anymore
<thoughtpolice> gchristensen: Hmmm. Apparently https://github.com/NixOS/nixpkgs/pull/51306 is a bit tricky! It seems it fails on rbvermaa-spot (build 92b49fd1-b310-4573-92b4-7c55d43f5cdf), but is ok with builder-0-gustav.ewr1.nix.ci (build c8e75306-41da-461d-a9cf-2f6afcb7ab58)
<{^_^}> #51306 (by thoughtpolice, 2 hours ago, open): nixos/cockroachdb: create new service
<thoughtpolice> gchristensen: Also, ofborg's "test" command seemed to do something funky, so I went for nixosTests instead right now :) I think 'test' no longer invokes the build on nixos/release.nix?
<gchristensen> a PR to nixpkgs broke the test command, and hasn't been fixed :(
<thoughtpolice> :(
<gchristensen> there is a fix PR open against nixpkgs, but it is maybe not the right fix, and the fix should perhaps go into ofborg instead... not super clear
<thoughtpolice> gchristensen: Anyway, the pr for cockroachdb up there uses a trick I haven't seen before. It uses the 'ptp_kvm' module in a nixos-test-vm guest to punch through the system time of the host into the VM as a hardware clock. You can then configure an NTP server to use this hardware clock as its time source, to provide NTP time inside NixOS tests
<gchristensen> whoa
<thoughtpolice> Does ofborg have anything special about the configs it deploys? rbvermaa-spot seems to fail to load the ptp_kvm device which seems like it might be some kind of host interaction with QEMU
<gchristensen> https://travis-ci.org/NixOS/ofborg/jobs/462074638#L735 :( cargo test is segfaulting
<gchristensen> rbvermaa-spot is an EC2 instance
<thoughtpolice> (I saw cargo segfault for me too, recently. Couldn't figure that one out)
<gchristensen> builder-0-gustav is bare metal hw from Packet
<thoughtpolice> I suppose I might have to dig into the driver to see what 'Failed to insert 'ptp_kvm': No such device' means. "No such device"
<gchristensen> maybe ec2 doesn't allow ptp kvm
<thoughtpolice> Possibly, I guess it depends on how qemu sources the time.
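The ptp_kvm trick described above could look roughly like the following NixOS configuration for a test guest. This is a minimal sketch, not the code from the PR: it assumes chrony as the NTP daemon, assumes the guest's PTP clock shows up as `/dev/ptp0`, and the `poll 2` value is an arbitrary illustration.

```nix
# Sketch only: guest-side configuration for using the KVM host clock
# as chrony's time source inside a NixOS test VM.
{ ... }:
{
  # Load the KVM PTP driver in the guest. It exposes the host's clock
  # as a PTP hardware clock (assumed here to appear as /dev/ptp0).
  boot.kernelModules = [ "ptp_kvm" ];

  services.chrony = {
    enable = true;
    # No network time servers: the hardware clock is the only source.
    servers = [ ];
    # Use the PTP hardware clock as a reference clock.
    extraConfig = ''
      refclock PHC /dev/ptp0 poll 2
    '';
  };
}
```

If the module cannot be loaded on a given host (as with the "No such device" failure mentioned above), chrony would have no usable source in this setup.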
<gchristensen> ok plenty of fights for the morning. good night
Synthetica has quit [Quit: Connection closed for inactivity]
orivej has joined #nixos-dev
worldofpeace has quit [Ping timeout: 246 seconds]
<ekleog> gchristensen globin: does my comment at https://github.com/NixOS/nixpkgs/pull/50342#discussion_r236104957 make sense? I'd like to try and get this in a ready-for-production state :)
sir_guy_carleton has quit [Quit: WeeChat 2.2]
worldofpeace has joined #nixos-dev
phreedom has joined #nixos-dev
worldofpeace has quit [Remote host closed the connection]
FRidh has joined #nixos-dev
Cale has quit [Ping timeout: 250 seconds]
Cale has joined #nixos-dev
orivej has quit [Ping timeout: 268 seconds]
jtojnar has joined #nixos-dev
<tilpner> samueldr - Did you compare your custom logic against resize2fs -M?
<tilpner> Oh, I was only looking at the diff, nevermind
Taneb has joined #nixos-dev
orivej has joined #nixos-dev
orivej has quit [Ping timeout: 250 seconds]
<LnL> urgh, /. + builtins.unsafeDiscardStringContext (builtins.toFile "foo" "bar")
<LnL> I want a builtins.notFakeToPath
<infinisil> LnL: I think there was a PR for a proper toPath
jtojnar has left #nixos-dev [#nixos-dev]
jtojnar has joined #nixos-dev
<LnL> no, it's deprecated AFAIK
<infinisil> I thought I saw such a PR in nixpkgs, but I guess not, can't find anything
<LnL> error: a string that refers to a store path cannot be appended to a path
<LnL> but... I'm not appending anything
<LnL> hrm and now I'm running into this again https://github.com/NixOS/nix/issues/1728
<{^_^}> nix#1728 (by LnL7, 50 weeks ago, closed): store path literals are copied to the store again
* LnL will just use strings
<infinisil> Good choice
jtojnar has quit [Quit: jtojnar]
jtojnar has joined #nixos-dev
<ekleog> does anyone have an idea why hydra darwin still appears not to have scheduled a build triggered ~72hrs ago? that's getting pretty long, and I can't find any figures for the number of jobs pending for darwin, to check whether hydra is just stuck there
<gchristensen> https://hydra.nixos.org/queue-summary shows darwin jobs
<ekleog> nice, thanks :) will come back in a few hours to check if it has gone down then
<gchristensen> that graph shouldn't look so linear :P
<ekleog> hmmm ok :D I was wondering why mac9 appeared to be like always failing
<gchristensen> mac9 is dead
<ekleog> oh
<gchristensen> hrm, hydra has been building many jobs on macs
<ekleog> so it looks like the number of queued jobs has gone down by ~36 in the last ~5min
<ekleog> (after a refresh of queue-summary)
Jackneill has quit [Ping timeout: 245 seconds]
<ekleog> so I guess it's just way far back in the build backlog
Jackneill has joined #nixos-dev
<ekleog> mac5 appears to be failing regularly too according to prometheus; the others are at least 20k sec since their last failure, so about 5.5hrs
<ekleog> so apart from mac5 / mac9, others appear relatively healthy
<gchristensen> good
<gchristensen> :)
<ekleog> oh nice for your last link :)
<ekleog> only sad thing: looks like we can filter by aarch64/x86_64 only, not by linux/darwin :°
<gchristensen> eh?
<ekleog> actually it looks like even the aarch64/x86_64 selector top left (“machine”) doesn't change anything in the display, so nvm
<gchristensen> ah, I meant to delete that
<ekleog> ah ^^
<ekleog> so mostly x86_64-darwin and a bit aarch64-linux are missing build power… well, I guess that was to be expected
<ekleog> (btw, if I can hijack that discussion, dunno whether you saw my hl, but if you can comment on https://github.com/NixOS/nixpkgs/pull/50342#discussion_r236104957 when you find a bit of time it'd be great to know how to move that forward 😇)
<samueldr> tilpner: yeah, and the issue with `-M` is known by the developers, documented in the manual, and annoying :) though I can see how much better it is to *not* actually leave zero blocks free :)
<samueldr> though there's still a bunch of space saving which can be done
<tilpner> samueldr - I read your PR description afterwards
<tilpner> Didn't see you had explained it already
<samueldr> :) just re-stating facts for irc-only dwellers
init_6 has quit []
<samueldr> AFAIUI, aborted jobs are never restarted unless asked explicitly in hydra, right?
<gchristensen> ye
<samueldr> we're having up to ~6000 aborted jobs in release-18.09-aarch64 that are annoyingly, stubbornly kept aborted
<gchristensen> can you restart jobs?
<samueldr> I have no special rights to anything
<gchristensen> do you have an account?
<samueldr> yes
<samueldr> useful to make a dashboard!
<gchristensen> ok you can restart jobs
<samueldr> thanks
<samueldr> let's hope this fills up the -stable aarch64 release with good builds :)
<ekleog> what do you mean by “make a dashboard”?
<gchristensen> users can make dashboards https://hydra.nixos.org/dashboard/graham@grahamc.com
<ekleog> oh nice :)
<ekleog> samueldr: yours is private, though :°
* samueldr is still a hydra noob
<samueldr> it shouldn't be, starting now
<ekleog> can confirm
<ekleog> much red
<samueldr> a good chunk is timed out and aborts
<samueldr> and a good chunk is aarch64 :)
* ekleog should write that cron job that mails him when a watched build fails
<gchristensen> :/
<samueldr> if you are using something like stylus, to add custom CSS to pages, https://gist.github.com/samueldr/71845e0c8a1996ca7994629b17e6a9bb
<samueldr> shows time-outs differently
<gchristensen> neat
<gchristensen> want to PR that?
<samueldr> this in itself is a hack; a better solution could be PR'd
<gchristensen> ah
<samueldr> I was thinking about other statuses too
orivej has joined #nixos-dev
<samueldr> hydra's configurePhase (when run in the nix-shell as per appendix A) opened what seemingly was `less`
<gchristensen> hah
sir_guy_carleton has joined #nixos-dev
orivej has quit [Ping timeout: 246 seconds]
sir_guy_carleton has quit [Quit: WeeChat 2.2]
<thoughtpolice> Okay, here's a sort of open question I was thinking about while implementing #51306 -- what are the semantics of the systemd target "time-sync.target"?
<{^_^}> https://github.com/NixOS/nixpkgs/pull/51306 (by thoughtpolice, 16 hours ago, open): nixos/cockroachdb: create new service
<thoughtpolice> The systemd.special(7) man page says: "Services responsible for synchronizing the system clock from a remote source (such as NTP client implementations) should pull in this target and order themselves before it. All services where correct time is essential should be ordered after this unit, but not pull it in..."
<thoughtpolice> But there's a catch: I don't think any of our NTP daemons block startup to do an immediate synchronization. Most of them will wait for a while before doing the first adjustment.
<thoughtpolice> But waiting for the first adjustment is important for many services, because if the measured delta is too big, you often want to skip, not slew, the system time. This often happens after boot.
<thoughtpolice> If the measured difference is like 200ms then a slew is fine: you can slowly adjust the system clock and step it forward little by little, or stretch it, to get it to align. But if you have like a 1.5s delta on boot, you don't want to slew for the next several hours. You're much better off just doing an immediate skip. But an immediate skip is also dangerous, unless every service that depends on synchronized time happens *after* the skip
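The slew-versus-skip trade-off above maps onto chrony's `makestep` directive. The sketch below assumes chrony is the daemon in use; the 1-second threshold and the limit of 3 updates are illustrative values chosen to match the numbers in the discussion (slew a 200ms offset, step a 1.5s one), not settings taken from the PR.

```nix
# Sketch only: let chrony step the clock for large offsets right after
# start, and slew in every other case.
{ ... }:
{
  services.chrony = {
    enable = true;
    extraConfig = ''
      # Step the clock if the offset exceeds 1.0 second, but only during
      # the first 3 clock updates; afterwards chrony only slews.
      makestep 1.0 3
    '';
  };
}
```

Limiting the step to the first few updates confines any jump to early boot, which is only safe if time-sensitive services start after that first adjustment.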
<thoughtpolice> CockroachDB is an example of this. Especially with the test I wrote, I'd say 60% of the time, the clock is over 1s off on boot on the NixOS test virtual machines. This violates some internal consistency constraints and causes the cluster to fail booting/joining nodes.
<thoughtpolice> And if the clock skips immediately during runtime, things also will explode. So you must absolutely, always wait for the initial measurement/synchronization.
<thoughtpolice> Currently I do this by blocking the ExecStartPre on chrony, in my tests. You can force Chrony to do rapid initial measurements but I don't think doing that by default is correct or very "nice", so instead I just use a command to wait for the first full time synchronization.
<thoughtpolice> But IMO the systemd man page seems to indicate that maybe time-sync.target should work this way: it should not wait for chrony to *start*, but for it to actually do the first adjustment, after some measurements, which may take some time.
<thoughtpolice> Does anyone have opinions/thoughts on this -- whether it should block until adjustment? Or has anyone ever had to do something like this themselves?
<gchristensen> I think your interpretation is correct
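One way to read that interpretation is a oneshot unit that orders itself before time-sync.target and only finishes once the daemon reports a synchronized clock. The sketch below assumes chrony and uses `chronyc waitsync`; it is similar in spirit to the chrony-wait.service example in the chrony source tree. The unit name `chrony-wait` and the assumption that the daemon runs as `chronyd.service` are mine, not taken from the discussion.

```nix
# Sketch only: make time-sync.target wait for chrony's first
# synchronization instead of merely for the daemon to start.
{ pkgs, ... }:
{
  systemd.services.chrony-wait = { # hypothetical unit name
    description = "Wait for chrony to synchronize the system clock";
    after = [ "chronyd.service" ];
    requires = [ "chronyd.service" ];
    # Per systemd.special(7): pull in the target and order before it.
    before = [ "time-sync.target" ];
    wants = [ "time-sync.target" ];
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
      # Blocks until chronyd reports that the clock is synchronized.
      ExecStart = "${pkgs.chrony}/bin/chronyc waitsync";
    };
  };
}
```

A service that needs correct time would then only declare `after = [ "time-sync.target" ]`, without knowing which NTP daemon is configured.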
<thoughtpolice> It also makes actually writing services that block on synchronized time non-modular -- the CockroachDB module has no idea what NTP daemon you chose, so now in the test I have to specifically inject knowledge of chrony into ExecStartPre
<thoughtpolice> by merging in my own systemd.services.cockroachdb.preStart, urgh.
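The daemon-specific workaround described here might look like the fragment below. The discussion does not name the exact command, so `chronyc waitsync` is an assumption; the service name `cockroachdb` follows the module in the PR.

```nix
# Sketch only: test-local workaround that injects knowledge of chrony
# into the CockroachDB unit.
{ pkgs, ... }:
{
  # preStart scripts are merged, so this adds to whatever the module
  # itself defines rather than replacing it.
  systemd.services.cockroachdb.preStart = ''
    # Wait until chronyd reports a synchronized clock before starting.
    ${pkgs.chrony}/bin/chronyc waitsync
  '';
}
```

This is what makes the setup non-modular: swapping the NTP daemon means rewriting this line.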
<thoughtpolice> Ceph is apparently the only other service that uses time-sync.target. I imagine it's very important for Ceph, too.
<jtojnar> anyone want to take a look at maintainers/scripts/update.nix parallelization? https://github.com/NixOS/nixpkgs/pull/50977
<{^_^}> #50977 (by jtojnar, 1 week ago, open): update.nix: Run update scripts in parallel
Haskellfant has joined #nixos-dev
cocreature has quit [Ping timeout: 250 seconds]
Haskellfant is now known as cocreature
JosW has joined #nixos-dev
<thoughtpolice> Is there a way to do global assertions outside of any particular module? Like, I have 3 modules that are mutually exclusive, having them mkForce others off is weird and requires a lot of duplication.
<thoughtpolice> I'd rather just have one place that simply ensures only one of them is 'true'
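A standalone module can set the NixOS `assertions` option, which gives a single place for such a check. The sketch below assumes the three mutually exclusive modules are the chrony, ntp and openntpd services; the discussion does not say which modules are meant, so substitute the real option names.

```nix
# Sketch only: one module whose sole job is to assert that at most one
# of several mutually exclusive services is enabled.
{ config, lib, ... }:
let
  # Assumed option names; replace with the modules in question.
  enabled = [
    config.services.chrony.enable
    config.services.ntp.enable
    config.services.openntpd.enable
  ];
in
{
  assertions = [
    {
      # Count how many of the flags are true.
      assertion = lib.count (x: x) enabled <= 1;
      message = "Only one of services.chrony, services.ntp and services.openntpd may be enabled.";
    }
  ];
}
```

Evaluation fails with the message if more than one flag is true, so none of the modules needs to `mkForce` the others off.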
JosW has quit [Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/]
orivej has joined #nixos-dev
ixxie has joined #nixos-dev
ixxie has quit [Quit: leaving]
<thoughtpolice> https://github.com/NixOS/nixpkgs/pull/51338 hot off the grill and untested
<{^_^}> #51338 (by thoughtpolice, 1 minute ago, open): WIP: nixos: make time-sync.target block until initial adjustment with all NTP daemons
orivej has quit [Ping timeout: 246 seconds]
JosW has joined #nixos-dev
FRidh has quit [Quit: Konversation terminated!]
orivej has joined #nixos-dev
JosW has quit [Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/]
<samueldr> I'll re-ask here: any pro-tips for hacking on hydra?
<samueldr> mainly looking for ways to reload the controllers without restarting hydra-server
<samueldr> the hydra-server script documents itself as being able to do it; but I think *something* in the code replaces the server with starman
<domenkozar> I'll just say: may the force be with you
<gchristensen> you can do this
<samueldr> is there any sentence in english that cannot be misinterpreted? I thought you were going to follow with "this" (something to do), not that it was words of encouragement :)
<domenkozar> heh
<gchristensen> oh, what a let down! :)
<LnL> :D
<samueldr> got it: https://metacpan.org/pod/Catalyst::ScriptRunner -> this will run `src/lib/Hydra/Script/Server.pm` when it exists, otherwise it will default to its internal (Catalyst) runner
<domenkozar> samueldr: what do you want to fix in hydra btw? :)
<samueldr> everything?
<samueldr> :)
<gchristensen> samueldr++
<{^_^}> samueldr's karma got increased to 40
<domenkozar> sounds like me in 2016 :)
<samueldr> nah, both wanted to try to learn hydra a bit more, and also fix niggling issues
<samueldr> like the logo not being centered :)
<samueldr> right now it's more about getting my hands on hydra to better understand how it works
<samueldr> oh, one thing I'd like to add is an endpoint like "last-finished-eval", but "latest-finished-eval", which instead of redirecting to the eval that finished last, redirects to the newest eval that has finished
<samueldr> to finish fixing the channels going back in time
<domenkozar> samueldr++
<{^_^}> samueldr's karma got increased to 41
<gchristensen> my stickers better get here quick, before you get to 50.
<timokau[m]> samueldr++ taking one for the team
<{^_^}> samueldr's karma got increased to 42
<samueldr> I should also take notes
<samueldr> to better on-board users hacking on hydra
<timokau[m]> My biggest issue is still that maintainers are not notified on failure, I'm not sure if that is still rooted in some hydra issue or just a case of "nobody wants to flip the switch"
<makefu> samueldr: that is a fantastic thing to do. the manual for hydra is not what it could be, especially for people new to hydra. it talks about enabling hydra channels, using nix-env to install and exporting variables
<samueldr> yeah :/
<makefu> i started https://nixos.wiki/wiki/Hydra , and it seems that someone added some more info to the page
<samueldr> though the first draft probably is more about getting the thing up and running more than explaining what's a hydra
<samueldr> e.g. I'm looking at using the tests/jobs jobs as a good first step :)
<samueldr> and running it in a development context
<samueldr> so no services, nothing system-level
<makefu> this is why i included this subsection https://nixos.wiki/wiki/Hydra#Definitions
worldofpeace has joined #nixos-dev