<thoughtpolice>
Open question: how much memory is 'too much' for a NixOS test? My exact number here is ~3GB at the moment and I don't plan on going much beyond that, but it'd be nice to know
<worldofpeace>
thoughtpolice: The highest I'm seeing is `4096mb`, and that's the gitlab test, I think
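For reference, a minimal sketch of where that number lives in a NixOS test (test name hypothetical; `virtualisation.memorySize` is in MiB):

```nix
import ./make-test.nix ({ pkgs, ... }: {
  name = "memory-hungry-test";
  nodes.machine = { ... }: {
    # ~3GB of guest RAM for the test VM
    virtualisation.memorySize = 3072;
  };
  testScript = ''
    $machine->waitForUnit("multi-user.target");
  '';
})
```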
<samueldr>
oh, since there are brilliant minds here who aren't on #nixos-aarch64: I have this PR, #51207, which reduces the output size of the sd-image; that would allow hydra builds to succeed, allowing up-to-date images to be distributed
<worldofpeace>
You know, just a minor change, nothing extremely helpful :P
<samueldr>
the way I see it, it's a huge change in the generation of the installer image, which could have unintended bad consequences!
<samueldr>
my initial naïve approach was to leave zero blocks free
<samueldr>
which was causing some issues at boot :)
<hedning>
Could someone abort the jobs in https://hydra.nixos.org/jobset/nixpkgs/gnome? We've merged to staging, so it's probably a waste to finish these builds in case more mass rebuilds are coming to staging. Thanks :)
<thoughtpolice>
gchristensen: Are ofborg 'test' commands still only run under x86_64-linux, not aarch64-linux?
<gchristensen>
I think they run on aarch64
<gchristensen>
let's find out
<thoughtpolice>
gchristensen: Also, the new Checks screen/tab is super fancy, too!
<gchristensen>
=)
<gchristensen>
seems I need to be less cute with the name of the check
pie___ has joined #nixos-dev
pie__ has quit [Remote host closed the connection]
<thoughtpolice>
gchristensen: Looks like it is using aarch64-linux. Fingers crossed.
srk has quit [Ping timeout: 246 seconds]
coconnor has joined #nixos-dev
srk has joined #nixos-dev
jtojnar has quit [Quit: jtojnar]
srk has quit [Ping timeout: 250 seconds]
srk has joined #nixos-dev
pie__ has joined #nixos-dev
pie___ has quit [Ping timeout: 268 seconds]
<gchristensen>
hrm, nix1 doesn't compile in nixpkgs master anymore
<thoughtpolice>
gchristensen: Hmmm. Apparently https://github.com/NixOS/nixpkgs/pull/51306 is a bit tricky! It seems it fails on rbvermaa-spot (build 92b49fd1-b310-4573-92b4-7c55d43f5cdf), but is ok with builder-0-gustav.ewr1.nix.ci (build c8e75306-41da-461d-a9cf-2f6afcb7ab58)
<{^_^}>
#51306 (by thoughtpolice, 2 hours ago, open): nixos/cockroachdb: create new service
<thoughtpolice>
gchristensen: Also, ofborg's "test" command seemed to do something funky, so I went for nixosTests instead right now :) I think 'test' no longer invokes the build on nixos/release.nix?
<gchristensen>
a PR to nixpkgs broke the test command, and hasn't been fixed :(
<thoughtpolice>
:(
<gchristensen>
there is a fix PR in to nixpkgs, but it is maybe not the right fix, and the fix maybe should go into ofborg... not super clear
<thoughtpolice>
gchristensen: Anyway, the PR for cockroachdb up there uses a trick I haven't seen before. It uses the 'ptp_kvm' module in a NixOS test VM guest to punch the host's system time through into the VM as a hardware clock. You can then configure an NTP server to use this hardware clock as its time source, to provide NTP time inside NixOS tests
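A minimal sketch of that setup, assuming chrony as the guest's NTP daemon (the PR itself may wire it differently):

```nix
{
  # expose the host's clock to the guest as /dev/ptp0
  boot.kernelModules = [ "ptp_kvm" ];

  services.chrony = {
    enable = true;
    # use the KVM PTP hardware clock as chrony's time source
    extraConfig = ''
      refclock PHC /dev/ptp0 poll 2
    '';
  };
}
```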
<gchristensen>
whoa
<thoughtpolice>
Does ofborg have anything special about the configs it deploys? rbvermaa-spot seems to fail to load the ptp_kvm device, which seems like it might be some kind of host interaction with QEMU
<{^_^}>
nix#1728 (by LnL7, 50 weeks ago, closed): store path literals are copied to the store again
* LnL
will just use strings
<infinisil>
Good choice
jtojnar has quit [Quit: jtojnar]
jtojnar has joined #nixos-dev
<ekleog>
anyone have an idea why hydra darwin appears to still not have scheduled a build triggered ~72hrs ago? That's getting pretty long, and I can't find any count of the jobs pending for darwin, to check whether hydra is just stuck there
<samueldr>
tilpner: yeah, and the issue with `-M` is known by the developers, documented in the manual, and annoying :) though I can see how much better it is to *not* actually leave zero blocks free :)
<samueldr>
though there's still a bunch of space saving that can be done
<tilpner>
samueldr - I read your PR description afterwards
<tilpner>
Didn't see you had explained it already
<samueldr>
:) just re-stating facts for irc-only dwellers
init_6 has quit []
<samueldr>
AFAIUI, aborted jobs are never restarted unless explicitly asked for in hydra, right?
<gchristensen>
ye
<samueldr>
we have up to ~6000 aborted jobs in release-18.09-aarch64 that are annoyingly, stubbornly staying aborted
<gchristensen>
can you restart jobs?
<samueldr>
I have no special rights to anything
<gchristensen>
do you have an account?
<samueldr>
yes
<samueldr>
useful to make a dashboard!
<gchristensen>
ok you can restart jobs
<samueldr>
thanks
<samueldr>
let's hope this fills up the -stable aarch64 release with good builds :)
<thoughtpolice>
Okay, here's a sort of open question I was thinking about while implementing #51306 -- what are the semantics of the systemd target "time-sync.target"?
<thoughtpolice>
The systemd.special(7) man page says: "Services responsible for synchronizing the system clock from a remote source (such as NTP client implementations) should pull in this target and order themselves before it. All services where correct time is essential should be ordered after this unit, but not pull it in..."
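In NixOS terms, the ordering the man page describes looks roughly like this (unit names hypothetical):

```nix
{
  # an NTP client pulls the target in and orders itself before it
  systemd.services.my-ntpd = {
    wantedBy = [ "multi-user.target" ];
    wants = [ "time-sync.target" ];
    before = [ "time-sync.target" ];
  };

  # a time-sensitive consumer orders itself after the target,
  # but does not pull it in
  systemd.services.my-database = {
    after = [ "time-sync.target" ];
  };
}
```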
<thoughtpolice>
But there's a catch: I don't think any of our NTP daemons block startup to do an immediate synchronization. Most of them will wait for a while before doing the first adjustment.
<thoughtpolice>
But waiting for the first adjustment is important for many services, because if the measured delta is too big, you often want to step the system time rather than slew it. This is common right after boot.
<thoughtpolice>
If the measured difference is something like 200ms, a slew is fine: you can slowly adjust the system clock, nudging or stretching it little by little until it aligns. But if you have something like a 1.5s delta on boot, you don't want to slew for the next several hours; you're much better off doing an immediate step. An immediate step is also dangerous, though, unless every service that depends on synchronized time starts *after* the step
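chrony exposes exactly this policy as a directive; a sketch, with illustrative thresholds:

```nix
{
  services.chrony.extraConfig = ''
    # step the clock if the measured offset exceeds 1s, but only
    # during the first 3 clock updates (i.e. around boot);
    # after that, only ever slew
    makestep 1.0 3
  '';
}
```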
<thoughtpolice>
CockroachDB is an example of this. Especially with the test I wrote, I'd say 60% of the time the clock is over 1s off at boot on the NixOS test virtual machines. That violates some internal consistency constraints and causes the cluster to fail to boot/join nodes.
<thoughtpolice>
And if the clock steps suddenly during runtime, things will also explode. So you must absolutely, always wait for the initial measurement/synchronization.
<thoughtpolice>
Currently, in my tests, I do this by blocking in ExecStartPre until chrony has synchronized. You can force chrony to do rapid initial measurements, but I don't think doing that by default is correct or very "nice", so instead I just use a command to wait for the first full time synchronization.
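A sketch of that blocking, assuming chrony (`chronyc waitsync` is real; the tries/threshold here are illustrative):

```nix
{ pkgs, ... }: {
  systemd.services.cockroachdb.preStart = ''
    # wait until chrony reports the remaining correction is below
    # 100ms, checking up to 50 times (~10s apart by default)
    ${pkgs.chrony}/bin/chronyc waitsync 50 0.1
  '';
}
```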
<thoughtpolice>
But IMO the systemd man page seems to indicate that maybe time-sync.target should work this way: it should not wait for chrony to *start*, but for it to actually do the first adjustment, after some measurements, which may take some time.
<thoughtpolice>
Does anyone have opinions/thoughts on this -- whether it should block until adjustment? Or has anyone ever had to do something like this themselves?
<gchristensen>
I think your interpretation is correct
<thoughtpolice>
It also makes writing services that block on synchronized time non-modular -- the CockroachDB module has no idea which NTP daemon you chose, so now in the test I have to specifically inject knowledge of chrony into ExecStartPre
<thoughtpolice>
by merging in my own systemd.services.cockroachdb.preStart, urgh.
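One modular alternative, sketched: a oneshot unit that gives time-sync.target the "first adjustment done" semantics, so consumers only ever order themselves after the target (the wiring here is hypothetical):

```nix
{ pkgs, ... }: {
  systemd.services.chrony-wait-sync = {
    after = [ "chronyd.service" ];
    before = [ "time-sync.target" ];
    wantedBy = [ "time-sync.target" ];
    serviceConfig = {
      Type = "oneshot";
      # block until the first real synchronization has happened
      ExecStart = "${pkgs.chrony}/bin/chronyc waitsync 100 0.1";
      TimeoutStartSec = "5min";
    };
  };
}
```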
<thoughtpolice>
Ceph is apparently the only other module that uses time-sync.target. I imagine it's very important for Ceph, too.
<{^_^}>
#50977 (by jtojnar, 1 week ago, open): update.nix: Run update scripts in parallel
Haskellfant has joined #nixos-dev
cocreature has quit [Ping timeout: 250 seconds]
Haskellfant is now known as cocreature
JosW has joined #nixos-dev
<thoughtpolice>
Is there a way to do global assertions outside of any particular module? Say I have 3 modules that are mutually exclusive; having each one mkForce the others off is weird and requires a lot of duplication.
<thoughtpolice>
I'd rather just have one place that simply ensures only one of them is 'true'
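One way to get that, as a sketch: a small standalone module, imported alongside the three, whose only job is the assertion (the option paths are hypothetical):

```nix
{ config, lib, ... }: {
  assertions = [
    {
      # at most one of the three may be enabled at once
      assertion = lib.length (lib.filter (x: x) [
        config.services.foo.enable
        config.services.bar.enable
        config.services.baz.enable
      ]) <= 1;
      message = "services.foo, services.bar and services.baz are mutually exclusive";
    }
  ];
}
```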
<samueldr>
I'll re-ask here: any pro-tips for hacking on hydra?
<samueldr>
mainly looking for ways to reload the controllers without restarting hydra-server
<samueldr>
the hydra-server script documents itself as being able to do it; but I think *something* in the code replaces the server with starman
<domenkozar>
I'll just say: may the force be with you
<gchristensen>
you can do this
<samueldr>
is there any sentence in English that cannot be misinterpreted? I thought you were going to follow up with "this" (something to do), not that it was words of encouragement :)
<domenkozar>
heh
<gchristensen>
oh, what a let down! :)
<LnL>
:D
<samueldr>
got it: https://metacpan.org/pod/Catalyst::ScriptRunner -> this will run `src/lib/Hydra/Script/Server.pm` when it exists, otherwise it will default to its internal (Catalyst) runner
<domenkozar>
samueldr: what do you want to fix in hydra btw? :)
<samueldr>
everything?
<samueldr>
:)
<gchristensen>
samueldr++
<{^_^}>
samueldr's karma got increased to 40
<domenkozar>
sounds like me in 2016 :)
<samueldr>
nah, I both wanted to try to learn hydra a bit more, and also to fix niggling issues
<samueldr>
like the logo not being centered :)
<samueldr>
right now it's more about getting my hands on hydra to better understand how it works
<samueldr>
oh, one thing I'd like to add is an endpoint like "last-finished-eval", but called "latest-finished-eval", which instead of redirecting to the most recently finished eval, redirects to the newest eval that is finished
<samueldr>
to finish fixing the channels going back in time
<domenkozar>
samueldr++
<{^_^}>
samueldr's karma got increased to 41
<gchristensen>
my stickers better get here quick, before you get to 50.
<timokau[m]>
samueldr++ taking one for the team
<{^_^}>
samueldr's karma got increased to 42
<samueldr>
I should also take notes
<samueldr>
to better on-board users hacking on hydra
<timokau[m]>
My biggest issue is still that maintainers are not notified on failure. I'm not sure whether that is still rooted in some hydra issue or just a case of "nobody wants to flip the switch".
<makefu>
samueldr: that is a fantastic thing to do. the manual for hydra is not what it could be, especially for people new to hydra. it talks about enabling hydra channels, using nix-env to install, and exporting variables