worldofpeace_ changed the topic of #nixos-dev to: #nixos-dev NixOS Development (#nixos for questions) | NixOS 20.03 BETA Announced https://discourse.nixos.org/t/nixos-20-03-beta/5935 | https://hydra.nixos.org/jobset/nixos/trunk-combined https://channels.nix.gsc.io/graph.html | https://r13y.com | 19.09 RMs: disasm, sphalerite; 20.03: worldofpeace, disasm | https://logs.nix.samueldr.com/nixos-dev
mmlb has quit [Quit: The Lounge - https://thelounge.github.io]
mmlb has joined #nixos-dev
<{^_^}> resolved: RootPartitionLowDiskSpace: https://status.nixos.org/prometheus/alerts
<pie_[bnc]> Ericson2314: the makeExtensibles you put in llvm are kind of weird, libraries, tools, and the actual base stuff extend separately but get merged into one set at the end? does that mean they don't have interdependencies?
<pie_[bnc]> Ericson2314: that doesnt sound right
<pie_[bnc]> Ericson2314: how do you .extend these sets and keep dependencies in sync?
<Ericson2314> pie_[bnc]: the merging into one set is something I'd happily get rid of
<Ericson2314> the dependencies between tools and libraries always use buildPackages or targetPackages or similar---they cross a stage boundary
<Ericson2314> the dependencies within each set are always in the same stage
<pie_[bnc]> yeesh
<pie_[bnc]> Ericson2314: so uhh
<pie_[bnc]> Ericson2314: is there some easy way i can go from a llvmPackages to an llvmPackages with an llvm override and get things propagated properly to everything?
<Ericson2314> pie_[bnc]: there are arguments to the llvm default.nix that are the previous and next stage's attribute sets, I believe
<Ericson2314> look at the callPackage for it
<Ericson2314> so you can override but then you must also provide those arguments
<Ericson2314> to tie the knot
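(A minimal sketch of the extend-and-override pattern Ericson2314 describes, using lib.makeExtensible; the attribute names are illustrative, not the actual llvmPackages layout:)

    let
      lib = (import <nixpkgs> { }).lib;
      # a base extensible set: clang refers to llvm through `self`,
      # so the dependency stays in sync across overrides
      llvmPackages = lib.makeExtensible (self: {
        llvm = "llvm";
        clang = "clang-on-${self.llvm}";
      });
      # .extend re-ties the knot: everything that uses self.llvm
      # now sees the overridden one
      patched = llvmPackages.extend (self: super: {
        llvm = "${super.llvm}-patched";
      });
    in patched.clang   # => "clang-on-llvm-patched"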
<pie_[bnc]> (oh nooooooo why is my refactor building llvm again...takes me literally a day D:)
<pie_[bnc]> Ericson2314: oh yeah i guess looking at the callPackage makes sense, this stuff is just hella intimidating
<Ericson2314> pie_[bnc] yeah sorry, I wish I had the back-compat-breaking PRs to merge automatically 6 months later or something
<pie_[bnc]> (oh whew it was only libcxx)
b42 has quit [Ping timeout: 240 seconds]
b42 has joined #nixos-dev
<gchristensen> so how much would y'all trust / not trust nixos containers for running CI tasks for an oss repo
* gchristensen assumes that is "a lot"
<cole-h> Probably. What's the difference from what CI tasks do now?
ris has quit [Ping timeout: 256 seconds]
<gchristensen> usually you don't run arbitrary commands from anonymous contributors on any ol' machine
<gchristensen> (and I'm not sure people really do trust nixos containers for isolation)
<clever> gchristensen: TURN is a webrtc thing used to deal with a lack of port-forwarding on both parties, and it's basically a proxy server
<clever> gchristensen: in this case, it had zero filtering on the destination, so you could hit up the aws metadata (including iam instance profiles) via slack's TURN servers....
<clever> the same could be done with nix builds
<clever> but obviously, not with the same level of interactivity
teto has quit [Ping timeout: 265 seconds]
Guest30 has joined #nixos-dev
<nh2> gchristensen: I'd never trust Linux containers for isolation of untrusted code. Breaking out is too easy. I'd use KVM, it is quite fast
<Mic92> disasm: I'll have a look
<Mic92> disasm: this package is not pythonix. This is something else.
orivej has joined #nixos-dev
drakonis has joined #nixos-dev
<Profpatsch> qyliss: kk, good to know.
<Profpatsch> What’s the difference to Signed-off-by?
capisce_ has joined #nixos-dev
capisce has quit [Ping timeout: 256 seconds]
drakonis has quit [Quit: WeeChat 2.8]
<Emantor> Profpatsch: Signed-off-by is for https://developercertificate.org/, signed commits are used to cryptographically verify that the committer owns the corresponding GPG-key.
<Emantor> There is even a short wikipedia article on DCO: https://en.wikipedia.org/wiki/Developer_Certificate_of_Origin
cole-h has quit [Quit: Goodbye]
drakonis_ has quit [Read error: Connection reset by peer]
drakonis_ has joined #nixos-dev
<Profpatsch> Ah, right, it was about authorship
harrow has quit [Quit: Leaving]
<Profpatsch> For Emacs contribution you have to send snail-mail to the FSF first :)
harrow has joined #nixos-dev
Guest30 has quit [Quit: Connection closed]
abbe has joined #nixos-dev
__monty__ has joined #nixos-dev
<arianvp> hmm
<arianvp> I have an overlay that I imported like
<arianvp> overlays = map (n: import n) [ ./my-overlay.nix ]
<arianvp> if I beta-reduce to overlays = [ import ./my-overlay.nix ] I get "infinite recursion encountered at unknown location"
<arianvp> but how is this possible? these two expressions are equivalent in a pure language?
<arianvp> I have no idea how to debug either :/
<arianvp> ../
<arianvp> I got it
<arianvp> had to be (import ./my-overlay.nix)
<arianvp> that is the most _obscure_ error ever. do we collect obscure error messages anywhere?
<arianvp> domenkozar[m]: :P
<domenkozar[m]> oh!
<arianvp> i would expect it to be a syntax error instead :P
<domenkozar[m]> can you make a minimal example?
<MichaelRaskin> domenkozar[m]: overlays = [import ./my-overlay.nix] with a null overlay in my-overlay.nix?
<MichaelRaskin> It is the generic forgetting-parenthesis inside a list
<arianvp> doesn't reproduce when I do that, MichaelRaskin :/
<arianvp> seems to parse it as [ (import ./my-overlay.nix) ] correctly; the import binds tightly to the name
<arianvp> but for some reason the brackets are affecting evaluation
<MichaelRaskin> The brackets are list syntax
<arianvp> s/brackets/parentheses
<MichaelRaskin> Well, they affect whether you have a list of «import» and filename, or a single element that is the result of a call
<arianvp> ahh import is actually a primop, not a syntax thing. I see
<domenkozar[m]> so what's the minimal example to reproduce? :)
ashkitten has quit [Quit: WeeChat 2.8]
<arianvp> it's not very minimal, but making it smaller makes the problem go away :P
<arianvp> you can reproduce by typing `nix repl .`
ashkitten has joined #nixos-dev
<domenkozar[m]> ok I got it :)
<tilpner> arianvp: map import foo is equivalent to map (n: import n) foo
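(The pitfall in one minimal sketch; ./my-overlay.nix is a hypothetical file:)

    {
      # [ import ./my-overlay.nix ] is a TWO-element list: the
      # `import` primop itself, then the path. Nothing is applied,
      # and the overlay fixpoint later falls over, here with
      # "infinite recursion encountered".
      broken  = builtins.length [ import ./my-overlay.nix ];    # => 2
      # parenthesized, it is a one-element list holding the result
      # of the call, i.e. the overlay function:
      correct = builtins.length [ (import ./my-overlay.nix) ];  # => 1
    }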
<{^_^}> nix#3501 (by domenkozar, 15 seconds ago, open): infinite recursion encountered, at undefined position
<sphalerite> samueldr: Ericson2314: When would you be available for a shepherd meeting for RFC32?
<domenkozar[m]> that's the minimal example
<MichaelRaskin> Hmm. Is the (confusing part of the) problem that counter hits the overflow at primop evaluation that is, indeed, undefined position?
<domenkozar[m]> it could be that realiseContext has no Pos
<domenkozar[m]> ah no, it's infinite recursion obviously :)
<domenkozar[m]> so forceValue accepts optional Pos
<domenkozar[m]> which is most often not passed
* domenkozar[m] puts pos at all calls of forceValue and compiles
capisce_ is now known as capisce
justanotheruser has quit [Ping timeout: 246 seconds]
<domenkozar[m]> $ /nix/store/w2di83825csmkjcl6h8hr71vcrcw1vhr-nix-2.4pre0_0000000/bin/nix-instantiate --strict foo.nix
<domenkozar[m]> error: infinite recursion encountered, at /nix/store/6hhlgwkrlfv97k14fvfckbi84n8s0vav-nixos-20.03beta1320.9f0f06ac8be/nixos/lib/fixed-points.nix:69:67
<domenkozar[m]> hehe
<infinisil> > :p map (mapAttrs null) [ 0 ]
<{^_^}> error: value is an integer while a set was expected, at undefined position
<infinisil> Another example of "at undefined position
<infinisil> > :p map (x: mapAttrs null x) [ 0 ]
<{^_^}> value is an integer while a set was expected, at (string):296:9
<infinisil> Doesn't happen if you use a lambda
<arianvp> tilpner: but
<Ericson2314> sphalerite: lots of times, these days :). 13:00, preferably 14:00 onwards UTC
<tilpner> arianvp: It's fine if you choose one over the other for clarity
<arianvp> ahhh so import isn't a keyword but a proper function
<arianvp> (builtin that is)
<arianvp> i didn't realise this
<Mic92> sphalerite: spacekookie niksnut worldofpeace Remember we have a meeting today at 2:00 UTC, spacekookie is leading the meeting.
<{^_^}> nix#3502 (by domenkozar, 31 seconds ago, open): pass Pos to forceValue to improve infinite recursion error
<domenkozar[m]> not that it helps knowing fixed point is looping :D
<gchristensen> domenkozar[m]++
<{^_^}> domenkozar[m]'s karma got increased to 14
<MichaelRaskin> Maybe a fragment of stacktrace until the first repeat (given some bound on the length) could help?
<domenkozar[m]> I think --show-trace should be the default
<MichaelRaskin> Without a bound on reply length?
<domenkozar[m]> with existing bounds :)
<makefu> domenkozar[m]: oh no! your PR removes our well-beloved nix-specific meme "error: infinite recursion encountered, at undefined position", now i have to throw away all the stickers i prepared for the next congress
<gchristensen> merge it merge it merge it
<Profpatsch> domenkozar[m]++
<{^_^}> domenkozar[m]'s karma got increased to 15
<Profpatsch> domenkozar[m]++
<{^_^}> domenkozar[m]'s karma got increased to 16
<Profpatsch> makefu: I want one anyway
<MichaelRaskin> As a sign of an oldtimer who gets the reference?
FRidh has joined #nixos-dev
<arianvp> domenkozar[m]++
<{^_^}> domenkozar[m]'s karma got increased to 17
<julm> domenkozar[m]++
<{^_^}> domenkozar[m]'s karma got increased to 18
<arianvp> lmao makefu
<julm> Mic92: stig has not had a release since 2018, but a few hundred commits since then. should we still wait for a release or can we package the current HEAD? is there a Nixpkgs policy about that?
<Mic92> julm: we usually decide this on a per-package basis.
<julm> Mic92: ok. I can't tell here. I'm discovering stig
<Mic92> By default we stick to stable, but sometimes we switch to unreleased version if that fixes the build.
<Mic92> or brings in some other important fixes.
<Mic92> julm: As a first step I would open an issue upstream to request a release. Sometimes maintainers have good reasons to hold back a release, e.g. known bugs.
<julm> indeed
<julm> I'll do that
<{^_^}> rndusr/stig#133 (by ju1m, 16 seconds ago, open): A new release?
<adisbladis> makefu: I was considering getting a tshirt printed with that "wonderful" error message
teto has joined #nixos-dev
Jackneill has quit [Ping timeout: 256 seconds]
<arianvp> Ok we got outstanding PRs for all the ACME things that were blocking 20.03
<arianvp> \o/
Jackneill has joined #nixos-dev
<worldofpeace> arianvp: ☹︎ and just when I thought it was very ready
<worldofpeace> arianvp: is this right https://github.com/NixOS/nixpkgs/issues/84633 ?
<{^_^}> #84633 (by immae, 1 week ago, open): Several issues in ACME certificates
<arianvp> Yes. Correct.
<arianvp> I'm trying to get everything reviewed / into shape today
<arianvp> The last two issues in that list aren't blockers per se. But it would be neat if they got backported. The others should go in before the release (cert renewals being broken is not good :p)
* srk progressing with broken stuff
<worldofpeace> arianvp: I'm having a hard time looking at the issue due to the intersection of checkboxes/issues/prs/collected issues into one big issue. Which tickets are blockers for acme?
<arianvp> Yep let me summarise for you
<worldofpeace> (It would probably help everyone if we could give a date in the release timeline 🤣)
<arianvp> It's only the 48th of March
<{^_^}> #85333 (by arianvp, 15 hours ago, open): [20.03] Revert "nixos/acme: Fix allowKeysForGroup not applying immediately"
<arianvp> This one really needs in. As cert renewal is actually broken
<worldofpeace> arianvp: I don't think anything else needs to be done for those PRs? Needs my merge?
<arianvp> Correct
<worldofpeace> done
<arianvp> Then immae also just linked two prs with backports.
<arianvp> (in the acme tracking issue)
chagra_ has joined #nixos-dev
<{^_^}> #85366 (by immae, 2 hours ago, open): Fix postRun in acme certificate being ran at every run
<arianvp> Which is currently being reviewed by the ofborg
<arianvp> Sorry for all this last minuteness. I didn't realise until pretty recently things were still a bit in a broken state :)
<worldofpeace> arianvp: it was things tests couldn't catch right?
<worldofpeace> (well, that tests didn't)
<arianvp> yes. testing cert renewal is kinda hard unless you want tests to sleep for a long time :/
<arianvp> these were all awkward corner cases that are a bit hard to integration test indeed
<worldofpeace> arianvp: understandable, I guess sometimes we can't always catch things during beta
<worldofpeace> feel free to spam me about which PR(s) need merging once they're green
<arianvp> especially when the beta is shorter than the renewal period x)
<arianvp> i'll CC you where needed. thanks
<arianvp> what is actually the "rule" about backports after a release?
<arianvp> I guess bug fixes in modules fall under it, but package bumps don't?
<worldofpeace> arianvp: TL;DR: bugfixes, security updates/patching. no breaking changes please. though there are some cases during the lifespan of the release where you have to. New packages are fine also, but not new package infra
<arianvp> so basically "use comic sans"
<arianvp> ... common sense* that was a very interesting brainfart
<sphalerite> arianvp: brb submitting to bash.org
<sphalerite> spacekookie: ping rfc meeting?
<worldofpeace> sphalerite: she's coming
<spacekookie> Are the shepherds for RFC 64 written down somewhere? They're not marked in the actual RFC doc
<Mic92> spacekookie: @alyssais @asymmetric @domenkozar @jtojnar with @domenkozar as shepherd leader.
<spacekookie> Ah thanks!
orivej has quit [Ping timeout: 260 seconds]
orivej has joined #nixos-dev
chagra_ has quit [Ping timeout: 264 seconds]
chagra_ has joined #nixos-dev
<pie_[bnc]> has anyone suggested having .override natively for lambdas?
<LnL> I think that would mean nothing can be garbage collected
<LnL> which would be rather bad for memory usage
<infinisil> Also, not all lambdas return attribute sets
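(A rough sketch of why: modelled on lib.makeOverridable, simplified rather than the real implementation; note it only works because `f args` is an attrset, per infinisil's point:)

    let
      # `f` and `args` must stay reachable for as long as the result
      # lives so that .override can re-call the function later; doing
      # this natively for every lambda would pin all arguments in
      # memory, which is LnL's garbage-collection objection
      makeOverridable = f: args:
        (f args) // {
          override = newArgs: makeOverridable f (args // newArgs);
        };
      mkGreeting = { name }: { text = "hello ${name}"; };
      g = makeOverridable mkGreeting { name = "world"; };
    in (g.override { name = "nix"; }).text   # => "hello nix"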
<srk> timokau[m]: reading your posts - I now have a setBreakage function that can adjust meta of nix files and add/remove/replace broken attr. Now trying to evaluate all attrs similar to what hydra does - will use that to grab outPath(s) and query cache to see if we have a build
<srk> what's not clear is how outPath is constructed and why it defaults to e.g. -bin when there are multiple outputs
<srk> > (import <nixpkgs/nixos/release-combined.nix> { supportedSystems = [ "aarch64-linux" ]; }).nixpkgs.gogs.aarch64-linux.outPath
<{^_^}> "/nix/store/8q3qqxh0n78xj04c97w7l8gz7a6mqbjl-gogs-0.11.91-bin"
<srk> > (import <nixpkgs/nixos/release-combined.nix> { supportedSystems = [ "aarch64-linux" ]; }).nixpkgs.gogs.aarch64-linux.outputs
<{^_^}> [ "bin" "out" "data" ]
<srk> (it is probably not important for this use-case but I'm curious now)
<srk> ah there's also out.outPath
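(A quick sketch of what's going on, evaluated against an assumed nixpkgs; the package attribute points at the first listed output, hence the -bin suffix:)

    let
      pkgs = import <nixpkgs> { };
      gogs = pkgs.gogs;
    in {
      outputs = gogs.outputs;                    # [ "bin" "out" "data" ]
      # the top-level attribute is the first output's derivation...
      name = gogs.outputName;                    # "bin"
      # ...so its outPath is the bin output's store path
      same = gogs.outPath == gogs.bin.outPath;   # => true
      other = gogs.out.outPath;                  # the "out" output's path
    }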
cole-h has joined #nixos-dev
<timokau[m]> srk: I'm glad you're making progress on this!
<timokau[m]> Evaling locally and then querying hydra seems like a weird compromise solution. Why not just either (1) rebuild yourself or (2) scrape hydra directly?
<timokau[m]> Or reuse whatever worldofpeace was doing for 20.03
<srk> not enough capacity, hydra api is difficult
<timokau[m]> (which is probably scraping hydra)
<srk> (querying cache.nixos.org)
<srk> it could use a bulk query tho!
<timokau[m]> Regarding the hydra API I assume those scripts already exist (again, since worldofpeace has already been doing this)
<timokau[m]> Of course the downside of the hydra approach in general is that it will be slightly outdated, but I think that's okay (especially for an initial version)
<timokau[m]> Either way, srk++
<{^_^}> srk's karma got increased to 7
<cole-h> I thought wop was checking Hydra manually
<timokau[m]> I don't think so, since the ZHF posts always have a listing of remaining failures as well right?
<srk> like you could scrape the html page which has aggregated builds listed
<cole-h> ZHF just links to the Hydra jobset
<timokau[m]> Oh right. It only lists the numbers, not the packages. I wasn't really involved in the 20.03 ZHF
<timokau[m]> There was more info for 19.09 though: https://github.com/NixOS/nixpkgs/issues/68361
<{^_^}> #68361 (by disassembler, 31 weeks ago, closed): Zero Hydra Failures: 19.09 edition
<timokau[m]> Looks good
<timokau[m]> Figuring out a way to query the hydra API would also be a great way to work around the lack of email notifications. We could open an issue for every package and ping maintainers. In fact, that would probably be better than email notifications. Automatically takes care of de-duplication, can be triaged, place for discussion.
<srk> right
<srk> I've considered using the existing API but it might be way more expensive due to the number of required queries
ixxie has joined #nixos-dev
<gchristensen> timokau[m], srk: hold that thought before you go that route
<gchristensen> let me finish my meetings for today and get back to you
<srk> np, I'm not going that route :)
<cole-h> What about transient failures, though? E.g. llvm times out building, so every other package fails to build
<srk> that needs state and tracking package state over time I think
<timokau[m]> It should only cover direct failures, not dependency failures. And if the failure is direct, I'd argue a transient failure is just as much a bug as a non-transient failure (maybe even more so)
<MichaelRaskin> Well, there are transient failures that might be Hydra issues
<cole-h> My point ^
<MichaelRaskin> Does ENOSPC ever happen these days?
<cole-h> (Or related to)
<timokau[m]> I would not bother with checking if something has since been un-broken, I'd rely on maintainers to do that. If they are not available to do that, just mark it as broken.
<timokau[m]> It doesn't have to be perfect as long as there's a human in the loop. For every direct hydra build failure, it could open a PR that marks the package as broken and pings the maintainer(s). If the maintainers are unresponsive, someone could just merge those PRs after a day or so. That should cover most cases and that someone would likely notice that llvm shouldn't be marked as broken.
<timokau[m]> gchristensen: Great! It's likely I won't be immediately available though, it's too distracting for me to have synchronous communication open all the time. So I'm only available on IRC whenever I actively check it.
<gchristensen> ENOSPC is detected as a transient failure, but still we probably would not want to mark it broken after a single sampling, but over a duration
<gchristensen> so this should not be a substitute for the emails that hydra used to send, but a long-term indicator
<gchristensen> for example, track failures over time - if a package is failing for 7 days (14 days (21 days)) send the PR to disable it
<timokau[m]> Why? It's cheap to revert and an additional motivation to fix transient breakage
<ekleog> about the emails hydra used to send… why are they not sent any longer?
<gchristensen> because things just fail sometimes
<MichaelRaskin> Are you sure you build up enough goodwill by the time of a _single_ undetected Hydra-side failure?
<MichaelRaskin> Turned out, emails did not
<gchristensen> MichaelRaskin: lol definitely not, definitely want to see a time period of multiple build attempts failing
<gchristensen> anyway
<timokau[m]> I think that problem really is not that big if you only notify immediate breakage. If there's some hydra issue, some root package will be flagged false-positive and that's it.
<MichaelRaskin> ekleog: a single hydra-wide miss, Hydra spams every maintainer, supermajority of people say «could this please be stopped _now_»
<timokau[m]> Multiple attempts would also be a great improvement, so whatever gets implemented is fine
<MichaelRaskin> Nobody ever got around to reenabling it with convincing safeguards
<gchristensen> we can easily export build events from hydra as a continuous event stream and let people consume it for whatever they're wanting to do with the data
<MichaelRaskin> timokau[m]: 100% of builds in a 6-hour time slot time out
<ekleog> MichaelRaskin: it happened only like once a year at the very most, was it really that bad?
<MichaelRaskin> If nothing root enough changes…
<MichaelRaskin> ekleog: once is enough
<gchristensen> we got enough abuse reports just for inviting people to the maintainer team three times
<MichaelRaskin> Because there are enough people who get _zero_ value from the broken-marking
<timokau[m]> MichaelRaskin: please elaborate
<ekleog> well, these people should just not be in the maintainer list, should they? or maybe just add a flag “send me emails on breakage”?
<gchristensen> *at any rate* the important thing is that the signal be *high quality*. a single failing instance is very low quality. waiting for it to become a high quality signal is important, or it will be simply noise
<gchristensen> I'm going to stop participating in this discussion for now, let me know when it is back to PRing metadata brokenness
<MichaelRaskin> ekleog: I will actually merge a PR that drops me in all the packages I maintain
<srk> instead of emails we could have a self-service where you could enter your maintainer handle and it will tell you breakage stats
<gchristensen> we have that
<srk> hydra?
<timokau[m]> As I said, the details can be figured out after we have something in place. I'm happy with being very conservative as a start.
<timokau[m]> Saying we have that in hydra reminds me of the beginning of "The Hitchhiker's Guide to the Galaxy". Should've read that advance notice :D
<srk> I've used a script posted by infinisil which filterAttrs and tryEval(s) your packages
<srk> that is pretty nice, infinisil++
<{^_^}> infinisil's karma got increased to 255
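(Roughly the shape such a script takes; an assumed reconstruction, not the actual one infinisil posted, with the maintainer handle as a placeholder. tryEval only catches throw/assert, so the odd attribute can still abort evaluation:)

    let
      pkgs = import <nixpkgs> { };
      inherit (pkgs) lib;
      # keep top-level attrs that evaluate and list `me` as a maintainer
      mine = me: _name: value:
        let t = builtins.tryEval
          (lib.isDerivation value
           && lib.elem me (value.meta.maintainers or [ ]));
        in t.success && t.value;
    in lib.attrNames (lib.filterAttrs (mine lib.maintainers.infinisil) pkgs)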
<timokau[m]> But yeah I agree with gchristensen that this is not a very productive discussion to be had on IRC. If someone is willing to do the work, the best plan would probably be to be as conservative as possible in the beginning and then do some proper asynchronous & long-form discussion on future steps.
<infinisil> :D
<ekleog> MichaelRaskin: I'm not sure we're speaking of the same thing: I'm definitely not saying we can have catastrophic hydra failures like last time where I started receiving emails for packages that had nothing to do with me, just that stdenv breaking triggering a dependency fail should not be an issue. is it?
<ekleog> (something like https://serverfault.com/questions/110919/postfix-throttling-for-outgoing-messages should hopefully be simple enough to set and protect enough against catastrophic failures like the one just before hydra's emails got disabled)
drakonis_ has quit [Read error: Connection reset by peer]
drakonis_ has joined #nixos-dev
<MichaelRaskin> I think we do have some deep updates that end up reverted with «oops, should have done it together with another one or rev-deps fail»?
ris has joined #nixos-dev
<MichaelRaskin> So rate-limiting would need to be quite fine-tuned
<ekleog> well… if we re-enable sending emails, maybe people will be more careful? also, now that ofborg runs the passthru.tests, it'd be possible to add to a lot of deep libraries tests that are just things that depend on them, which should be enough to test that rev-deps still build
<gchristensen> the problem is not just solvable like that
<gchristensen> as maintainer of gocd-agent, I don't want to hear about it when gcc is broken
<MichaelRaskin> «people will be more careful» — never ends well unless there is a _pre-check_ added. not a post-check
<gchristensen> that model worked great when the contributor count was like 50 people hacking on a package set of a couple thousand
<ekleog> (continuation of my previous message, not yet taking into account replies: but such a deep update triggering enough messages to go over rate-limiting postponing future notifications would be OK with me)
<ekleog> Now, the more we speak about it, the more I'm convinced we should just stop having gcc broken. It should basically never happen. Has anyone considered integrating something like bors to nixpkgs?
<MichaelRaskin> I think you ascribe a very high value to all the broken things being marked broken, maybe even with an over-approximation
<gchristensen> yes of course
<ekleog> MichaelRaskin: I actually personally don't care about broken things being marked as broken, I care about knowing when some package I maintain, that I maybe have even forgotten I maintain, gets broken by a dependency update :)
<gchristensen> it isn't so simple as "don't have gcc be broken"
<MichaelRaskin> ekleog: I dunno how far we are from centralised nixpkgs-review of every PR (throughput-wise)
drakonis_ has quit [Read error: Connection reset by peer]
<ekleog> MichaelRaskin: well, something like bors would allow us to just nixpkgs-review by batch of PRs
<ekleog> gchristensen: I'm not sure I get what you mean -- if we manage to setup bors (which does sound probably complex), then we could just gate merge on a few core packages building?
<srk> this was discussed yesterday and a day ago as well but I can't keep up with all stuff posted here
<gchristensen> we do, by virtue of staging, staging-next, and the rebuild indicators
<gchristensen> have you ever seen a computer do the wrong thing and then the right thing?
<MichaelRaskin> I have seen changes which (according to one set of people) unbreak ~100 critically broken packages while keeping the rest at the same quality level, and by another account _break_ ~100 packages …
<gchristensen> and what do the stories of fault tolerance tell us about handling failure
drakonis1 has joined #nixos-dev
<MichaelRaskin> I _hope_ Hydra is not an especially prolific source of shades-of-gray-failures
<gchristensen> prolific? no, but the failures do happen, and so we cannot say "have you considered making gcc not be broken?"
<MichaelRaskin> But one is enough for many people to get annoyed… And then it becomes «does it really provide enough value to match annoyance»
<ekleog> gchristensen: staging only works for commits that are actually made into staging -- I would hope mass-failure commits don't come from staging merges, or that'd be very sad
<MichaelRaskin> ekleog: maybe we should just put a link to the search-Hydra-by-maintainer page at a prominent place?
<gchristensen> ekleog: the code does not have to be broken for mass failures to happen
<gchristensen> is what I'm saying
<ekleog> the idea of bors would be to gate all code merge by it
<ekleog> MichaelRaskin: if there's an RSS, it's the same as emails to me
<ekleog> gchristensen: I'm not sure I understand, sorry. If we can gate every merge on a “core” of nixpkgs building, is it not enough to be able to say that the remaining failures should be notified to the maintainers?
<gchristensen> sorry, failures just aren't so clear cut
<ekleog> (actually, even removing DependencyFailed messages and keeping only Failed messages would make sense too, most likely, for most cases?)
<gchristensen> sigh
<gchristensen> it may build fine in PR, and it may build fine in staging, and may build fine in staging-next, and then merge to master and then fail because a builder was in the way of some angry whatever and then we've pissed off contributors by sending a million emails
<ekleog> does that actually happen in practice? (also, the last million emails we sent were far more than just the emails for the packages people maintained, which -- at least for me -- was the actual issue)
<gchristensen> YES.
<ekleog> ugh :(
<gchristensen> we have to build systems which work in the face of things going not great. we can't annoy contributors assuming everything is great
<MichaelRaskin> ekleog: if it was just the emails for packages people maintained sent one-by-one, it would still be pitchforks time.
<ekleog> Well… yes and no? *someone* getting annoyed when something like this happens does sound like a good thing -- *everyone* getting annoyed is the issue, then
<gchristensen> hydra builds half a million jobs a week -- you can't just wish away mysterious problems
<ekleog> MichaelRaskin: I'm more and more thinking the RSS idea is the solution
<ekleog> there's built-in rate-limiting, it's opt-in
<MichaelRaskin> Yep, RSS idea is cool
<MichaelRaskin> And it's actually targeted…
<gchristensen> it is targeted?
<MichaelRaskin> Unlike PRs!
<MichaelRaskin> I skim the headlines of all PRs open
<gchristensen> I'm looking at ~1 build event per second
<MichaelRaskin> I sometimes find something human-submitted I am interested in where nobody is auto-mentioned and so on
<MichaelRaskin> A dump of 500 mark-broken _PRs_ at once does not sound nice
<michaelpj> or a weekly digest or something? "Current state of packages you maintain, including delta from last week"
<michaelpj> I'd read that
<ekleog> that'd certainly be the best
<ekleog> but probably quite hard to implement
<gchristensen> emails, or digests, or RSS or whatever is fine, as long as the signal is high quality: the notification means you have to *DO* something and it is actually *true* in 999/1,000 cases
<gchristensen> the problem is *signal* the signal has to be so high that you can't ignore it
<gchristensen> the way it arrives doesn't matter
<srk> +1
<timokau[m]> I would have more to say on the matter, but everyones positions seem pretty set in stone so I'll stop myself :) I'm out for now
<gchristensen> opening PRs to mark transient failures as broken -> those PRs are just ignored
<gchristensen> sending digests/email/rss without it being a guaranteed failure -> "eh, it'll probably fix itself."
<globin> actually per-maintainer rss of broken builds would really be nice similar to the repology per-maintainer rss
<ekleog> aren't Failed (without DependencyFailed) hydra emails enough in this case? I'd get notified iff the package I maintain failed building, and if once in a blue moon I get notified for a spurious breakage because of a quantum event happening… well, it had to fall on someone, and it was me?
<gchristensen> this is all very well documented in the medical field and IT field
<globin> for me even if there were false negatives
<gchristensen> things that get your attention need to be specific, actionable, critical or they are going to the garbage bin of your email or at least mind
<niksnut> yes, build failure notifications emails just become noise
<niksnut> they end up in a folder where you never look at them
<gchristensen> so many aviation disasters, chemical disasters, medical disasters began at a good alert ignored because the alerts were too numerous
<niksnut> source: the "Hydra messages" folder from my previous job has 30K emails in it
<ekleog> well, good alerts ignored because there are too many alerts is definitely an issue… but is it not also the case that no alerts are basically just as bad?
<michaelpj> controversial suggestion: maybe "broken" metadata shouldn't be in the nixpkgs source? What if instead you got "This package has not successfully built on Hydra, you may need to build it locally and it may be broken. Proceed [Y/N]" based on the state of the binary cache. That would remove the need for the marking broken/unbroken dance
<gchristensen> 3 mile island happened because a good alert was ignored
<gchristensen> ekleog: no. too many alerts is *worse* than no alerts, because every bad alert trains people to ignore alerts. you can't easily undo that training.
<michaelpj> (that's orthogonal to the "get maintainers to actually fix things" problem, though)
<ekleog> michaelpj: I'd probably agree with you, “broken” metadata being in the nixpkgs source sounds weird to me too
<gchristensen> ekleog: nobody does retrospectives on "why did I get too many alarms?". "Why didn't I get an alert?" -> now THAT is good retrospective material!
<ma27[m]> so I just read the beginning of your conversation, so I'm not sure if that's even relevant, but as proposed on last nixcon I built a draft for maintainer-search for broken builds in a jobset for hydra which may help to find broken stuff that's relevant for a maintainer: https://github.com/NixOS/hydra/pull/685 (can rebase onto master and improve the code at some point)
<{^_^}> hydra#685 (by Ma27, 23 weeks ago, open): Implement maintainer search
<gchristensen> this is the same reason ofborg *never* marks a PR with a red "X" when a package fails to build. the signal is too weak. too many things could go wrong. a red "X" is stop-the-world-until-it's-fixed
<cole-h> Red X good: no maintainer entry; dependency not in nixpkgs yet; missing semicolon. Red X bad: oopsie, quantum particles flipped this one bit causing the entire world to burn
<ekleog> gchristensen: That would be true if we actually had retrospectives. Right now, we have nothing, and nothing is happening. As you make the comparison with ofborg, I have to follow up on it: it never marking a PR with an X means I always have to open the tab to see if ofborg went bad. And there's no real gain (for me) in this, because I'm just not going to land anyway if the build doesn't pass --
<ekleog> maybe if it looks weird to me I'll just restart it to see if it passes the second time, but I'm not going to land something that ofborg is unable to build, even if I don't know the reason why
<gchristensen> we do actually
<MichaelRaskin> I know why! Timeout
<MichaelRaskin> (another case is dependency that is unsupported on macOS)
<ekleog> (like, a quantum bit flipping in the state of the evaluator could happen just as well, what's so different? where to place the limit between “this random failure is acceptable” and “this one is not”?)
<gchristensen> ekleog: with a can-be-ignored red X, you have to see why ofborg went bad anyway
<gchristensen> or worse, just ignore it and merge a known-broken PR
<gchristensen> and having ignorable red X's means that is more likely to happen
<ekleog> ignorable red X's should definitely not happen
<gchristensen> exactly. now apply that same idea to alerts
<ekleog> what I say is, to me ofborg failure, even if by random bitflip, should be non-ignorable red X
<gchristensen> I agree!
<ekleog> because the gain/cost ratio is high enough for that, even if it sometimes means I have to `@ofborg build [thepackage]` a second time
<gchristensen> first we need to make every package always build
<MichaelRaskin> You do understand that gains here are partially subjective?
<gchristensen> (but this is about alerts)
<ekleog> hmm, question: does nixpkgs-review tell you "this reverse-dep is broken but was already broken before"? (just recently heard it was gaining popularity, haven't tried it out yet)
<ekleog> actually, no, it doesn't work, my idea would place the burden of fixing all reverse-deps on the person doing the change, and there was consensus that this shouldn't happen for core changes last time it was discussed, IIRC
<ekleog> MichaelRaskin: to me, the cost is subjective, the gain can be measured in terms of nixpkgs quality
{o-O} has joined #nixos-dev
<srk> #cached /nix/store/8xwq3pbmqmpcvms7zsgmcbhy87ynxb6n-gogs-0.11.91
<{o-O}> Cached
<MichaelRaskin> Back to square one: I am pretty sure from observation that different users have _conflicting_ evaluations of Nixpkgs quality
<gchristensen> uh oh
<ekleog> gchristensen: to me, getting a random alert when a random bitflip occurred during the build of my specific package, assuming that all dependencies built correctly, is a reasonable cost for the gain of the notification
abathur has joined #nixos-dev
<gchristensen> that is not a reasonable cost, and science and research says so. sorry.
<ekleog> (though we may have to give all maintainers the right to restart their hydra jobs to be consistent in this matter)
<ekleog> gchristensen: I really don't get it, sorry. It should really be less than 1% false positives. Which science and research says that less than 1% false positives is too much?
<cole-h> should be != is
<MichaelRaskin> Less than 1% of _what_
<MichaelRaskin> It is safely <1% of builds turning out as false positives
<MichaelRaskin> But that doesn't mean anything
<MichaelRaskin> And for false positives being less than 1% of all _alerts_, could you please tell the intermediate estimations
<MichaelRaskin> (i.e. how many builds are true positives and how many are false positives)
<ekleog> Well, the first estimation is: based on what I understand from some people on #btrfs, it looks like machines can run for weeks while fuzzing their disk without having an error that's not due to a bug. So let's say one random failure per machine per week is a reasonable estimate. I'd be surprised if we had less than 100 failing builds per week on hydra
ixxie has quit [Ping timeout: 240 seconds]
<ekleog> (now, I may have misunderstood, but these are the best numbers I have)
<srk> ECC enabled machines or just machines?
<samueldr> sphalerite: my schedule is pretty open
<ekleog> most likely ECC-enabled, and I hope hydra's machines are too
ixxie has joined #nixos-dev
<srk> most probably yes since it's datacenter hw / vms
drakonis has joined #nixos-dev
drakonis1 has quit [Ping timeout: 250 seconds]
<srk> > (import <nixpkgs/nixos/release-combined.nix> { supportedSystems = [ "x86_64-linux" ]; }).nixpkgs.lbzip2.x86_64-linux.outPath
<{^_^}> "/nix/store/0v4px5vn8zjp7zkyg7f0wl6bfa1b1hw8-lbzip2-2.5"
<srk> #cached /nix/store/0v4px5vn8zjp7zkyg7f0wl6bfa1b1hw8-lbzip2-2.5
<{o-O}> Cached
<srk> #cached /nix/store/7v0yf023pj1cfmrfssf7pizhmf4c71v9-lbzip2-2.5
<{o-O}> Not cached
drakonis1 has joined #nixos-dev
<tilpner> #cached /nix/store/0v4px5vn8zjp7zkyg7f0wl6bfa1b1hw8-lbzip2-3.14
<{o-O}> Cached
<srk> tilpner testing the new path parser, like it :)
<tilpner> Well, srkOfBorg gave no search results
<tilpner> I just wanted to see if I could make the error spill into a second line
<infinisil> srk: Hehe, nice bot
<srk> tilpner: it's hnix-store PR https://github.com/haskell-nix/hnix-store/pull/59/files#diff-97933595aa3fffcd42728f70f8858d6aR158-R174 and bit of parsec for bot, not public (yet, will fix soonish)
<srk> infinisil: just a PoC, will teach it few things ;)
<sphalerite> samueldr: Ericson2314: alrighty, so now we just need to get a hold of dezgeg :)
<lovesegfault> adisbladis: thanks a lot for https://github.com/NixOS/nixops/pull/1283
<{^_^}> nixops#1283 (by adisbladis, 1 day ago, open): Add SSH jump host support
<lovesegfault> This means I can now use NixOps for all my systems
<lovesegfault> also thanks gchristensen for bringing the project back from the dead
<gchristensen> to be honest it was the response I got when I started adding types that brought it back from the dead, it couldn't have happened without the people helping.
<gchristensen> and then I tricked adisbladis in to caring :P
<globin> sphalerite: rfc#32?
lopsided98 has quit [Remote host closed the connection]
<globin> I said i'd take over if necessary, that's relevant for structured-attrs and another thing I'm working on
lopsided98 has joined #nixos-dev
<adisbladis> gchristensen: It's been an excellent rabbit hole :)
<gchristensen> :D
evanjs has quit [Quit: ZNC 1.7.5 - https://znc.in]
<infinisil> rfcs#32
<{^_^}> https://github.com/NixOS/rfcs/pull/32 (by dezgeg, 1 year ago, open): [RFC 0032] Phase running changes for better nix-shell use
evanjs has joined #nixos-dev
justanotheruser has joined #nixos-dev
<cole-h> adisbladis++ I don't use nixops, but the fact that you added privilegeEscalationCommand as an option makes me happy, as a doas user myself.
<{^_^}> adisbladis's karma got increased to 48
taj has joined #nixos-dev
taj has quit [Remote host closed the connection]
<gchristensen> I just want to reiterate a deep interest in having as many people as are interested participate in developing and administrating ofborg
abathur has quit [Quit: abathur]
<julm> ofborg is ~11000 lines of Rust
<srk> lots of that is tests
<srk> which is good for this use-case!
<gchristensen> I mean
<gchristensen> julm: is there a problem with that?
<gchristensen> feels pretty thankless today :(
<srk> it does the job and does it very well and that is important
<julm> gchristensen: nope, I'm just taking time to git clone ofborg and that's the first thing I was curious about
* julm doesn't know Rust at all
<gchristensen> I didn't either :0
* srk neither
<julm> gchristensen: do you enjoy it now? is it good for this task?
<gchristensen> I do like rust quite a lot
<julm> :)
<gchristensen> I would have done things differently in ofborg if I had known rust before I started
<julm> gchristensen: you started it in PHP, right?
<gchristensen> yeah, but only to get a prototype done very very quickly, and in a language I would absolutely be unwilling to continue using
<gchristensen> if I used a language which had been remotely suitable it would never have been rewritten
<gchristensen> (I say this as someone with respect and appreciation for PHP)
<julm> 'k
<sphalerite> TIL: always use a terribly unsuitable language for PoCs so that you're forced to throw them away
FRidh has quit [Ping timeout: 256 seconds]
<julm> gchristensen: where does the name OfBorg come from?
<samueldr> sphalerite: why do you think everything starts as bash?
<gchristensen> you know how the borg is an entity all acting as one?
<samueldr> star trek nerdery afoot, julm :)
<julm> niarf
<gchristensen> and then locutus spoke for the group
<gchristensen> ofborg ran on people's laptops and desktops and whatever arbitrary system people had available
FRidh has joined #nixos-dev
<gchristensen> hopefully that explains it
<julm> gchristensen: so.. you run a Star Trek OfBorg on Futurama servers. Sure. Makes sense :p
<gchristensen> lol
<samueldr> is the next experiment a lot of small processes doing music named midichlorians?
<gchristensen> hah
<MichaelRaskin> We learned the hard way that the plural in people's desktops was not _always_ appropriate!
<gchristensen> hah
<gchristensen> yeah, it turns out people don't like that model very much, even though people like the idea of it a lot
<MichaelRaskin> To be fair, when the issue was discovered, it was resolved pretty quickly…
<gchristensen> I think about all the damage it caused ofborg, having to keep the protocol BC because people were slow to update
<MichaelRaskin> But it has the chicken-and-egg problem: we want spare capacity before advising to build more, and if the capacity is sufficient with a safety factor of two there is little incentive to learn the connection procedure
<MichaelRaskin> So you want to say that having only one macOS person and two of us Linux people was an improvement?
<gchristensen> yes
<MichaelRaskin> Kind of sad for the model!
<gchristensen> it turns out being able to move quickly is a big feature
<gchristensen> and distributed processing where you can't just turn people off for not updating right away, without losing a significant amount of the network, is a bit of a tarpit
<MichaelRaskin> But yeah, I guess doing a step forward requires some kind of kick-booting. «I promise to learn how the hell to set up an auto-updatable builder if 10 more people promise to do the same and we can offer everyone larger timeouts»
<MichaelRaskin> Otherwise you get an equilibrium where three people with spare-capacity beefy machines cover the current needs … and welcome to another kind of tarpit
<Irenes[m]> hmm, so like, `nix-build` propagates error messages from the build tools, in a way that `nix build` does not, even with `-vvvvv`.
<gchristensen> so many tarpits
<gchristensen> Irenes[m]: what if you add -L
<Irenes[m]> that is a useful ability to have and it seems like `nix build` should add it as a feature
<Irenes[m]> oh sorry, didn't mean to interrupt
<gchristensen> no worries :) just a reflection
<Irenes[m]> hmm let me try
<MichaelRaskin> Also, you can nix-store -l on the derivation after the build (if you want this)
<gchristensen> (`nix build` isn't stable, Irenes[m])
<julm> nix build is not stable??
<gchristensen> the `nix` command is not stable interface
<julm> hmm
<julm> good to know
<Irenes[m]> yes, -L works
<Irenes[m]> I get that it's not stable. I was just trying it out because I wanted to understand where the pain points were, in case there were any that were worth flagging to be addressed :)
<aanderse> random proposal: new meta "people-you-need-to-ping-to-test-this" field for nixos modules!
<Irenes[m]> good to know about nix-store -l, I had no idea that was there
<aanderse> we have some people in the community who are not maintainers but their testing and feedback on a PR is beyond measurable value
<Irenes[m]> yeah for sure!
<Irenes[m]> oh um one other pain point I should mention btw. if the nix command or its subcommands have manpages, they aren't accessible. "man nix build" gives the manpage for nix-build.
<Irenes[m]> I suspect that's just because they haven't been written and the html manual is the reference for now
<sphalerite> Irenes[m]: correct, there are no manpages for the nix command currently. You have --help and the html manual
<sphalerite> aanderse: like maintainers?
FRidh has quit [Quit: Konversation terminated!]
<aanderse> sphalerite: but not maintainers.. just people who have interesting setups and should be pinged for testing
{o-O} has quit [Remote host closed the connection]
<aanderse> there are some people who don't want the ownership that maintainer implies
<aanderse> not a big deal, though
<aanderse> just an idea
<jtojnar> looks like the php refactoring broke php support in httpd
<emily> aaron: +1 for "users/testers" field
<emily> would potentially help flag up breaking changes for heavy users of modules too
<emily> relatedly: does anyone have the GitHub perms to add @NixOS/... groups? see https://github.com/NixOS/nixpkgs/pull/83474#issuecomment-614703535
<gchristensen> you want a group for acme?
<aanderse> generalized web stack group?
<emily> gchristensen: I've written/received in my inbox "cc @arianvp @immae @emilazy @m1cr0man @aanderse @flokli" about a billion times the past few days so I think it would save people typing at least :)
<emily> (would be nice if those could automatically be mapped from nixpkgs maintainer groups somehow but I guess that's not possible?)
<gchristensen> anything is possible :P
<gchristensen> esp. with rfc39 doing almost all the work already
<emily> right, makes sense
<emily> guess should start with a PR to add a maintainers group then
<gchristensen> yeah
<aanderse> well if it's just for the single module then we add users to the maintainers list of the module
<gchristensen> I forget, did we imagine a way to identify changed modules, and then the maintainers of those modules?
<emily> I think it makes sense to use for the lego package too
<emily> maybe pebble as well (it's used exclusively in the test and has maintainers = [] currently)
ixxie has quit [Ping timeout: 260 seconds]
ixxie has joined #nixos-dev
<Irenes[m]> sphalerite: thanks for the confirmation
<arianvp> I think there is a maintainer meta thing for modules yes
<gchristensen> there is yeah
<arianvp> But it's not used as rigorously. Lots of modules are orphaned
<gchristensen> right
<gchristensen> my question is given a diff, do we have a way to identify the maintainers of modules which were changed
<srk> git blame?
<infinisil> > nixos.meta.maintainers
<{^_^}> attribute 'meta' missing, at (string):296:1
<infinisil> > nixos.config.meta.maintainers
<{^_^}> { "/var/lib/nixbot/nixpkgs/master/repo/nixos/modules/config/malloc.nix" = <CODE>; "/var/lib/nixbot/nixpkgs/master/repo/nixos/modules/config/vte.nix" = <CODE>; "/var/lib/nixbot/nixpkgs/master/repo/nixo...
<gchristensen> > nixos.config.meta.maintainers."/var/lib/nixbot/nixpkgs/master/repo/nixos/modules/config/malloc.nix"
<{^_^}> [ <CODE> ]
<gchristensen> > nixos.config.meta.maintainers."/var/lib/nixbot/nixpkgs/master/repo/nixos/modules/config/malloc.nix".0
<{^_^}> attempt to call something which is not a function but a list, at (string):296:1
<gchristensen> > :p nixos.config.meta.maintainers."/var/lib/nixbot/nixpkgs/master/repo/nixos/modules/config/malloc.nix"
<{^_^}> [ { email = "joachifm@fastmail.fm"; github = "joachifm"; githubId = 41977; name = "Joachim Fasting"; } ]
<gchristensen> > :v nixos.config.meta.maintainers."/var/lib/nixbot/nixpkgs/master/repo/nixos/modules/config/malloc.nix"
<{^_^}> nixos.config.meta.maintainers."/var/lib/nixbot/nixpkgs/master/repo/nixos/modules/config/malloc.nix" = nixos.config.meta.maintainers."/var/lib/nixbot/nixpkgs/master/repo/nixos/modules/config/malloc.nix" is not defined
<gchristensen> nice
<gchristensen> infinisil: pretty easy
<infinisil> Yea
<infinisil> git diff to find changed files and look them up there
<gchristensen> infinisil: remind me of that when I'm not brain-mush?
<infinisil> Sure :)
<infinisil> git diff --name-only :o
<gchristensen> yeah a lot of options in git diff and git log
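(A minimal sketch of that lookup, wired to `git diff --name-only`; the checkout location and the file list are assumptions:)

    let
      nixpkgs = /path/to/nixpkgs;                # assumed checkout path
      nixos = import (nixpkgs + "/nixos") { configuration = { }; };
      # absolute module path -> list of maintainers
      byFile = nixos.config.meta.maintainers;
      # repo-relative names as printed by `git diff --name-only`
      changed = [ "nixos/modules/config/malloc.nix" ];
      handles = f:
        map (m: m.github) (byFile."${toString nixpkgs}/${f}" or [ ]);
    in builtins.listToAttrs
         (map (f: { name = f; value = handles f; }) changed)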
<domenkozar[m]> matthewbauer++
<{^_^}> matthewbauer's karma got increased to 5
<{^_^}> nix#3506 (by matthewbauer, 8 minutes ago, open): Add --include-ifd option to nix-instantiate
pkmxtw[m] has joined #nixos-dev
orivej has quit [Ping timeout: 258 seconds]
orivej has joined #nixos-dev
<infinisil> Ohh nice, matthewbauer++
<{^_^}> matthewbauer's karma got increased to 6
<infinisil> I believe this could then be used to prevent IFD'd derivations from being GC'd
<{^_^}> nix#719 (by wizeman, 4 years ago, open): Nix GC collects derivations used for IFD
<Ericson2314> sphalerite: that might be more difficult!
chagra_ has quit [Ping timeout: 256 seconds]
orivej has quit [Ping timeout: 258 seconds]
chagra_ has joined #nixos-dev
__monty__ has quit [Quit: leaving]
ixxie has quit [Ping timeout: 260 seconds]
phreedom has quit [Ping timeout: 240 seconds]
phreedom has joined #nixos-dev
abathur has joined #nixos-dev
<mkg20001> Anyone mind taking a look at https://github.com/NixOS/nixpkgs/pull/83911 ?
<{^_^}> #83911 (by mkg20001, 2 weeks ago, open): stage-1-init: add boot.persistence option
drakonis_ has joined #nixos-dev
drakonis has quit [Ping timeout: 256 seconds]
drakonis_ has quit [Ping timeout: 246 seconds]
drakonis has joined #nixos-dev
abathur has quit [Ping timeout: 256 seconds]
abathur has joined #nixos-dev
drakonis has quit [Read error: Connection reset by peer]
drakonis has joined #nixos-dev
drakonis_ has joined #nixos-dev
drakonis has quit [Ping timeout: 250 seconds]