gchristensen changed the topic of #nixos-borg to: https://www.patreon.com/ofborg https://monitoring.nix.ci/dashboard/db/ofborg?refresh=10s&orgId=1&from=now-1h&to=now "I get to skip reviewing the PHP code and just wait until it is rewritten in something sane, like POSIX shell. || https://logs.nix.samueldr.com/nixos-borg
orivej has joined #nixos-borg
cole-h has quit [Quit: Goodbye]
hmpffff has joined #nixos-borg
orivej has quit [Ping timeout: 260 seconds]
orivej has joined #nixos-borg
cole-h has joined #nixos-borg
<LnL> gchristensen: re: spring cleaning
<gchristensen> hi :)
<LnL> do you think it would be ok to query hydra directly for that or probably not?
<gchristensen> I'd rather not
<gchristensen> this would be a good use case for that database export, or for being able to subscribe to build notifications
<LnL> yeah figured it's not a great idea, especially since it should check multiple historical builds
<LnL> do you have an idea on how to expose / access that?
<gchristensen> the events?
<gchristensen> or the database dump
<LnL> I think the database would probably be better for this
<gchristensen> so I actually have the data already
<gchristensen> it gets backed up to my machine every 5 minutes
<LnL> the safest first step I'm thinking of is checking if any build in the last x time has succeeded
<LnL> which is kind of hard to do with events
<gchristensen> yeah
<gchristensen> but you could have a timeseries database
<LnL> notifications of breakages is a different thing
<gchristensen> the thing I want to do next is load the database and do a .sql dump on a daily or weekly basis. this would validate the backup was good. secondary effect is letting other people access the .sql dump for queries like this :)
<gchristensen> (by events I mean like a rabbitmq or 0mq or whatever mechanism of publishing parsable event data)
<LnL> right
<gchristensen> I suppose if I'm loading and dumping the sql, it would not be a far throw to be able to have a list of queries executed and publish those query results too
<LnL> ah, so you're thinking more of publishing events periodically going over all packages?
<gchristensen> sorry, 2 different ideas :)
<gchristensen> the build event data is just purely: here is a firehose of events, do whatever you want, anybody can subscribe. good luck and god speed.
<gchristensen> the second idea is if we do it as a batch operation at the same time as validating the database backup
<gchristensen> since I'm loading the database and doing a pgdump anyway, might as well at the same time execute some list of queries and publish their results in the same place the pgdump data goes
<LnL> makes sense
<LnL> so what you'd want is a small daemon that just runs a bunch of queries over jobs, etc. and publishes events for those?
<gchristensen> ah, for sort of publishing I mostly mean just like, `aws s3 cp` to a bucket :) not really an event exactly
<LnL> hmm, a bit confused now
<gchristensen> sorry :/
<gchristensen> overloaded words
<LnL> making food first, then I'll make a diagram of what I was thinking
<gchristensen> cool
<srk> have you seen fedmsg infrastructure for Fedora?
* srk likes the idea of sql dumps, a front-facing / staging server for testing queries and so on would be even better
<gchristensen> I've heard of fedmsg
<gchristensen> but I don't remember how it works
<gchristensen> we'd need to be able to send a fairly high number of events
<srk> not sure either, it is some messages, maybe even AMQP
<gchristensen> several each second
<srk> fedmsg is ZeroMQ
<gchristensen> I don't quite grok how that works
<srk> pyzmq
<srk> hmm, zmq is like building blocks for queues and co, it handles bunch of low level stuff for you but it's not a fully-fledged message queue by itself
<gchristensen> this is beautiful LnL
<gchristensen> how did you make it?
<LnL> :p
<srk> Postgres Mirror <3
<LnL> but does it make any sense?
<LnL> omnigraffle
<gchristensen> nice, I love omnigraffle. I have a macos VM just for omnigraffle
<gchristensen> LnL: I think this makes sense, but let me suggest a few edits
<srk> so it's AMQP now..
<gchristensen> Hydra sends me a ZFS filesystem diff every 5min, so on my system I'd take the current state of the filesystem, start postgresql, and make a dump from that
<gchristensen> the Selector would then operate on the same postgres server which the dump is made from
<srk> ZFS diff replication! mad
* srk was wondering how could you do that every 5 min
<srk> is there a backup hydra? :D
<gchristensen> nah, hehe
<LnL> right, the details of that don't really matter for the rest of the picture
<gchristensen> but yeah it uses snapshots for backups
<gchristensen> the arrow from Selector to build status is a set of queries, right?
<LnL> yeah
<LnL> for this probably first listing failed builds on trunk and then a query for each of those
<gchristensen> yeah
<LnL> which either results in an event or not
<LnL> or always send an event including the delta, whatever makes more sense
<gchristensen> yeah, so then the output of that could be a stream of "broken-forever" or "broken-recently" messages
<gchristensen> or a bulk blob of JSON containing that "report"
<gchristensen> which are you thinking?
<gchristensen> oh that is what you just said too haha
<LnL> probably an event for each, I bet the queries could be a bit heavy
<gchristensen> yeah
<gchristensen> I like that
<gchristensen> cool, I like this
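[editor's note] The Selector idea discussed above (list the jobs whose latest build failed, then check each job's history and emit one "broken-forever" or "broken-recently" message per job) could be sketched roughly as below. This is only an illustration under stated assumptions: the `builds` table with `job`, `status` and `stoptime` columns is a hypothetical, simplified stand-in and not Hydra's real schema, `status = 0` is assumed to mean success, SQLite is used in place of the Postgres mirror so the example is self-contained, and all function names are made up.

```python
import sqlite3


def latest_failed_jobs(conn):
    """Return jobs whose most recent build did not succeed (status != 0)."""
    rows = conn.execute(
        """
        SELECT b.job
        FROM builds b
        WHERE b.stoptime = (SELECT MAX(stoptime) FROM builds WHERE job = b.job)
          AND b.status != 0
        ORDER BY b.job
        """
    )
    return [row[0] for row in rows]


def classify(conn, job, now, window):
    """Label a failing job by whether any build succeeded in the last `window` seconds."""
    (successes,) = conn.execute(
        "SELECT COUNT(*) FROM builds WHERE job = ? AND status = 0 AND stoptime >= ?",
        (job, now - window),
    ).fetchone()
    return "broken-recently" if successes > 0 else "broken-forever"


def select_events(conn, now, window):
    """One event per failing job, as suggested in the discussion."""
    return [
        {"job": job, "state": classify(conn, job, now, window)}
        for job in latest_failed_jobs(conn)
    ]


def demo():
    # Hypothetical sample data: (job, status, stoptime); status 0 = success.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE builds (job TEXT, status INTEGER, stoptime INTEGER)")
    conn.executemany(
        "INSERT INTO builds VALUES (?, ?, ?)",
        [
            ("hello", 0, 100), ("hello", 0, 900),            # healthy
            ("foo", 0, 800), ("foo", 1, 950),                 # broke recently
            ("bar", 0, 100), ("bar", 1, 500), ("bar", 1, 960),  # broken for a long time
        ],
    )
    return select_events(conn, now=1000, window=300)


if __name__ == "__main__":
    for event in demo():
        print(event)
```

Emitting one small event per job, rather than one bulk JSON report, matches the preference voiced above, since the per-job history queries may be heavy and can then be spread out over time.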
evanjs has quit [Quit: ZNC 1.7.5 - https://znc.in]
<LnL> long term you might want to make it remember some of the stuff it did so it doesn't start with 0ad every time if it didn't complete a cycle, etc.
<srk> btw I have post-receive hook implemented for watching nixpkgs commits, that could be used as a source instead of webhook. it is a standalone thing for now which passes events to server which sends them to clients over websocket to web face
<gchristensen> LnL: yeah, that sounds like a future thing we can deal with if we have to :P
<gchristensen> srk: github has post-receive hooks beyond their webhooks?
evanjs has joined #nixos-borg
<srk> gchristensen: no, it works by checking out mirror copy of the repo and fetching periodically then pushing to repo which has post-receive hooks
<srk> cause github doesn't make it easy :)
<gchristensen> ah
<gchristensen> we have the webhook setup on github's end
<srk> sure, but if I wanted to receive a stream? :)
<gchristensen> yeah, so the webhook goes right in to rabbitmq :)
<srk> with this you just run a websocket client
<srk> yup, that works as well. I've stopped relying on github functionality since it makes easy stuff like this difficult
<gchristensen> aye
<gchristensen> sounds like a cool thing you made
<LnL> gchristensen: cool, maybe I'll have a go at it this weekend
<LnL> would you want this as part of ofborg or something separate?
<gchristensen> LnL: that would be amazing!
<gchristensen> I would kind of like to do it as part of ofborg so it can easily reuse all the same infra
<gchristensen> I feel like ofborg is a bit mired by not-very-nice code
<srk> message queues are awesome for this!
<srk> you just need to commit and remove the remaining php files :D
<gchristensen> eh, I think I'll stick to PHP
<LnL> srk: check the channel description
<srk> (that hook forwarder is not that bad !!)
* srk runs
<srk> LnL: noticed ;)
<gchristensen> LnL: the part you'd be working on this weekend would include Selector but exclude Postgres mirror and Importer right?
<LnL> yeah
<gchristensen> cool, perfect
<gchristensen> it is cool that r-ryantm runs nixpkgs-review on its PRs now
<gchristensen> I'm feeling like there is a lot of pressure on me to make ofborg do a lot more all of a sudden
<gchristensen> stressful
<cole-h> tbh I feel like it's better for humans to do it when they review the PR
<cole-h> Because then they can also check functionality at the same time (if successful)
<MichaelRaskin> Or not do it at all (if realistic)
<gchristensen> oh?
<MichaelRaskin> Depends on the reviewer, sure
<MichaelRaskin> I mean, if ofborg ran full review on request, the share of full-review PRs would be way up
<MichaelRaskin> (not saying that this is worth the effort of getting enough build power)
<gchristensen> more important to me is throughput and QoS
<MichaelRaskin> And r-ryantm does have the benefit of being able to control throughput
<gchristensen> yeah exactly :P
<gchristensen> don't want a full review on gcc to gum up the works for everything
<flokli> gchristensen: ofborg is internal-erroring again?
<gchristensen> link?
<{^_^}> #85334 (by flokli, 20 hours ago, open): systemd: 243.7 -> 245
<gchristensen> nice
<gchristensen> it might have to do with me doing an aggressive GC
<flokli> it might be ofborg got confused as I pushed two times in a matter of 2mins or so
<flokli> but you might see something in the logs at least
<flokli> so I decided to ask ;-)
<gchristensen> nope
<gchristensen> github again
<gchristensen> out of all the fields you would think they wouldn't send back `status` records without the `status` field
<cole-h> Again???
<cole-h> ........
<flokli> brrr
<flokli> thanks github ;-)
* cole-h should figure out grafana alerting and send one to me whenever ofborg internal-errors
<gchristensen> oh that'd be cool
<srk> is it prometheus based?
<gchristensen> https://monitoring.nix.ci https://nix.ci/prometheus and there is a (private) loki instance
<srk> cool, thanks
<gchristensen> I can give out access to the loki instance as people want
<gchristensen> I just don't want it to be public-public in case some API key or whatever gets out
<srk> interesting, first time I hear about loki
* srk was using graylog before
<gchristensen> 105802 store paths deleted, 223739.34 MiB freed
<gchristensen> not bad
<srk> pretty
<cole-h> gchristensen: Are private folders a thing in Grafana?
<gchristensen> like personal folders?
<cole-h> Like how Loki is restricted right now
<gchristensen> ah
<gchristensen> no, it is restricted because the Explore tab is restricted
<gchristensen> I don't know about other stuff
<gchristensen> give it a go, try stuff :)
<cole-h> Yeah, I created a dashboard for internal errors
<gchristensen> cool
<gchristensen> link?
<cole-h> But I set the representation to be the logs because I figured that might be more useful than a bargraph with 1 tick
<cole-h> I think that should work
<gchristensen> nice
<cole-h> Ah, but the log view doesn't allow alerts x)
<cole-h> Aaaand "the datasource does not support alerting queries"
<gchristensen> ffs
<gchristensen> pushover won't stop telling me stuff is broken, but the alerts system doesn't say anything is wrong
<samueldr> obviously, something's wrong with the alerts system if there is no alerts /s
<gchristensen> lol
<gchristensen> brb throwing my phone in to a blender
<cole-h> gchristensen: Can you add a data source for prometheus-based Loki? https://youtu.be/GdgX46KwKqo
<gchristensen> the problem seems to be Pushover
<LnL> how so?
<gchristensen> well I deleted the alert and stopped alertmanager
<gchristensen> and the alerts kept coming
<LnL> oh, is that why it stopped?
<gchristensen> stopped?
<gchristensen> mine isn't stopping
<cole-h> Haha
<LnL> I got a last [RESOLVED] one at 20:38
<LnL> you know it has a do not disturb button right?
<LnL> or did that also break
<gchristensen> yeah that is broken too
<gchristensen> ffs
<gchristensen> this is literally killing me
<LnL> oh wow :/
<gchristensen> ditched my phone on top of a pillow in another room so I don't have to hear it buzz every 2 minutes
<LnL> one more thing I didn't think about yet, broken builds won't show up as jobs in hydra
<gchristensen> hm
<LnL> so this also needs some other mechanism to receive/find all the broken packages
<gchristensen> specifically this is `meta.broken = true;` ones yeah?
<LnL> yeah
<gchristensen> hm
<LnL> packages marked as broken still evaluate right? at least enough for meta.available
<gchristensen> I think
<gchristensen> yeah
<gchristensen> ofborg requires even broken packages evaluate completely
<LnL> hmm, it evaluates with allowBroken?
<gchristensen> I think so
<LnL> indeed, interesting
<MichaelRaskin> So rebuild count includes broken?
<gchristensen> I hadn't considered that
<gchristensen> maybe so
<LnL> ok so I think that should make it possible to expose something like { hello = { broken = false; unfree = false; }; }
<LnL> which can then be used as input for what jobs to query
<gchristensen> cool
<gchristensen> yeah that is a cool idea
<LnL> unfree = skip and not broken but no jobs is unknown
<gchristensen> yeah
<gchristensen> nice sounds great
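[editor's note] The per-package metadata map proposed above, of the shape { hello = { broken = false; unfree = false; }; }, could drive the choice of which jobs to query along these lines. This is a minimal sketch, not an agreed design: the Python dict is a hypothetical JSON-style rendering of that attribute set, and the label names other than "skip" and "unknown" (which come from the discussion) are invented for illustration.

```python
def plan_queries(meta, jobs_with_builds):
    """Decide what to do with each package, given its meta flags.

    meta: {name: {"broken": bool, "unfree": bool}} (hypothetical export format)
    jobs_with_builds: set of package names that have jobs in the build database
    """
    plan = {}
    for name, flags in sorted(meta.items()):
        if flags.get("unfree", False):
            # Unfree packages are skipped, as stated in the discussion.
            plan[name] = "skip"
        elif flags.get("broken", False):
            # Marked broken: it has no current job, so only its history can be checked.
            plan[name] = "check-history"  # invented label
        elif name not in jobs_with_builds:
            # Not broken, but no jobs were found: status is unknown.
            plan[name] = "unknown"
        else:
            plan[name] = "query"  # invented label
    return plan


if __name__ == "__main__":
    example = {
        "hello": {"broken": False, "unfree": False},
        "oldtool": {"broken": True, "unfree": False},
        "blob": {"broken": False, "unfree": True},
        "orphan": {"broken": False, "unfree": False},
    }
    print(plan_queries(example, jobs_with_builds={"hello"}))
```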
orivej has quit [Ping timeout: 258 seconds]
orivej has joined #nixos-borg
orivej has quit [Ping timeout: 258 seconds]