<gchristensen>
wanting to read that when I have a second :o
<gchristensen>
disasm: > end of eval <- a way to get the eval *log* too plz
<disasm>
gchristensen: good point
<disasm>
gchristensen: ideally, I think we'd want a UI change so https://hydra.iohk.io/eval/631666 has a tab for log and just link URL to that.
* disasm
wonders where log goes when there's no errors
<samueldr>
the eval log is an UPDATE on the eval's row IIRC
<samueldr>
so failed evals are lost in the æther
<disasm>
ah
<samueldr>
ideally, a failed eval, not with errors, but one that fails due to OOM or anything else should still be listed imo
<disasm>
are the build logs all stored in psql?
<samueldr>
(my knowledge stops about here)
<disasm>
I have clever digging into if RabbitMQ fits the bill here and how hard phase 1 would be to implement. I anticipate early next week we might start on hacking on it, so please post any suggestions to that issue!
<Taneb>
Biggest worry I have is "will it bring in many new large dependencies to Hydra"
<samueldr>
I think the evals thing I said is orthogonal
<samueldr>
likely OOM for nixos:trunk-combined @ 16:26 UTC
<gchristensen>
I think introducing rabbitmq opens up a lot of room for building further into it, expanding hydra's capabilities
<gchristensen>
hydra evaluates tens of thousands of things, something heavier like rabbitmq is probably a good fit
<gchristensen>
one concern I'd like to see addressed is monitoring and how to understand if things are healthy or not
<samueldr>
it'd be interesting to see more structured information in evaluations; details about the different kinds of warnings, tracking a job status even when it doesn't eval due to being marked broken, or unsupported on platform X
<disasm>
gchristensen: I'm thinking if all the communications are going through rabbitmq we could write a prometheus exporter that watches those communications and spits out some metrics
<samueldr>
example of what I'm thinking, "This job is not a member of the latest evaluation of its jobset. This means it was removed or had an evaluation error.", no structured data to point to a reason