sphalerite changed the topic of #nixos-dev to: NixOS Development (#nixos for questions) | NixOS 19.09 now in beta! https://discourse.nixos.org/t/nixos-19-09-feature-freeze/3707 | https://hydra.nixos.org/jobset/nixos/trunk-combined https://channels.nix.gsc.io/graph.html | https://r13y.com | 19.09 RMs: disasm, sphalerite | https://logs.nix.samueldr.com/nixos-dev
ekleog has quit [Quit: back soon]
justanotheruser has quit [Ping timeout: 240 seconds]
ekleog has joined #nixos-dev
evanjs- has joined #nixos-dev
evanjs- has quit [Ping timeout: 276 seconds]
_ris has quit [Ping timeout: 258 seconds]
evanjs- has joined #nixos-dev
<disasm> lol, httppretty requires access to stream.twitter.com to run a test smh
<disasm> hmmm that's fixed already in master
<disasm> globin: https://github.com/NixOS/nixpkgs/pull/68754 fixes the hydra-eval-failures.py script
<disasm> gocdagent is failing on java.net.InetAddress.getLocalHost. Any Java experts know what's needed in a NixOS test for the function to work?
<jtojnar> worldofpeace: I created this thing to make obtaining GNOME changelogs for projects easier https://github.com/jtojnar/what-changed
<disasm> instead of fixing something that hasn't been touched upstream, is there a way to specify that a kernel module only works on kernels < 5? Or should I just mark it as broken?
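A minimal sketch of the "mark it as broken on newer kernels" route, assuming the module's derivation receives the `kernel` attribute (as out-of-tree kernel module packages in nixpkgs typically do); the exact attribute names depend on the derivation:

```nix
{ lib, kernel, ... }:

{
  # Hypothetical fragment: rather than patching code upstream
  # hasn't touched, declare the package broken on kernels >= 5.0.
  meta.broken = lib.versionAtLeast kernel.version "5.0";
}
```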
drakonis has quit [Quit: WeeChat 2.4]
evanjs- has quit [Quit: ZNC 1.7.4 - https://znc.in]
evanjs- has joined #nixos-dev
evanjs- has quit [Ping timeout: 268 seconds]
orivej has quit [Ping timeout: 240 seconds]
phreedom_ has joined #nixos-dev
phreedom has quit [Ping timeout: 260 seconds]
ixxie has joined #nixos-dev
justanotheruser has joined #nixos-dev
justanotheruser has quit [Ping timeout: 265 seconds]
justanotheruser has joined #nixos-dev
<globin> first 19.09 (non-small) channel release incoming after this eval has finished building (2 builds left) https://hydra.nixos.org/eval/1542359#tabs-unfinished
ixxie has quit [Ping timeout: 245 seconds]
avn has quit [Remote host closed the connection]
_ris has joined #nixos-dev
__monty__ has joined #nixos-dev
ixxie has joined #nixos-dev
init_6 has joined #nixos-dev
<worldofpeace> jtojnar++
<worldofpeace> jtojnar: that will help very much, typically I'm navigating GitLab URLs like a CLI :D
<ma27[m]> globin++
orivej has joined #nixos-dev
greizgh has joined #nixos-dev
evanjs- has joined #nixos-dev
evanjs- has quit [Client Quit]
evanjs- has joined #nixos-dev
misuzu has quit [Remote host closed the connection]
misuzu has joined #nixos-dev
evanjs- has quit [Ping timeout: 245 seconds]
init_6 has quit []
<ddima> qyliss: I've taken a look at why linux-libre fails and it seems like the 4.19 tag has not been updated to reflect the most recent changes in the deblob-4.19 script (specifically for firmware_request_nowarn in BRCMFMAC). I'm currently trying to ask fsfla whether it's an oversight or if this is not something we can rely on.
drakonis has joined #nixos-dev
puck has quit [Ping timeout: 246 seconds]
<qyliss> ddima: thanks!
<qyliss> I was about to do the same
<qyliss> I joined their mailing list with this intention yesterday
<qyliss> I hope it's something we can rely on... otherwise we'd be stuck on old kernel versions for linux-libre
<ddima> we could consider running off trunk
<ddima> I'm trying that out.
<qyliss> ddima: I update the linux-libre script whenever the scripts/ directory in trunk changes
<qyliss> (which is not all that often... most of the changes are to their distribution kernels)
<ddima> wrt reaching out - a mailing list sounds like a good idea. I've informally tried #fsfla now, but it might be rather dead, not sure yet
<qyliss> I'd have thought if they didn't at least want patches for old deblobbing scripts they'd remove them from their tree
<qyliss> Honestly the best thing to do is probably just to send them a patch.
<ddima> qyliss: yeah, but we still pull the tagged script version, while the one in trunk works fine (with manual testing atm)
<qyliss> Then I don't understand what you mean
<ddima> qyliss: the diff that's breaking this is: https://paste.fedoraproject.org/paste/kibCZR7zbsluZzoWa9kEfg
<qyliss> What script are you using?
<ddima> this is deblob-4.19 from the tagged version vs the script from trunk, so they do have the fix; it's just that the tag seems to be outdated
<qyliss> ohhhh
<ddima> that's at least the way it looks to me. and I manually tested those deblobs and they passed
<qyliss> I'll update the package to use trunk then
<qyliss> no reason we shouldn't be
<qyliss> thanks for noticing that
<ddima> alrighty
<ddima> we might as well consider bumping the kernel version too, since it's a bit outdated - but we can probably do that separately
<qyliss> yeah, do that separately
<thoughtpolice> Here are some notes I've been typing out about how the beta cache has been doing for the last month, and some other thoughts/technical details. You probably want to look at the pretty graphs, if anything: https://www.notion.so/Cache-Work-db97de6ab19b4d7ab9f4a60cb4cdaaf0
<thoughtpolice> (This is my own personal wiki thing but that page should be public. I'll transfer a bit of it onto the NixOS wiki later on)
<thoughtpolice> I was mostly intending to share the results with Graham when he gets back but I figured adding my own notes and just putting it here was best.
<ddima> thoughtpolice: nice writeup, thanks for that. will there be an opportunity to discuss the proposal? I'm especially curious to ask a few questions about the BigQuery bonanza parts.
<thoughtpolice> ddima: There's no real official "comment space", but you can just suggest things directly right now. :) Technically, I can enable comments/discussion on Notion, actually. But I think you need an account. There's nothing like a proposal on GitHub, though, just yet.
<thoughtpolice> It's important stuff but one of those things nobody actually "cares" about as long as it works. The most raving review you're gonna get is "i tolerate it" :P
<ddima> So, what I'd be curious about first is which type of analysis is desired and whether we'd really want to retain full raw logs - this could also help decide whether a classical time-series DB, which is definitely more space-efficient, is sufficient, or whether something like BQ might be needed. Further, I have some past experience with ClickHouse, Drill and Presto (the latter two being closer to the BigQuery design than
<ddima> CH), so one could talk about pros and cons of an architecture. Also, I was wondering about storing worldwide users' (including EU users') logs in the US, which likely include IPs and headers?
<thoughtpolice> Plus, insight into the cache and control over it is pretty limited (I have no access to the upstream one!), plus most people don't have knowledge of the tooling. The logging suggestions are something I'm open to though!
<ddima> it's a bit lengthy for IRC ;)
<thoughtpolice> I'm thinking of just abandoning IPs, actually. And I forgot to mention the "expiring table" part. I only have major experience with ClickHouse/MemSQL. Also: we can probably do something better if we're smart with VCL.
<ddima> With GDPR and the nature of the project, my first reaction would be that it's probably best to log as little as possible and roll with highly aggregated metrics, to stay on the safe side of GDPR but also to actually preserve users' privacy
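The "aggregate immediately, keep nothing identifying" idea can be sketched in a few lines; the record fields here are assumptions for illustration, not the actual Fastly log format:

```python
import collections

# Hypothetical raw CDN log records (field names are assumptions).
raw_logs = [
    {"ip": "203.0.113.7", "country": "DE", "status": 200, "bytes": 1024},
    {"ip": "203.0.113.7", "country": "DE", "status": 200, "bytes": 2048},
    {"ip": "198.51.100.2", "country": "US", "status": 404, "bytes": 0},
]

# Roll up on the fly and drop identifying fields (IP, headers):
# only per-(country, status) counters and byte totals survive.
rollup = collections.defaultdict(lambda: {"requests": 0, "bytes": 0})
for rec in raw_logs:
    key = (rec["country"], rec["status"])
    rollup[key]["requests"] += 1
    rollup[key]["bytes"] += rec["bytes"]

print(dict(rollup))
```

The point is that the aggregated table answers "how much traffic, from where, with what status" without any row ever mapping back to a single user.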
<thoughtpolice> Can bigquery do rollups, do you know?
<thoughtpolice> ClickHouse has a materialized view feature that can do something like that with AggregatingTables.
<thoughtpolice> (That schema is actually one that probably doesn't work, either; it's from my own personal Fastly services I've used that I copy/pasted :P)
<ddima> Well, it can do rollups of course with regular queries and materializing the results, but I don't think there is anything that can do this on the fly, unless they recently added something
<ddima> CH is a bit more special in that regard. Cloudflare has some interesting writeups on that specifically for CDN and DNS use-cases (https://blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-queries-per-second/)
<ddima> iirc BQ streaming inserts are also relatively expensive - not sure whether there is some free tier or sponsoring deal or something, but also worth considering
<thoughtpolice> My current experiments have tried to use BQ for Fastly logs, but I think the CH roadmap is probably going to remove the major obstacle, which is that you need some pipeline to hook it up. It looks like they're going to try tackling S3 ingestion sometime soon, which might work out well for our use.
<ddima> wrt schema: thats also why I was asking what sort of analysis is anticipated - because then one could see whether or not sensitive data might be needed
<ddima> the s3 logging is not stable yet? I briefly looked over the doc and it superficially looked like it should work - though whether S3, GCS or BQ doesn't really change the privacy problem much. But it would allow use of something like Drill or Presto relatively easily, or Amazon Athena.
<thoughtpolice> No I mean "ClickHouse automatically ingesting from S3" is on their roadmap. If you add something like Kafka to the mix you can work around that but that's a pretty heavy piece of kit.
<thoughtpolice> Fastly natively supports BigQuery as a logging endpoint, and S3, but not CH (maybe there's a ticket in a queue somewhere @ work but idk). That was my main holdup, it's not like I care for my own stuff. Going to S3 is pretty OK because the logs get buffered (clickhouse wants big, large bulk inserts all the time) but you still have to write an actual ingestion tool. Maybe one exists but I don't think so.
<thoughtpolice> But that was my main holdup. If it can do something like ingest a set of data in S3 into materialized tables on a schedule (similar to how it does with Kafka), that would be killer, but we'll see.
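The "actual ingestion tool" mentioned above would mostly be batching logic, since ClickHouse performs best with large bulk inserts. A minimal sketch, where `flush_batch` is a hypothetical stand-in for the real bulk INSERT (a real tool would call a ClickHouse client here):

```python
BATCH_SIZE = 3  # a real deployment would use thousands of rows per insert

flushed_batches = []

def flush_batch(rows):
    # Hypothetical stand-in for a bulk INSERT into ClickHouse.
    flushed_batches.append(list(rows))

def ingest(lines):
    """Accumulate parsed log lines and flush them in large batches,
    since ClickHouse wants big bulk inserts rather than row-at-a-time."""
    batch = []
    for line in lines:
        batch.append(line.split("\t"))
        if len(batch) >= BATCH_SIZE:
            flush_batch(batch)
            batch.clear()
    if batch:  # flush the remainder at end of input
        flush_batch(batch)

ingest(["GET\t/nar/abc\t200", "GET\t/nar/def\t200",
        "GET\t/nix-cache-info\t200", "GET\t/nar/ghi\t404"])
```

S3's buffering fits this shape well: each downloaded log object is already a natural batch to feed into `ingest`.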
Melkor333 has joined #nixos-dev
<thoughtpolice> Also I updated the schema! I think most of those stats are pretty useful and relatively non-invasive. (Maybe 'region' is only really useful in the US, I think? Most others will have 'country' report something accurate enough, I'd guess)
<thoughtpolice> Useful question I'd like to ask: What is the distribution of Nix versions in the wild? (Answer: check UA, set via libcurl). Another one that seems to always pop up is "why is the performance bad" but it's generally pretty tough to answer. But if you at least know the POP they talk to, you could ask "what is the TTFB for all requests at this POP over 24 hours vs 5 minutes"
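The first question reduces to parsing the UA; the exact string Nix sends is an assumption here, based on the "libcurl <version>/Nix <version>" shape described in the discussion:

```python
import collections
import re

# Sample user agents (format assumed, not the verified Nix UA string).
user_agents = [
    "curl/7.64.1 Nix/2.2",
    "curl/7.65.3 Nix/2.3",
    "curl/7.65.3 Nix/2.3",
    "Mozilla/5.0",  # non-Nix client: counted as unknown
]

# Count requests per Nix version, ignoring everything else in the UA.
versions = collections.Counter()
for ua in user_agents:
    m = re.search(r"Nix/([\d.]+)", ua)
    versions[m.group(1) if m else "unknown"] += 1

print(versions.most_common())
```

The same regex is roughly what a VCL snippet stripping the UA down to the Nix part would apply at the edge, before anything is logged.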
<ddima> UA can be a bit iffy privacy-wise
<ddima> but on its own not that bad, at least compared to a full Apache-style log line including the IP, for instance (ianal)
<thoughtpolice> Well for Nix it only sets something like "libcurl <version>/Nix <version>", not anything else. I only care about the Nix version, really (but I guess the curl version might be useful if it had a bug.) You could of course just strip that with VCL to make it bulletproof, but I'd say that's like 99% of all requests for a service like that, though.
<ddima> maybe one could cook up some VCL code to extract just the Nix part out of the UA and not log any others
<ddima> well, sure. just wanted to remark that one should be mindful about what one stores and how. then there is the whole storing-outside-the-EU part too.
<ddima> Well anyways, it's a bigger topic. While it's not black magic to build some simple pipelines using whichever of the discussed tools and make sure that data is aggregated as soon as possible and the raw data ditched, it might be worth discussing which/how, and whether the core team is willing to support this additional infra and how. Maybe something for NixCon ;)
<ddima> thanks for the info so far!
<thoughtpolice> Ideally my new configuration will just be perfect all the time and no information would ever be needed!
<thoughtpolice> ddima: The CH functionality looks promising but isn't upstream yet. Not automatic but much easier, so that's a good thing to note down!
<ddima> (I'm a bit surprised though that Fastly does not provide Pub/Sub or Kinesis or Kafka support, since those are the standard tools for event and log delivery ;))
<ddima> thoughtpolice: yeah, I'll definitely have a look at this feature. hadn't heard of this one until now.
<thoughtpolice> They do support Kafka, and Kinesis.
<ddima> Oh, indeed. Some of the doc pages are just not exhaustive.
<thoughtpolice> Well, I know we support Kafka; Kinesis is just from memory. But like I said that's too expensive operationally/cost-wise, for me anyway. BQ is basically pennies for this stuff. ClickHouse is very hardware-efficient but does require a server (only like, a $20/mo one)
<thoughtpolice> Unless like there's just a huge demand from Graham for Kafka and we really need it for a bunch of things I'd rather avoid it :P
<ddima> So, with Kafka or some similar system one could avoid hitting long-term storage and aggregate the data on the fly
<ddima> Ah ok, if you already compared costs, then I see. there is a managed Kafka service in AWS now; it's not free, but it alleviates a lot of the maintenance effort and VM costs.
<thoughtpolice> Yeah I have a lot of pain tolerance but Kafka does not seem easy to manage, and I think that's reflected in the cost of the managed offerings.
<ddima> hm, no, they still bill per broker per hour.
<thoughtpolice> If someone else managed it I'd probably use it like, all the time, though :P
<ddima> aight. I was arguing one of those more from the data privacy/risk perspective ;)
<ddima> but let's see how CH would deal with ingesting and aggregating from S3, and how cleanup of the raw data works in that scenario.
jtojnar has quit [Quit: jtojnar]
<thoughtpolice> ddima: There's an example ClickHouse schema at the bottom, now ;)
jtojnar has joined #nixos-dev
evanjs- has joined #nixos-dev
evanjs- has quit [Quit: ZNC 1.7.4 - https://znc.in]
evanjs- has joined #nixos-dev
evanjs- has quit [Client Quit]
evanjs- has joined #nixos-dev
evanjs- has quit [Client Quit]
evanjs- has joined #nixos-dev
evanjs- has quit [Quit: ZNC 1.7.4 - https://znc.in]
evanjs- has joined #nixos-dev
evanjs- has quit [Client Quit]
evanjs- has joined #nixos-dev
evanjs- has quit [Client Quit]
evanjs- has joined #nixos-dev
<ddima> thoughtpolice: nice. looking at that and the use-case you outlined before, I think it might make sense to also include client.as.number / client.as.name as well.
<ddima> that would allow drilling down into ISPs and possibly looking into issues with specific providers if need be, without storing personally identifiable info.
drakonis has quit [Read error: Connection reset by peer]
drakonis has joined #nixos-dev
__monty__ has quit [Quit: leaving]
ixxie has quit [Ping timeout: 246 seconds]
Melkor333 has quit [Quit: WeeChat 2.6]
Melkor333 has joined #nixos-dev
Melkor333 has quit [Quit: WeeChat 2.6]
Melkor333 has joined #nixos-dev
m15k has joined #nixos-dev
Melkor333 has quit [Quit: WeeChat 2.6]