<arianvp>
I think the fastest way to see if this is worthwhile is to write a little script that converts NARs into CATARs and see the size difference
<arianvp>
:P
<arianvp>
(and then indexing said CATARs)
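A minimal sketch of that experiment, assuming casync's make subcommand; the package, index and store names are only examples:
    # chunk one store path with casync and compare with its NAR size
    p=$(nix-build '<nixpkgs>' -A nginx)                # any store path will do
    nix-store --dump "$p" > nginx.nar                  # what a binary cache would serve today
    casync make --store=nginx.castr nginx.caidx "$p"   # catar index + chunk store
    du -sh nginx.nar nginx.castr                       # compare the two on-disk sizes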
<tilpner>
I can see how it might be helpful for serving store paths from local disk
<tilpner>
Not sure how feasible it is for cache.nixos.org
<arianvp>
well the nice thing is that it will give incremental updates too. If a NAR of nginx-1.19 is similar to nginx-1.18, the download would be incremental
<arianvp>
however it would mean you need to store the NARs on the client, which we currently don't do I think?
<tilpner>
Oh, so Nix learns about chunks, it's not just a cache/hydra-side thing?
<tilpner>
No, they're currently not stored, and that might increase space usage a lot
<arianvp>
casync solves this by mounting the NAR files using FUSE
<arianvp>
instead of unpacking them
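If casync is built with FUSE support, that looks roughly like this (the index, store and mountpoint names are examples carried over from the sketch above):
    # mount the chunked image instead of extracting it
    casync mount --store=nginx.castr nginx.caidx /mnt/nginx
    ls /mnt/nginx        # files are reassembled from chunks on demand
    umount /mnt/nginx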
<tilpner>
Which would probably be a slow-down on average over having the realised files on disk
<arianvp>
yeah, so it would be both a space-saving and a bandwidth-saving technique
<tilpner>
The space savings might make up for that, but that needs numbers
<arianvp>
yes there might be a slow-down. and yes these are all unknowns
<arianvp>
just seems like a fun direction to explore; ill put some time in it
<tilpner>
I'm exporting my local store to NARs and intend to chunk them afterwards
<tilpner>
Just to see how/if the size changes
<tilpner>
How would this play with compression though?
<tilpner>
Don't most local caches store the nars in compressed form?
<arianvp>
it works okay-ish with things like squashfs where random access is preserved
<arianvp>
but yes compression can indeed ruin some of the benefits
<tilpner>
Yes, I've seen it. I'm wary of running every single library/application through FUSE, but I don't know how much actual overhead there is
<arianvp>
ack
<clever>
arianvp: have you seen narfuse and fusenar?
<tilpner>
If it dedups really well, it might even speed up access on slow spinning disks
<clever>
each string is prefixed by a 64-bit int denoting the string's length
<clever>
and a directory is then a "(", a series of "regular|symlink|directory" + <the above> pairs, and a ")"
<clever>
so you could trivially break a nar up into its component strings, and then hash each one
<clever>
of note, the strings: name, node, entry, regular, symlink, directory, contents, (, and ) appear often
<clever>
so you might need an escape hatch to not dedup those (they are smaller than a hash) and just keep them in the resulting stream
<clever>
so you would transform a nar, into a series of tokens and hashes, along with a hash=body set, that can be shared
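The framing is easy to see by dumping a tiny NAR and looking at the raw bytes; a quick sketch:
    # build a one-file NAR and inspect its length-prefixed token stream
    echo hello > /tmp/hello.txt
    nix-store --dump /tmp/hello.txt > /tmp/hello.nar
    od -c /tmp/hello.nar | head -n 20
    # expect "nix-archive-1", "(", "type", "regular", "contents", the body, ")",
    # each string preceded by a little-endian 64-bit length and padded to 8 bytes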
<clever>
arianvp: but if you're running zfs, you get the same kind of dedup, at least at the storage level, but not at the network layer
<arianvp>
there is some explanation about why this is still a benefit on COW systems
<arianvp>
casync is COW-aware and will even do some reflinking magic itself too
<clever>
network is the main one i can see
<arianvp>
yes
<arianvp>
it is meant as an image delivery mechanism
<clever>
nix-store --optimize pretty much makes cow-aware stuff un-needed
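For reference, that store-level dedup is just:
    nix-store --optimise          # hard-link identical files into /nix/store/.links
    du -sh /nix/store/.links      # the shared copies live here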
<arianvp>
not as a storage-mechanism
<clever>
another issue though, at the network layer
<clever>
are you going to store each chunk as a file over http?
<clever>
thats a round trip per file
<tilpner>
Might be fine with pipelining or parallel requests?
<clever>
that's kinda half the point of things like nar and tar, so you can download 1000s of files as a single bytestream, rather than having to do 1000s of requests
<arianvp>
HTTP2 solves that doesnt it?
<arianvp>
(Not that the casync tool currently supports that. it's a bit of an experimental thing it seems)
<clever>
http1.1 can do pipelining, 2 just adds things like the server force-feeding you stuff you didnt know you wanted yet
<clever>
but if you already have parts, the server force-feeding you chunks is a waste of bandwidth
<arianvp>
yeah, just wanted to say: you don't want predictive push, as the benefit comes from the client knowing where they are
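A sketch of the parallel/pipelined-requests idea, assuming a hypothetical chunk-urls.txt listing one URL per missing chunk:
    # 8 curl processes, each given a batch of URLs so connections get reused
    xargs -n 32 -P 8 curl --silent --remote-name-all < chunk-urls.txt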
<clever>
i can see ipfs being of use here, but that complicates another one of my ideas
<clever>
and makes the perf even worse
<arianvp>
not sure what ipfs has got to do with this? I do remember bringing up casync on an IPFS+nix thread before though but dont remember why
<{^_^}>
nix#1006 (by Ericson2314, 3 years ago, open): git tree object as alternative to NAR
<clever>
ipfs is just a merkle_hash(value)=value storage system
<arianvp>
just like casync :P
<clever>
and if you know the hash, you can then fetch the object from the ipfs network
<arianvp>
(sort of)
<clever>
there is a limit to chunk size in ipfs
<clever>
so it also supports special chunks, that are just a list of hashes of other chunks
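Both behaviours are visible with the stock ipfs CLI; a sketch with a throwaway test file:
    # add a file with a fixed chunker; anything bigger than one chunk becomes a tree
    head -c 10M /dev/urandom > big.bin
    cid=$(ipfs add -Q --chunker=size-262144 big.bin)
    ipfs object links "$cid"      # the root node is just a list of chunk hashes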
<tilpner>
IPFS had fairly high resource consumption, last I tried (hours ago)
<tilpner>
30-40% CPU on a small VPS, and 300-400MB memory use
<clever>
the main cost i can see with ipfs, is that it has to turn each file, into a tree of chunks
<clever>
and it then has to post into the DHT, at the hash of each chunk, "peer XYZ has this chunk"
<clever>
so if your file breaks into 1000 chunks, you have to do 1000 DHT puts
<clever>
and if somebody wants to download that file, they have to do 1000 DHT gets, from random points within the hash table
<tilpner>
And it has to join the DHT in the first place, which rules out a few usecases of Nix
<clever>
ipfs would be an optional thing
<clever>
i'm just thinking: if you are going to be hashing each chunk anyway, then using the ipfs hashing rules makes ipfs fetches an option
<clever>
if you use plain sha256, then you're forcing the need for a hash->hash lookup to use ipfs fetches
<Ericson2314>
clever: have they not considered a sparse DHT with just some common roots, and if you can't find it in there crawling up your graph to see if anyone has something which refers to the thing you are missing?
<Ericson2314>
not a great system, but good to cope with a too large dht
<clever>
Ericson2314: ive not looked in depth at what the dht is doing, mostly just filling the gaps in with how other dht's work
<Ericson2314>
right i haven't looked either
<arianvp>
how do I copy my entire /nix/store to nar files?
<clever>
arianvp: one minute
<arianvp>
nix-store --export ?
<clever>
--export doesnt make a nar
<clever>
it makes a different thing, that contains many nars
<clever>
`nix-store --dump /nix/store/foo > foo.nar` will make a nar, but wont include any closure info, it wont even include the "foo" in the nar itself
<clever>
arianvp: this will recursively copy (and xz compress) a given path, to a given dir, and generate narinfo files that preserve the closure data
<clever>
if that directory is served over http, it can then be used as a binary cache
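With a current nix, roughly the same thing can be done with nix copy (the cache directory here is an example):
    # writes xz-compressed .nar.xz files plus .narinfo files carrying the closure info
    nix copy --to 'file:///var/tmp/cache?compression=xz' /run/current-system
    # anything that serves /var/tmp/cache over http can then act as a substituter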
<arianvp>
so to have one for my system I have to pass the nixos derivation there?
<clever>
yeah
<clever>
you can also pass it /run/current-system/ to just cache whatever you're currently running
<arianvp>
how does the cache handle non-directory entries in the nix store?
<arianvp>
are they also NAR'd?
<clever>
yep
<clever>
the root element in the nar is a file in that case
<clever>
or a symlink
<arianvp>
as in
<arianvp>
. is a file?
<clever>
yeah
<arianvp>
darnit, casync doesn't support that. so i'll have to skip all the file stuff and only do directories
<clever>
with nar files, every element only has a type, and a body
<clever>
names are not attached to the elements themselves
<clever>
rather, a directory is just a series of name+element pairs
<clever>
and the root element can be any type of element, so its valid for the root to just be a file, in which case, no name exists
<clever>
arianvp: also, for any fixed-output derivation with outputHashMode = "recursive";, its sha256 is just the sha256 of the nar
<clever>
arianvp: so when you're doing fetchFromGitHub, you're giving it the hash of the nar for the $out it generates
<clever>
outputHashMode = "flat"; is for the special case where you expect $out to be a file, in which case you're giving it the plain hash of that file without wrapping it in a nar
<clever>
arianvp: you may need to special-case files, and just have them bypass catar? and just be stored as a single chunk?
<arianvp>
okay casyncing my entire store
<arianvp>
:)
<clever>
arianvp: nix copy also has a --all flag
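i.e. to push every valid path into such a cache:
    nix copy --all --to 'file:///var/tmp/cache?compression=xz'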
<arianvp>
Nix doesnt really have a hard dependency on NARs does it? as in I could just extract these catar files and then register the paths right?
<arianvp>
make a prototype that bypasses nix's own caching stuff
<clever>
arianvp: /nix/store is read-only by default, so you must go thru nix-daemon to extract anything
<arianvp>
oh yeh darnit
<tilpner>
zfs does not like this .castr structure
<tilpner>
9.1G vs 7.3G (--apparent-size)
<clever>
tilpner: try to find the worst offending file, then ls -lhs it
<arianvp>
yeah, so there is special-case code for btrfs filesystems in casync
<arianvp>
wonder if it does the same on zfs; probably not
<tilpner>
zfs doesn't support reflink AFAIK
<arianvp>
ah
<tilpner>
clever: It's a giant directory forest, and lots of small files
<clever>
tilpner: ah, there is a min size for things
<arianvp>
tilpner: align your --chunk-size with the zfs min-size
<arianvp>
will probably be more friendly :)
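A sketch of that alignment (the dataset name and sizes are made up; casync's --chunk-size takes an average or, if I read its docs right, a min:avg:max triple):
    zfs get recordsize tank/casync-store               # what the dataset actually allocates in
    casync make --chunk-size=131072:262144:1048576 \
        --store=store.castr store.caidx /nix/store     # keep chunks at or above that size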
<clever>
ls -lhs will reveal the min size
<clever>
[clever@amd-nixos:~/apps/nixpkgs-master]$ ls -lUsh /nix/store/.links/ | head
<clever>
at 538 bytes and up, it eats a whole 4.5kb
<arianvp>
oh I forgot to enable --with=symlinks oops
<clever>
83 bytes and down, it fits inside the pointers that would normally say where the data exists, so it doesnt even need a data block
<arianvp>
wonder what will happen now, whether it will follow the symlinks or just ignore them
<clever>
between 83 and 538 bytes, youll need to investigate more :P
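One way to do that investigation, using clever's byte bounds from above:
    # list the dedup'd store files that fall in the in-between range, with on-disk size
    find /nix/store/.links -type f -size +83c -size -538c -print0 \
      | xargs -0 ls -lhs | head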
<tilpner>
clever: They're not small enough to fit under 83 bytes
<tilpner>
And I'm not sure what I expected anyway
<tilpner>
Presumably, the ecosystem would have to agree on one chunking configuration
<tilpner>
Which might be problematic with everyone running different FS' with different settings
* tilpner
ends casync after creating 390841 .cacnks
<arianvp>
274812 and going
<arianvp>
=)
das_j has joined #nixos-dev
<das_j>
how is that assert at the top of pkgs/os-specific/linux/phc-intel/default.nix supposed to work? Wouldn't meta.broken be more appropriate?
<clever>
das_j: broken might also work, would want to test it though of course
<das_j>
clever: Alright. It's preventing one of my systems from evaluating because it has a 4.9 kernel
<clever>
das_j: i would expect broken to cause the same problem, if you attempt to reference that package
<das_j>
clever: That's the weird thing. I do not reference it
<clever>
what does --show-trace say?
<das_j>
Now that I think about it, it's probably another issue. The system builds fine on 19.03, but fails on unstable
<das_j>
cc ajs124
<marek>
how do we forward changes from staging to master? just opening a PR against master with the cherry-picked commit once it is verified in staging?
<samueldr>
staging eventually graduates into staging-next when it's deemed good to go to master, and staging-next is where a last check for breakage is done and any breakage gets fixed
<ivan>
staging is for things that rebuild too many packages and so shouldn't go into master until a staging-next merge
<samueldr>
staging-next is eventually merged into master once deemed good