qyliss changed the topic of #spectrum to: A compartmentalized operating system | https://spectrum-os.org/ | Logs: https://logs.spectrum-os.org/spectrum/
tilpner_ has joined #spectrum
tilpner has quit [Ping timeout: 260 seconds]
tilpner_ is now known as tilpner
jb55 has quit [Remote host closed the connection]
Irenes[m] has quit [Ping timeout: 244 seconds]
alj[m] has quit [Ping timeout: 244 seconds]
jb55 has joined #spectrum
Irenes[m] has joined #spectrum
alj[m] has joined #spectrum
<qyliss> Okay, if anybody wants to help out, run "$(nix-build https://edef.eu/~qyliss/qemu.nix)/bin/run-nixos-vm". Make sure to do it in a new terminal, because the serial output will mess it up.
<qyliss> It'll either panic and stop, or it'll give you a login prompt. Let me know which, and give me the output of lscpu | grep '^Model name:'
<qyliss> cc zgrep
<qyliss> (It'll build qemu first)
<ashkitten> qyliss: it tries to untar and fails
<qyliss> whoops
<qyliss> Download that file, and "$(nix-build /path/to/qemu.nix)/bin/run-nixos-vm" then :)
<ashkitten> okay
Irenes[m] has quit [Ping timeout: 244 seconds]
<ashkitten> qyliss: hash mismatch in fixed-output derivation '/nix/store/jii957949sm39wi38g8qs363vkim6llf-qemu-f9ab08c':
<ashkitten> wanted: sha256:0wp40gpnli3jxs52bvcx6gpawz6pg144vcnskn1zbxrr51nkxcy3
<ashkitten> got: sha256:1rpgyjw5mqzm9l7r24hy7ph8k2hlf7n7bnjbrakkcjm3b2fj5awq
<qyliss> aeousheoa
<qyliss> that hash isn't stable
<qyliss> can you just change it to the one it wants?
<qyliss> (this is why you don't use leaveDotGit)
<edef> edef@vixen ~> eval (nix-build -E 'import (builtins.fetchurl https://edef.eu/~qyliss/qemu.nix)')/bin/run-nixos-vm
<ashkitten> sure
<ashkitten> why is leaveDotGit there?
<qyliss> because QEMU refuses to build if it isn't
<ashkitten> gives me a login prompt!
<qyliss> interesting!
<qyliss> and the CPU?
<ashkitten> what is the login?
<qyliss> there isn't one
<qyliss> that's all you need
<qyliss> you can kill it now
<qyliss> (it's your host CPU I'm interested in)
<ashkitten> oh, i see
<ashkitten> Model name: AMD Ryzen 5 3600 6-Core Processor
<qyliss> interesting
<qyliss> so AMD is 2 for 2, and Intel is 0 for 1
<edef> i'm still fetching
Irenes[m] has joined #spectrum
<qyliss> yeah the submodules are super slow to fetch
<zgrep> I get a login prompt. "Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz"
<qyliss> Ooh that is very good info
<qyliss> thank you
<qyliss> zgrep ashkitten: could you also tell me your host kernel version? (uname -r)
<edef> qyliss: yields login prompt
<zgrep> 5.4.51
<ashkitten> 5.7.8
<qyliss> edef: cpu, kernel version please?
<edef> 5.4.44, i7-7600U
<qyliss> (I'm on 5.4.46 ftr)
<spacekookie> https://paste.rs/MnX
<spacekookie> CPU: Intel(R) Core(TM) i5-3360M CPU @ 2.80GHz
<qyliss> ^ also a kernel panic
<qyliss> this is good because it means it's not just my machine
<spacekookie> I assume we have different CPUs?
<qyliss> I have an i5-2520M
<qyliss> Mine is Sandy Bridge, yours is Ivy Bridge
<qyliss> So based on current data, Ivy Bridge and older doesn't work, and Skylake and newer does work
<qyliss> So it would be especially useful if anybody has a Haswell or Broadwell they could test on
<qyliss> In ThinkPad terms that'd be an x40 model (T440, X240)
<qyliss> Oh, also, everybody who ran the script: this will have created a nixos.qcow2 file in the directory you ran it in
<qyliss> you might want to delete that :)
<qyliss> I wonder if I could fix the leaveDotGit impurity
<IdleBot_2e4f9b4b> Does it use git meaningfully? You could remove .git and re-git-init, I guess?
<qyliss> maybe
<IdleBot_2e4f9b4b> (I wonder what share of leaveDotGit could be replaced with this in Nixpkgs, but will not investigate right now)
<IdleBot_2e4f9b4b> Is the non-KVM boot expected to hang for ~59.2 seconds?
<IdleBot_2e4f9b4b> Then 60.5s more. Or maybe it expects something that is not provided in that jail…
<qyliss> Yeah non-KVM is slow af
<qyliss> why are you trying a non-KVM boot, though? The reproduction I posted should use KVM.
<IdleBot_2e4f9b4b> To see what success looks like… (also launched a jailed terminal without KVM and thought I could do something in it before launching a proper one… big mistake)
<alj[m]> The oldest Intel processor I have access to is the same one you have, qyliss (inside the T420). I can reproduce on it sometime today or tomorrow if that's still relevant by then
<IdleBot_2e4f9b4b> Aha, virtio_pci_probe -> vp_reset Oops-es (but apparently the boot process goes on? But fails because vda is not there with broken virtio?)
<IdleBot_2e4f9b4b> i7-3740QM in Thinkpad W530
<qyliss> it's weird because other virtio-pci devices work fine
<qyliss> i7-3740QM is also Ivy Bridge so fits the pattern
<qyliss> kernel-side, it's the vp_reset in virtio_pci_modern.c that's being called (as opposed to virtio_pci_legacy.c)
<qyliss> and other devices are definitely calling that same function and succeeding.
<qyliss> Current focus of my attention is virtio_vhost_user_init_bar in virtio-vhost-user.c in QEMU
<qyliss> The reset is the first time the kernel tries to write to the PCI stuff
<qyliss> I'm guessing it's just not getting set up right somehow
<qyliss> I think it might be interesting to compare it to another QEMU device that does work to see if something is different
<IdleBot_2e4f9b4b> Is this Qemu repository so old as to lack IvyBridge emulation?
<qyliss> no
<qyliss> It's from 2019
<IdleBot_2e4f9b4b> I tried -cpu help and there is no IvyBridge — is it just switched off during the build then?
<qyliss> don't know
<qyliss> It should have all the way to skylake
<IdleBot_2e4f9b4b> Ahhh, I am stupid. It has _some_ IvyBridge, but not what I copied from a fresher Qemu help
<qyliss> -cpu IvyBridge works for me
<IdleBot_2e4f9b4b> But it does not seem to emulate it faithfully enough to fail
<qyliss> yeah
<qyliss> Similarly -cpu Skylake-Client on a Sandy Bridge machine will still panic
<IdleBot_2e4f9b4b> Even -cpu qemu64
<qyliss> I'm staring at virtio_vhost_user_init_bar but can't figure out how I can find out what guest address the PCI bars get mapped to
<IdleBot_2e4f9b4b> If it would be specifically useful I could try i7-4770R tonight. Hopefully. Never got around to doing something about the failed RAM slot, but the second one with 8 GiB should be enough…
<IdleBot_2e4f9b4b> (I have an old GB-BRIX that I have not booted for years)
<qyliss> I think it would be useful to have that data
<qyliss> We probably won't be much further along by tonight because I'll be going to sleep in the next few hours I'd imagine
<qyliss> (woke up at 19:00 UTC yesterday or so)
<IdleBot_2e4f9b4b> If a programmer wakes up in the morning, it is a part of round-the-clock phase shift…
<IdleBot_2e4f9b4b> OK, hopefully it will not be too hard to bring up the partially broken BRIX, will try
<qyliss> It feels a bit weird to me that one of the virtio-vhost-user PCI bars is 64 GiB
<qyliss> holy shit that's it
<qyliss> changed that to be 64 MiB
<qyliss> no more kernel panic
<qyliss> I have no idea what an appropriate size of this thing is
<qyliss> But I _highly_ doubt 64 GiB is the correct number
<qyliss> cc puck edef
<qyliss> let's iterate and find the maximum size my computer will allow
<qyliss> 32 GiB works
<IdleBot_2e4f9b4b> That requires a rebuild, right?
<qyliss> Yeah
<qyliss> I'm working out of a QEMU tree
<puck> qyliss: hrmmm. so like, 64GiB BARs seem /reasonable/ to some degree
<puck> like, this is a worst-case max i think
<qyliss> sure
<qyliss> I am kinda unclear why older CPUs wouldn't like them
<qyliss> I wonder if Intel would have this limitation documented somewhere
<puck> oh.
<puck> wait a minute, what was the error again
<puck> like, which address did it break on
<qyliss> IdleBot_2e4f9b4b: if you wanted to test, you could substituteInPlace 1ULL << 36; to 1ULL << 35; in virtio-vhost-user.c
<puck> qyliss: i think i know the issue now, but it needs a bit of poking
<qyliss> IdleBot_2e4f9b4b: hw/virtio/virtio-vhost-user.c, that is
<puck> qyliss: what does "lscpu | grep 'Address sizes'" return?
<qyliss> Address sizes: 36 bits physical, 48 bits virtual
<puck> hmmm.
<qyliss> on uhura that's 43 bits physical, 48 bits virtual
<qyliss> so that could be it
<puck> yeah, i think skylake should have >36
<puck> but
<puck> i don't think /physical/ .. oh, kernels
<IdleBot_2e4f9b4b> Same 36/43 for me
<puck> so like
<puck> [ 1.288311] #PF: error_code(0x000b) - reserved bit violation
<puck> reserved bit violation.
<qyliss> IdleBot_2e4f9b4b: do you mean 36/48?
<puck> if your page table's address exceeds the physical address bits, it'll fault with a reserved bit violation!
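The raw error code from the panic can be read against puck's explanation. The sketch below decodes an x86 page-fault error code using the bit layout documented in Intel's SDM (bit 0 P, bit 1 W/R, bit 2 U/S, bit 3 RSVD, bit 4 I/D, bit 5 PK); `decode_pf_error` is a hypothetical helper, not kernel code. For 0x000b it reports a supervisor-mode write to a present page whose paging entry has a reserved bit set. That would be consistent with the kernel's first write to the device (vp_reset), but the link is an interpretation, not something the log proves.

```python
# Bit layout of the x86 #PF error code (per the Intel SDM); hypothetical decoder.
PF_BITS = {
    0: ("P", "protection violation on a present page", "non-present page"),
    1: ("W", "write access", "read access"),
    2: ("U", "user-mode access", "supervisor-mode access"),
    3: ("RSVD", "reserved bit set in a paging-structure entry", None),
    4: ("I", "instruction fetch", None),
    5: ("PK", "protection-key violation", None),
}


def decode_pf_error(code: int) -> list[str]:
    """Describe each bit of a page-fault error code that carries meaning."""
    out = []
    for bit, (name, if_set, if_clear) in PF_BITS.items():
        if code & (1 << bit):
            out.append(f"{name}: {if_set}")
        elif if_clear is not None:
            # Bits 0-2 are meaningful when clear; bits 3-5 only when set.
            out.append(f"{name}=0: {if_clear}")
    return out


# error_code(0x000b) from the panic in the paste
for line in decode_pf_error(0x000B):
    print(line)
```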
<puck> edef: can you lscpu?
<puck> edef: and note the address sizes
<IdleBot_2e4f9b4b> I mean I cannot type, yes, 36/48
<qyliss> Another Intel host I have that it worked on is 39/48
<qyliss> This is looking good...
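The data points so far can be lined up in a small sketch. It parses the `Address sizes` line that lscpu prints and checks a necessary (not sufficient) condition for placing the 64 GiB BAR (`1ULL << 36` in hw/virtio/virtio-vhost-user.c): the BAR has to be smaller than the whole physical address space. A 36-bit space is exactly 2^36 bytes = 64 GiB, so a naturally aligned 64 GiB BAR could only sit at address 0, on top of RAM. The function names are hypothetical; the lscpu lines are the ones reported in the channel.

```python
import re

GiB = 1 << 30
VVU_BAR_SIZE = 1 << 36  # BAR size found in virtio-vhost-user.c (64 GiB)


def physical_address_bits(lscpu_line: str) -> int:
    """Extract the physical address width from an lscpu 'Address sizes' line."""
    match = re.search(r"(\d+) bits physical", lscpu_line)
    if match is None:
        raise ValueError(f"no physical address size in {lscpu_line!r}")
    return int(match.group(1))


def bar_can_possibly_fit(bar_size: int, phys_bits: int) -> bool:
    """Necessary condition only: the BAR must be smaller than the address
    space, or its only aligned placement (address 0) overlaps guest RAM."""
    return bar_size < (1 << phys_bits)


reports = [
    "Address sizes: 36 bits physical, 48 bits virtual",  # i5-2520M: panics
    "Address sizes: 39 bits physical, 48 bits virtual",  # later Intel host: boots
    "Address sizes: 43 bits physical, 48 bits virtual",  # uhura
]
for line in reports:
    bits = physical_address_bits(line)
    print(f"{bits} bits = {(1 << bits) // GiB} GiB, "
          f"64 GiB BAR possible: {bar_can_possibly_fit(VVU_BAR_SIZE, bits)}")
```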
<puck> hehe.
<puck> fucking address space
<puck> qyliss: can you boot without the device on a 36/48 machine, and then lscpu inside the vm?
<puck> oh, hrmm
<qyliss> certainly can
<puck> prolly not a kernel issue, because the BAR gets assigned outside physical address space
<puck> i think the reason it's showing up is because a 64GiB BAR is rare
<qyliss> I couldn't find any record of anybody else ever running into an error like this
<puck> yeah
<qyliss> In fact, I could only find one instance of a reserved bit violation at all
<qyliss> (and it was irrelevant)
<puck> yeah, i suspect it's a qemu bug
<puck> because it assigned the BAR outside of physical address space
<qyliss> lol my VM's lscpu doesn't list Address Space
<puck> heh.
<puck> wait, Address sizing?
<puck> sizes*
<qyliss> I just grepped for Address
<puck> huh.
<puck> i guess that makes sense??
<puck> maybe.
<qyliss> it could just be old
<qyliss> it's ubuntu 18.04
<puck> nah
<puck> pretty sure that should have it
<puck> i guess?
<puck> anyways, my current guess (outside of not wanting to decode the page table) is "physical address is too big"
<qyliss> I'll try the NixOS VM
<puck> (or too small, i guess)
<IdleBot_2e4f9b4b> Does 63GiB work, BTW?
<qyliss> IdleBot_2e4f9b4b: has to be a power of 2
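Why 63 GiB isn't an option: software sizes a PCI BAR by writing all ones to it and reading back which low address bits are hard-wired to zero, so a BAR's size is always a power of two and the BAR is naturally aligned to that size. A minimal sketch (helper names are hypothetical) of checking a size and rounding it up to the next legal one:

```python
GiB = 1 << 30


def is_power_of_two(n: int) -> bool:
    # A positive integer is a power of two iff exactly one bit is set.
    return n > 0 and n & (n - 1) == 0


def round_up_to_bar_size(n: int) -> int:
    """Smallest power of two >= n, i.e. the window a BAR of n bytes needs."""
    if n <= 1:
        return 1
    return 1 << (n - 1).bit_length()


print(is_power_of_two(63 * GiB), round_up_to_bar_size(63 * GiB) // GiB)
```

So a 63 GiB request would still occupy a 64 GiB window and hit the same limit.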
<qyliss> puck: Address sizes: 40 bits physical, 48 bits virtual
<puck> hehehehe
<puck> 40 bits physical
<qyliss> I probably have enough info to get back to the patch author?
<puck> i'd try with a cpu that has less address space, e.g. -cpu host or so
<puck> see if that changes
<qyliss> puck: that is -cpu host
<qyliss> or
<qyliss> wait
<qyliss> no it isn't
<qyliss> most of my VMs are
<qyliss> it's getting hard to keep track!
<qyliss> puck: yeah that was -cpu host
<puck> hmm
<puck> that's bad
<qyliss> I'm writing an email to the patch author now
<qyliss> I think probably just laying out the observations we now have should be enough to warrant an email
<qyliss> i.e., I tested Sandy Bridge and Ivy Bridge CPUs and could reproduce, and later Intel CPUs and could not reproduce
<qyliss> And the affected CPUs seem to have physical address sizes of 36 bits, and the non-affected ones have larger sizes
<puck> yeah
<qyliss> I do hope there's an eventual solution that isn't just "allocate 2^(physical address size - 1) and hope for the best"
<qyliss> (this is a POC, so there probably _is_ a better way they just haven't implemented yet, and I just don't know what it is)
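The stopgap qyliss describes, sizing the BAR at half the physical address space, can be written down in a few lines. `clamp_bar_size` is a hypothetical helper, not QEMU code. With 36 physical bits it yields 32 GiB, matching the 32 GiB size that booted earlier, but it guarantees nothing: RAM and other devices need room in the same space.

```python
GiB = 1 << 30


def clamp_bar_size(requested: int, phys_bits: int) -> int:
    """Cap a power-of-two BAR size at half the guest physical address space.

    Hypothetical helper illustrating the heuristic; not taken from QEMU.
    """
    if requested <= 0 or requested & (requested - 1):
        raise ValueError("BAR sizes must be powers of two")
    return min(requested, 1 << (phys_bits - 1))


print(clamp_bar_size(1 << 36, 36) // GiB)  # 36-bit host
print(clamp_bar_size(1 << 36, 39) // GiB)  # 39-bit host
```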
<qyliss> anyway, I think now is probably a good time to head to bed :)
<edef> Address sizes: 39 bits physical, 48 bits virtual
<DrWhax_> edef: <3
jb55 has quit [Remote host closed the connection]
jb55 has joined #spectrum
<puck> edef: haha. issue is probably physical address space
cole-h has joined #spectrum
* edef pets DrWhax_
stigo has quit [Quit: stigo]
stigo has joined #spectrum
<Profpatsch> tazjin: am I ddossing you? docker pull nixery.dev/pkgsstatic.hello
<tazjin> Profpatsch: wrong channel? :p
<Profpatsch> yes!
MichaelRaskin has joined #spectrum
jb55 has quit [Remote host closed the connection]
jb55 has joined #spectrum
<MichaelRaskin> Regained some knowledge I didn't really miss having re: my old old old BRIX state…
<MichaelRaskin> (something flaky around SATA, cable reseating sometimes needed to boot)
<qyliss> puck: a more recent version of QEMU + the patchset doesn't work on 1 << 35, only 1 << 34
<qyliss> I suppose it probably allocates more stuff, so the address space fills up quicker
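qyliss's guess can be illustrated with a toy model: place each BAR at the lowest address that is a multiple of its size, lies below 2^phys_bits, and doesn't overlap anything already placed. Both `place_bar` and the memory map are made up for illustration (the bottom 4 GiB standing in for RAM and the 32-bit PCI hole, plus a 1 GiB region at 32 GiB standing in for whatever else a newer QEMU might allocate). The model only shows how one extra allocation in the upper half can push a 32 GiB BAR out of a 36-bit space while a 16 GiB one still fits; it is not how QEMU or the guest firmware actually assigns BARs.

```python
from typing import Optional

GiB = 1 << 30


def place_bar(size: int, phys_bits: int,
              occupied: list[tuple[int, int]]) -> Optional[int]:
    """Lowest naturally aligned base for a `size`-byte BAR that ends at or
    below 2**phys_bits and overlaps no (start, end) range in `occupied`;
    None if there is no such base. Hypothetical toy allocator."""
    top = 1 << phys_bits
    for base in range(0, top - size + 1, size):
        end = base + size
        if all(end <= start or base >= stop for start, stop in occupied):
            return base
    return None


low = [(0, 4 * GiB)]                  # hypothetical: RAM + 32-bit PCI hole
newer = low + [(32 * GiB, 33 * GiB)]  # hypothetical: one more 64-bit region

print(place_bar(1 << 35, 36, low))    # 32 GiB BAR, older layout
print(place_bar(1 << 35, 36, newer))  # 32 GiB BAR, newer layout
print(place_bar(1 << 34, 36, newer))  # 16 GiB BAR, newer layout
```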
cole-h has quit [Quit: Goodbye]
ehmry has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]