<bk1603[m]1>
From what I understand filters contain the calls that are to be allowed/disallowed based on the flags. What does he mean by filters containing the native arch/ABI?
<pie_>
you might get more replies if it wasnt truncated to a matrix link
<bk1603[m]1>
Oh sorry, didn't realise that. I'll retype that in separate messages :)
<bk1603[m]1>
> As you are planning on using bogus syscall numbers, you will need to be careful to only add these rules to filters containing the native arch/ABI as the libseccomp API will fail otherwise with an error.
<bk1603[m]1>
From what I understand filters contain the calls, so what does filter containing the native arch/ABI mean?
<bk1603[m]1>
Moreover, to what filter does an rr specific syscall go?
<bk1603[m]1>
I was trying the approach explained by poettering, in the same issue. The one in which he says we can introduce a syntax like `4711@x86-64`.
<bk1603[m]1>
pie_: I hope this is better? Or was it truncated again?
<pie_>
doesnt look truncated now
<pie_>
oh didnt notice it was you
<pie_>
heh, what a coincidence
<pie_>
bk1603[m]1: ^
<pie_>
you didnt highlight me so i didnt pay attention and thought some random person was asking something
<bk1603[m]1>
Oh ha, no worries :]
<pie_>
bk1603[m]1: mm, i dont see through it clearly but this is something to do with the libseccomp architecture translation stuff
<pie_>
it may possibly relate to using exact or translated rules
<pie_>
might help to read over what pcmoore said a few more times idk
<bk1603[m]1>
Maybe I should switch to the other approach (`SYSTEMD_NSPAWN_SECCOMP=0` for no seccomp on nspawn containers) for the time being, since that would provide us with a temporary solution, and I could later work that part out?
<pie_>
im also quite tired so ymmv at the moment
<bk1603[m]1>
Oh i see, I'll try reading it again then
<pie_>
i dont know which approach is simpler
<pie_>
naively i say that disabling seccomp might involve more refactoring, but im not sure?
<pie_>
where as this would be a relatively localised parser / rule extension?
<bk1603[m]1>
Umm, the way I am looking at it is that reading the value of an environment variable and then disabling seccomp entirely might be easier to implement than parsing the numbers and architectures and then pushing them into the correct filters. Though I am not sure.
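For comparison, the environment-variable approach could look roughly like the sketch below. It is Python for brevity (systemd itself is C). Only the variable name `SYSTEMD_NSPAWN_SECCOMP` comes from the discussion; the function name `seccomp_enabled` and the accepted true/false spellings are assumptions for illustration, not systemd's actual boolean parsing.

```python
import os
from typing import Mapping, Optional

# Spellings accepted by this sketch only (an assumption, not systemd's parser).
_FALSE = {"0", "no", "false", "off"}
_TRUE = {"1", "yes", "true", "on"}


def seccomp_enabled(environ: Optional[Mapping[str, str]] = None,
                    name: str = "SYSTEMD_NSPAWN_SECCOMP") -> bool:
    """Return False only when the variable is set to a 'false' spelling.

    An unset variable keeps seccomp enabled, so the default stays safe.
    """
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return True
    value = raw.strip().lower()
    if value in _FALSE:
        return False
    if value in _TRUE:
        return True
    # Refuse to guess: a typo should not silently disable the sandbox.
    raise ValueError(f"{name}: cannot parse {raw!r} as a boolean")
```

The caller would skip installing the filter entirely when this returns False, which is why this route needs no per-syscall or per-arch parsing.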
<pie_>
i wouldnt know,
<bk1603[m]1>
I don't know how hard it would be to disable seccomp entirely
<pie_>
do you think any code needs to be changed or does seccomp just need to toggle a flag and then the calls are ignored:
<pie_>
s/:/?
<pie_>
i think this might be a reasonable question to ask; i.e. how one should go about disabling seccomp
<pie_>
or look for prior art
<bk1603[m]1>
I think I only need to add a parser for the syscall numbers, and the architectures, and another if branch. But the part with libseccomp API erroring out on having bogus syscall numbers confuses me.
<bk1603[m]1>
In fact I already have the parsers ready. (I did that before I left.)
<pie_>
ok i think i can explain that part
<bk1603[m]1>
But I was confused by the flags used there, and what pcmoore said
<pie_>
but this is also just a guess
<pie_>
so there is the "exact" api, and there is some other api
<pie_>
the other api does some sort of translation from say, "somesyscall", somesyscall on arch A will have code 1, but on arch B it will have code 2
<pie_>
i dont know how / if it does that translation with syscall codes instead of strings (if it even can) but thats the basic idea
<pie_>
if you look at the linux syscall numbers, IIRC there are some where its commented that on a different architecture there is a different syscall that does the same thing...something like that
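The point about per-architecture numbers can be made concrete with a tiny lookup table. The numbers below are a hand-copied excerpt of the kernel syscall tables for three architectures; the dictionary and the helper names `resolve` and `translate` are hypothetical and only model the idea of translating a number by going through the syscall name. This is not how libseccomp is implemented.

```python
from typing import Optional

# Small excerpt of real syscall numbers; aarch64 has no plain open(),
# only openat(), which is the "different syscall that does the same thing" case.
SYSCALL_NUMBERS = {
    "x86_64": {"read": 0, "write": 1, "open": 2, "close": 3},
    "i386": {"read": 3, "write": 4, "open": 5, "close": 6},
    "aarch64": {"openat": 56, "close": 57, "read": 63, "write": 64},
}


def resolve(name: str, arch: str) -> Optional[int]:
    """Look up a syscall name on one architecture; None if it does not exist there."""
    return SYSCALL_NUMBERS.get(arch, {}).get(name)


def translate(number: int, from_arch: str, to_arch: str) -> Optional[int]:
    """Translate a number between architectures via the syscall name.

    Returns None when the number names nothing on from_arch, or when the
    name has no counterpart on to_arch.
    """
    for name, nr in SYSCALL_NUMBERS.get(from_arch, {}).items():
        if nr == number:
            return resolve(name, to_arch)
    return None
```

In this model a made-up number such as 1000 has no name on any architecture, so `translate` has nothing to map it to.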
<pie_>
i dragged the issue along but i dont have a whole lot of an idea whats going on either x) which is why im always adding citations and references and stuff everywhere
<pie_>
did that make any sense btw?
<pie_>
oh right i didnt even finish
<bk1603[m]1>
Ah I see, yes that is true. Syscalls can have different codes. So now I understand the different APIs a whole lot better. But what happens when we pass the syscall codes, specifically codes that don't exist anywhere except in rr is still not clear.
<pie_>
so, if you pass a syscall number to the api that does the translation, it doesnt have anything to translate _to_ because it doesnt have anything to translate _from_ because the syscall number for the _from_ doesnt exist at the kernel level, its just something rr uses with its ptrace stuff somehow
<pie_>
and presumably the api that doesnt do the translation just completely ignores whether the number is valid? - or at least skips the translation step or something
<pie_>
i guess i also dont know what exactly happens to it, i kind of assumed its fine because pcmoore didnt say it wouldnt work
<pie_>
wait hold on
<bk1603[m]1>
Yes we can directly add numbers to the filters. We use the api to get valid syscall codes anyways. So we can skip translation. (That's what I saw in the systemd source.)
<pie_>
i might have said a bunch of stuff thats wrong
<bk1603[m]1>
I think we can check what syscalls are valid via `seccomp_syscall_resolve_num_arch`, but then again it's of no use for our scenario.
<pie_>
"As you are planning on using bogus syscall numbers, you will need to be careful to only add these rules to filters containing the native arch/ABI as the libseccomp API will fail otherwise with an error."
<pie_>
ok what i said about which api does what is probably completely wrong
<bk1603[m]1>
xD cool
<pie_>
i think what pcmoore actually said is that if youre on x86 and you tell it to enable syscall somethingorother at a different abi it will fail
<pie_>
*at a different arch
<bk1603[m]1>
But how do we say which arch a syscall is for? Don't we just keep adding them to the filters after libseccomp's API has translated them?
<pie_>
i assume systemd will not fail happily if a seccomp rule errors, so thats something worth testing
<pie_>
bk1603[m]1: so does the api let you specify the arch?
<pie_>
or is that the question
<bk1603[m]1>
Yeah that's what I am confused about. Let me explain:
<bk1603[m]1>
We can use `seccomp_syscall_resolve_name` and get a code for the current architecture that we are on, or we can use `seccomp_syscall_resolve_name_arch` and get the code for the syscall on a specific arch
<pie_>
also yeah pcmoores 3rd paragraph is probably kind of important, so you should understand what is being said there
<bk1603[m]1>
But the thing is, we don't need the translation part right? And I don't think we specify the architectures when we add the calls to the filter
<bk1603[m]1>
What I am confused about is, do filters contain architecture specific information?
<pie_>
hm
<bk1603[m]1>
If yes, then to which architecture/filter do the rr specific syscalls go?
<pie_>
well, the lookup functions only return a syscall number right?
<bk1603[m]1>
yep
<pie_>
and i guess the actual filter itself only takes that because i guess it would make sense that it doesnt need anything else
<bk1603[m]1>
That's what I am thinking, I think we push the syscall number and the error with which to respond in the filter.
<pie_>
so this exact / inexact / translation stuff should be showing up in the lookup function? or is there a function that does the filter AND the lookup stuff at once?
<pie_>
maybe that means stuff like if you pass the right syscall number for the wrong architecture it fixes it
<bk1603[m]1>
Yep this exact/inexact translation stuff should be showing up in the translations, and not in the filters. So should I even worry about this? Or is it important to filters too? As he said, "filters containing native architecture/ABI".
<pie_>
*in the add filter call
<pie_>
*so if you use _exact it will fail if you pass the "right" number for the wrong architecture*
<pie_>
which one does systemd use anyway?
<bk1603[m]1>
I'll take a look at the man pages again, but I am still not sure I've explained my query well, so I'll give it one last shot. Either the filters do contain the architectures, and then we need to add the numbers to the right filters. In that case we need to decide which filter the rr syscalls go into, and I don't think that would be a valid approach. Or the filters do not contain the architectures, but then what would cause a libseccomp error?
<bk1603[m]1>
Or should I not even worry about that and simply add the number
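One way to read "filters containing the native arch/ABI": a libseccomp filter context carries a set of architectures (the C API has `seccomp_arch_add` and `seccomp_arch_exist` for this), and a rule added to the context applies to every architecture in that set. The toy model below is Python, not libseccomp. The class `ToyFilter` and its rejection rule are assumptions that encode the quoted warning, so treat it as an illustration of the question rather than an answer.

```python
from typing import Iterable, List, Optional, Set


class ToyFilter:
    """Toy stand-in for a filter context: one rule list shared by a set of arches."""

    def __init__(self, native_arch: str, arches: Optional[Iterable[str]] = None):
        self.native_arch = native_arch
        # A fresh context is assumed to contain only the native arch.
        self.arches: Set[str] = set(arches) if arches else {native_arch}
        self.rules: List[int] = []

    def add_rule(self, number: int, known_native_numbers: Set[int]) -> None:
        """Add a rule for a raw syscall number.

        known_native_numbers: numbers that name a real syscall on the native arch.
        Assumed behaviour: an unknown number cannot be translated for the
        non-native arches in the filter, so adding it there is an error.
        """
        foreign = self.arches - {self.native_arch}
        if number not in known_native_numbers and foreign:
            raise ValueError(
                f"cannot translate syscall {number} for {sorted(foreign)}")
        self.rules.append(number)
```

Under this reading the rr numbers would go into a filter whose only architecture is the native one, since no translation is needed there.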
<pie_>
i think half the problem is i have no idea what im talking about :)
<pie_>
" add the numbers to the right filters" not sure what you mean by this
<bk1603[m]1>
Haha, I can't blame you. I don't have much of an idea about this either. If we can't reach a conclusion, I'll try asking people in the seccomp channel themselves.
<bk1603[m]1>
What I mean is
{\_\} has quit [Remote host closed the connection]
<pie_>
oh theres a seccomp channel?
<bk1603[m]1>
I don't know yet
<pie_>
ah.
<pie_>
well if there is one, it would have probably been best to start there tbh
<pie_>
ok so, after having discussed this a bit im guessing what happens is (syscall, arch) -> syscall number -> syscall number translation (optional) -> filter adding succeeds or fails
<bk1603[m]1>
pie_ (re: "add the numbers to the right filters"): If there are filters specific to architectures, then we cannot add the rr syscalls to any of the existing filters, since there won't be any "correct" filter for these calls. These calls don't exist.
<bk1603[m]1>
pie_ (re: "ok so, after having discussed this a bit..."): Umm, I think the syscall number translation happens depending upon exact versus the other API? The rest I think is what I find to be true too.
<pie_>
yeah i think so?
<pie_>
so i dont completely understand your question but my limited understanding is that that is the step where failure may occur
<pie_>
i guess ill look what a filter rule actually looks like
<bk1603[m]1>
Yep, and I'll go read the docs and that comment once more, and see if I can find a solution/ask a better question :)
<bk1603[m]1>
Thanks anyways for the discussion. You did clear up the difference between the two APIs that seccomp has.
<pie_>
dont believe me believe the docs and what the other people said
<pie_>
bk1603[m]1: it might be good to summarize your understanding in a comment at some point to make sure things are going in the right direction
<bk1603[m]1>
I think what you said was pretty close to what I read in the issue :) From what pcmoore said, the normal APIs allow different arguments/no arguments on different architectures. (At least that's what the example says. I don't remember exactly if it's the same as what I read in the docs.)
<bk1603[m]1>
Sure, I'll make sure I add a comment about what I understood and what I'm planning to do.
<bk1603[m]1>
Also, I think you're right, taking a look at the filters might help with the confusion.
<bk1603[m]1>
The bottom line is that exact vs normal is about compatibility, with normal being compatible with more architectures wherever possible. Taking a look at how the filters are used, and which API systemd uses, should help with the query.
<pie_>
might also be possible to just write some isolated test code in an main.c or something
<pie_>
by the way, idk how you intend to test this but using the nixos vm generation stuff may or may not be faster than reboots or something
<pie_>
idk if you have a separate machine
<pie_>
also at this point im not even sure if we explicitly need to pass an arch since libseccomp does translation?
<pie_>
though i think we would want to be able to specify then if we want the translation behaviour or not, this might be something to ask about
<pie_>
and then maybe in the case that we dont want the translation but the arch is ambiguous, then we'd have to give the arch?
<bk1603[m]1>
Exactly, do we need to pass the archs? Since we don't need any translations. (At least not the ones done by `resolve` api calls.)
<pie_>
well i think the cases we dont need to pass the arch are when we dont want translation and the arch is unambiguous, or when we want translation
<pie_>
i guess its a question of whether poettering would want the syscall redundantly labeled with its arch
<bk1603[m]1>
I mean we don't need translations for the rr syscalls right? They'll be the same on all archs? Or are they different too? (I don't think they should be different.)
<pie_>
they specifically are the same on all archs
<pie_>
but its not like we get to half-ass this
<pie_>
someone somewhere is going to depend on this stuff hopefully working in a non-insane way :P
<bk1603[m]1>
xD true, but the thing is, even if we need the translations on the other calls, we can pass an architecture with them. Because they have an architecture, unlike rr calls
<pie_>
well, actually i dont think rr even works on anything other than x86/64?
<bk1603[m]1>
If I were to extremely shorten my query, I would ask, "Under what architecture filters do we place the rr syscalls?"
<bk1603[m]1>
Umm but do we pass x86/64 as the architecture for rr calls?
<pie_>
in a sense youre right in that theres no explicit arch restriction for rr calls, "its just some specific syscall numbers" which is why im saying maybe it makes sense for the arch to be optional
<pie_>
but then we need to figure out exactly what that should mean
<pie_>
poettering did suggest @* for "any arch"
<pie_>
also we should make parsing non-insane
<bk1603[m]1>
So `1000` on rr is `1000@*`?
<pie_>
i _guessssss_
<pie_>
do you know anything about what sort of characters are allowed in syscall names or something?
<pie_>
because i would prefer some kind of prefix to identify that a syscall is coming up in the parse, as opposed to something else
<bk1603[m]1>
haha, ok so far so good, but then we get back to my first question. pcmoore said that having a syscall for some other arch in a filter may result in a libseccomp API error. To what filter does `1000@*` go? If it goes to all filters, wouldn't it be an API error? Since no architecture actually contains any call that goes by the code `1000`
<pie_>
ok
<bk1603[m]1>
Umm, I don't know about what characters are allowed. Though I can search that up.
<pie_>
welll, prooobably no symbols, but maybe thats also reasonable to ask about
<pie_>
ok you know what, maybe its worth asking about that in the issue
<pie_>
^ @ at what point does a syscall for the wrong arch error
<bk1603[m]1>
Cool, I'll ask it right away then!
<pie_>
also something i was wondering and you kind of point at: is there a difference between saying "any arch" or not specifying an arch?
<pie_>
i.e. "1000" vs "1000@*"
<pie_>
(does those variants even show up in the api?)
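The `NUMBER@ARCH` syntax being discussed could be parsed along these lines. This is a Python sketch of the idea only: the names `SyscallSpec` and `parse_syscall_spec` are hypothetical, and it makes no claim about which arch names systemd would accept. It keeps "no arch given" (`1000`) and "any arch" (`1000@*`) as two distinct results, so the question of whether they should mean the same thing can be decided later by the caller.

```python
from typing import NamedTuple, Optional


class SyscallSpec(NamedTuple):
    number: int
    arch: Optional[str]  # None: no arch given; "*": explicitly any arch


def parse_syscall_spec(text: str) -> SyscallSpec:
    """Parse '4711@x86-64', '1000@*' or '1000' into (number, arch)."""
    number_part, sep, arch_part = text.partition("@")
    # Only plain ASCII digits count as a number; names like 'open' are rejected
    # here and would go through the normal name lookup instead.
    if not (number_part.isascii() and number_part.isdigit()):
        raise ValueError(f"not a syscall number: {number_part!r}")
    if sep and not arch_part:
        raise ValueError("'@' must be followed by an architecture or '*'")
    return SyscallSpec(int(number_part), arch_part if sep else None)
```

The resulting integer is what would be handed to the rule-adding call as the syscall number; the name lookup functions are never involved.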
<bk1603[m]1>
Yeah that's another query that I had. I mean the api still doesn't have anything like `1000@*` or even strings that are numbers. All it has is syscall names, that are converted to their respective codes by seccomp.
<bk1603[m]1>
So as of right now "1000" would simply be invalid i think?
<pie_>
idk
<pie_>
also if you happen to remember and i dont do it, remind me in two days or so to check if i have any questions too
<bk1603[m]1>
I'll take a look at the docs for that. Though I don't think the seccomp api accepts something like "1000". (I guess you asked something related in the issue too, not sure how related though.)
<bk1603[m]1>
I'll ping you here in 2 days from now.
<bk1603[m]1>
And sure :)
<pie_>
well you arent going to be passing that to the name lookup function
<pie_>
youre converting that to an int and passing it to add rule as the syscall number
<bk1603[m]1>
Aha, in that case, we can add numbers to the filter. But how would the api behave, on getting a number that isn't the right one for the current architecture, I don't know.
<bk1603[m]1>
I'll go and take a look at how filters are read by the seccomp API. And if I still don't get it I'll ask it in the issue itself. Either way, I'll remember to ping you with the answer once I find it.
<pie_>
ok i thought we were on the same page about that
<bk1603[m]1>
Are we not?
<pie_>
i thought your question was in the context of the syscall number being passed to the rule
<pie_>
*your original question
justanotheruser has quit [Ping timeout: 260 seconds]
<bk1603[m]1>
Oh, sorry for the misunderstanding then :)
<bk1603[m]1>
I'll try and be more clear the next time around.
<bk1603[m]1>
What I want to know is how does the API work with the filters and the architectures. Does it care about what architecture we are on, once we have syscall numbers in the filter? Or does it not?
<bk1603[m]1>
Hmm, I think I am starting to get it a bit.
<pie_>
well, i also made an assumption, but its kind of interesting that the same questions seemed to be roughly applicable
leah2 has joined #nixcon
{\_\} has joined #nixcon
<bk1603[m]1>
(syscall name, arch) -> syscall code -> filter -> translation -> application. In this, I think, we might face an error in the translation part if we don't have a syscall number in the filter that corresponds to the current arch?
<pie_>
i would put translation before filter
<bk1603[m]1>
> well, i also made an assumption, but its kind of interesting that the same questions seemed to be roughly applicable
<bk1603[m]1>
Ah, it indeed is.
<bk1603[m]1>
Umm, I think we both mean different filters.
<pie_>
i think i know why you put it in that order but id put it the other way around
<pie_>
well maybe
<bk1603[m]1>
Or maybe not, let me just clear it up. By filter I mean the hashmap that contains all the syscalls that are to be filtered and the error numbers that we should respond with.
<pie_>
there is int seccomp_rule_add
<pie_>
and seccomp_rule_add_exact
<pie_>
so i assume seccomp_rule_add takes the syscall number and does some itnernal translatioon and then calls add_exact
<pie_>
or something like that
<pie_>
*internal translation
<bk1603[m]1>
yeah that's what I think too
<bk1603[m]1>
and it might be this translation that pcmoore was talking about
<bk1603[m]1>
?
<pie_>
if 1000 does not exist on any arch seccomp_rule_add will fail because of the translation pass?
<pie_>
well, thats what im guessing
<pie_>
you could also just try to check what the libseccomp source code does
<bk1603[m]1>
Exactly! Now if that doesn't fail, my work is almost done. But if it does fail, then we can't add rr specific calls like this right?
<pie_>
seccomp_rule_add
<pie_>
well if it does fail maybe we have to call _exact
<bk1603[m]1>
> you could also just try to check what the libseccomp source code does
<bk1603[m]1>
Definitely.
<bk1603[m]1>
If I passed a syscall code that I made up to seccomp_rule_add_exact, would it work? I think it wouldn't work with both of them. Since pcmoore said that seccomp_rule_add is more flexible, and even that fails with wrong syscall codes.
<pie_>
dunno
<pie_>
you could try to check
<pie_>
also if you like python i think that also has seccomp bindings so you could try testing with that
<pie_>
or just do it in c
zupo has quit [Ping timeout: 272 seconds]
<bk1603[m]1>
Oh yep, true, I can try testing it out. I'll do that then.
andi- has quit [Remote host closed the connection]
andi- has joined #nixcon
zupo has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
zupo has joined #nixcon
justanotheruser has joined #nixcon
nix-build has quit [Remote host closed the connection]
{^_^} has joined #nixcon
<bk1603[m]1>
pie_ thanks a lot! That will definitely help with the testing, I'm still reading the docs for what filters are, and how does seccomp read them.
<bk1603[m]1>
Well if I try to add a bogus syscall, I get an error. (Just tested it with the python expression you provided in the issue.)
<bk1603[m]1>
The error reads Bad system call
<bk1603[m]1>
A skim through the docs doesn't reveal how architectures work with filters exactly. There are warnings about how a syscall in the wrong filter will fail, but no indications as to if filters have an architecture.
<bk1603[m]1>
From the docs I read that the filter is an array of structs containing the filter code, jt, jf and a generic multi-use field
<bk1603[m]1>
These structs are BPF instructions
<bk1603[m]1>
And what these operate on, are seccomp_data which in turn contains the architectures and the actual syscalls.
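The two kernel structures mentioned here can be mirrored with `ctypes` to see where the architecture lives. The layouts follow `struct sock_filter` and `struct seccomp_data` from the kernel headers, and the two `AUDIT_ARCH_*` values are the kernel's constants. The function `toy_filter` is a hypothetical plain-Python stand-in for what a BPF program typically does (check `arch`, then compare `nr`); it is not real BPF and says nothing about what libseccomp will accept.

```python
import ctypes


class SockFilter(ctypes.Structure):
    """One classic BPF instruction (struct sock_filter)."""
    _fields_ = [
        ("code", ctypes.c_uint16),  # opcode
        ("jt", ctypes.c_uint8),     # jump offset if true
        ("jf", ctypes.c_uint8),     # jump offset if false
        ("k", ctypes.c_uint32),     # generic multi-use field
    ]


class SeccompData(ctypes.Structure):
    """What the filter program inspects for each syscall (struct seccomp_data)."""
    _fields_ = [
        ("nr", ctypes.c_int),                      # syscall number
        ("arch", ctypes.c_uint32),                 # AUDIT_ARCH_* of the caller
        ("instruction_pointer", ctypes.c_uint64),
        ("args", ctypes.c_uint64 * 6),
    ]


AUDIT_ARCH_X86_64 = 0xC000003E
AUDIT_ARCH_I386 = 0x40000003


def toy_filter(data: SeccompData, native_arch: int, denied: set) -> str:
    """Model of a filter: reject foreign arches, then match on the raw number."""
    if data.arch != native_arch:
        return "KILL"
    if data.nr in denied:
        return "ERRNO"
    return "ALLOW"
```

At this level the architecture is part of the data being checked, and a number like 1000 is just an integer to compare against `nr`. Whether the library lets such a rule be added is a separate question.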
<bk1603[m]1>
I gotta sleep now since I have an exam tomorrow morning, I'll take a look at this further tomorrow. But for tonight, it seems as if we can't add syscall numbers that do not exist to filters.
<bk1603[m]1>
PS: There are no libseccomp channels, and the Google group links they provided won't open.
FRidh has joined #nixcon
chkno has quit [Ping timeout: 240 seconds]
LnL- is now known as LnL
zupo has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
justanotheruser has quit [Quit: WeeChat 2.9]
justanotheruser has joined #nixcon
monk has left #nixcon ["Error from remote client"]
justanotheruser has quit [Ping timeout: 268 seconds]
monk has joined #nixcon
justanotheruser has joined #nixcon
monk has left #nixcon ["Error from remote client"]
FRidh has quit [Quit: Konversation terminated!]
<pie_>
bk1603[m]1: if the google group is really broken (somewhat doubtful?) i might open an issue on their tracker about that, but also at least you can ask in the systemd thread i guess
<pie_>
bk1603[m]1: did you check if there is an exact / normal variant of the call?
<pie_>
bk1603[m]1: hope you do well on your exam
V is now known as ^
^ is now known as V
zupo has joined #nixcon
zupo has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]