[PATCH v1 0/6] seccomp: Implement constant action bitmaps

YiFei Zhu zhuyifei1999 at gmail.com
Fri Sep 25 07:07:00 UTC 2020


On Fri, Sep 25, 2020 at 12:56 AM Rasmus Villemoes
<linux at rasmusvillemoes.dk> wrote:
> Yes, the man page would read something like
>
>        SECCOMP_SET_MODE_FILTER_BITMAP
>               The system calls allowed are defined by a pointer to a
>               Berkeley Packet Filter (BPF) passed via args.
>               This argument is a pointer to a struct sock_fprog_bitmap;
>
> with that struct containing whatever information/extra pointers needed
> for passing the bitmap(s) in addition to the bpf prog.
>
> And SECCOMP_SET_MODE_FILTER would internally just be updated to work
> as-if all-zero allow-bitmaps were passed along. The internal kernel
> bitmap would just be the and of the bitmaps in the filter stack.
>
> Sure, it's UAPI, so would certainly need more careful thought on details
> of just how the arg struct looks like etc. etc., but I was wondering why
> it hadn't been discussed at all.

If a SECCOMP_SET_MODE_FILTER filter is attached before or after a
SECCOMP_SET_MODE_FILTER_BITMAP one, does that mean all the bitmaps
become void? (ANDing with the legacy filter's all-zero bitmap would
clear every bit.)

Would it make sense to have SECCOMP_SET_MODE_FILTER run through the
emulator to see if we can construct a bitmap anyways for "legacy
no-bitmap" support?
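A rough sketch of what such an emulation could look like, under heavy
assumptions: it only understands a handful of cBPF opcodes, treats
anything else as "unknown", and gives up as soon as the filter loads
anything other than nr or arch. always_allows() and example_prog are
hypothetical names for illustration, not code from either patch series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/filter.h>   /* struct sock_filter, BPF_* opcodes */
#include <linux/seccomp.h>  /* struct seccomp_data, SECCOMP_RET_* */

/*
 * Hypothetical helper: returns true only if the filter provably returns
 * SECCOMP_RET_ALLOW for this (arch, nr) pair without reading any other
 * field of struct seccomp_data. Any doubt yields false.
 */
static bool always_allows(const struct sock_filter *insns, size_t len,
                          uint32_t arch, uint32_t nr)
{
    uint32_t acc = 0;   /* emulated accumulator register */
    size_t pc = 0;      /* cBPF jumps only go forward, so this terminates */

    assert(insns != NULL);
    while (pc < len) {
        const struct sock_filter *in = &insns[pc];

        switch (in->code) {
        case BPF_LD | BPF_W | BPF_ABS:
            if (in->k == offsetof(struct seccomp_data, nr))
                acc = nr;
            else if (in->k == offsetof(struct seccomp_data, arch))
                acc = arch;
            else
                return false;   /* depends on args or ip: not constant */
            pc++;
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:
            pc += 1 + (acc == in->k ? in->jt : in->jf);
            break;
        case BPF_JMP | BPF_JA:
            pc += 1 + in->k;
            break;
        case BPF_RET | BPF_K:
            return in->k == SECCOMP_RET_ALLOW;
        default:
            return false;       /* opcode not handled by this sketch */
        }
    }
    return false;               /* ran off the end: be conservative */
}

/* Illustrative filter: syscall 39 is always allowed, everything else
 * depends on args[0], so no bit can be set for it. */
static const struct sock_filter example_prog[] = {
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 39, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
};
```

Running always_allows() once per syscall number at attach time would
give the bits of the bitmap for a legacy filter; a real version would
need to cover the remaining jump and ALU opcodes.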

Another thing to consider is that in both patch series we only
construct one final bitmap: if the bit is set, seccomp will not call
into any BPF filter. If the bit is not set, then all filters are
called in sequence, even if some of them "must allow the syscall".
With SECCOMP_SET_MODE_FILTER_BITMAP, the filter's BPF code would no
longer have the "if it's this syscall" check for any syscall that is
given in its bitmap, so calling into such a filter would produce a
false negative: it would reject a syscall that its own bitmap allows.
So we would need extra logic along the lines of "does this filter have
a bitmap? if so, check the bitmap first". Probably won't be too
complicated, but idk if it is actually worth the complexity. wdyt?
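To make the "extra logic" concrete, here is a minimal userspace
sketch. Everything in it is an assumption for illustration: struct
sketch_filter is not the kernel's struct seccomp_filter, the verdict
is reduced to allow/deny, NR_SYSCALLS is an arbitrary size, and a
function pointer stands in for the BPF program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_SYSCALLS  512u                 /* assumed size, illustration only */
#define BITMAP_WORDS (NR_SYSCALLS / 64u)

enum verdict { VERDICT_ALLOW, VERDICT_DENY };

/* Hypothetical per-filter state. */
struct sketch_filter {
    const struct sketch_filter *prev;     /* older filter in the stack */
    bool has_bitmap;                      /* attached with a bitmap? */
    uint64_t allow[BITMAP_WORDS];         /* bit set: this filter allows nr */
    enum verdict (*run)(unsigned int nr); /* stand-in for the BPF program */
};

static bool test_bit64(const uint64_t *map, unsigned int nr)
{
    assert(nr < NR_SYSCALLS);
    return (map[nr / 64u] >> (nr % 64u)) & 1u;
}

/* Final bitmap = AND over the stack; a filter without a bitmap counts
 * as all zeros, so it clears every bit. */
static void combine_bitmaps(const struct sketch_filter *top, uint64_t *out)
{
    for (size_t i = 0; i < BITMAP_WORDS; i++)
        out[i] = UINT64_MAX;
    for (const struct sketch_filter *f = top; f; f = f->prev)
        for (size_t i = 0; i < BITMAP_WORDS; i++)
            out[i] &= f->has_bitmap ? f->allow[i] : 0;
}

static enum verdict check_syscall(const struct sketch_filter *top,
                                  const uint64_t *combined, unsigned int nr)
{
    if (test_bit64(combined, nr))
        return VERDICT_ALLOW;             /* fast path: no filter is run */

    for (const struct sketch_filter *f = top; f; f = f->prev) {
        /* The extra per-filter check: this filter's BPF code has no rule
         * for syscalls in its own bitmap, so it must not be run for them. */
        if (f->has_bitmap && test_bit64(f->allow, nr))
            continue;
        if (f->run(nr) == VERDICT_DENY)
            return VERDICT_DENY;
    }
    return VERDICT_ALLOW;
}

/* Stand-in programs for the sketch. */
static enum verdict allow_everything(unsigned int nr) { (void)nr; return VERDICT_ALLOW; }
static enum verdict deny_everything(unsigned int nr)  { (void)nr; return VERDICT_DENY; }
```

Without the per-filter check, a bitmap filter stacked on a legacy
filter would be asked about a syscall its BPF code no longer knows,
and would deny it.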

> Regardless, I'd like to see some numbers, certainly for the "how much
> faster does a getpid() or read() or any of the other syscalls that
> nobody disallows" get, but also "what's the cost of doing that emulation
> at seccomp(2) time".

The former has been given in my RFC patch [1]. In an extreme case of
no side channel mitigations, in the same amount of time, unixbench
syscall mixed runs 33295685 syscalls without seccomp, 20661056
syscalls with docker profile, 25719937 syscalls with bitmapped docker
profile. Though, I think Jack was running on Ubuntu and it did not
have a libseccomp shipped with the distro that's new enough to do the
binary decision tree generation [2].
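For a quick reading of those three counts as ratios, a small helper
(names are mine, only the three numbers come from the run above):

```c
#include <assert.h>

/* Syscall counts from the unixbench run quoted above. */
static const unsigned long long NO_SECCOMP       = 33295685ULL;
static const unsigned long long DOCKER_PROFILE   = 20661056ULL;
static const unsigned long long DOCKER_BITMAPPED = 25719937ULL;

/* Throughput of one configuration relative to another. */
static double ratio(unsigned long long num, unsigned long long den)
{
    assert(den != 0);
    return (double)num / (double)den;
}
```

ratio(DOCKER_PROFILE, NO_SECCOMP) comes out at roughly 0.62 and
ratio(DOCKER_BITMAPPED, NO_SECCOMP) at roughly 0.77, so in this run
the bitmap gets about 24% more syscalls through than the plain docker
profile.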

I'll try to profile the latter later on my qemu-kvm, with a recent
libseccomp with binary tree generation and docker's profile, probably
with both direct filter attaches and filter attaches with fork(). I'm
guessing that if I have fork(), the cost of fork() will overshadow
seccomp() though.

[1] https://lore.kernel.org/containers/cover.1600661418.git.yifeifz2@illinois.edu/
[2] https://github.com/seccomp/libseccomp/pull/152

YiFei Zhu

