[PATCH v1 0/6] seccomp: Implement constant action bitmaps

Kees Cook keescook at chromium.org
Mon Sep 28 20:04:50 UTC 2020


On Sat, Sep 26, 2020 at 01:11:50PM -0500, YiFei Zhu wrote:
> On Fri, Sep 25, 2020 at 2:07 AM YiFei Zhu <zhuyifei1999 at gmail.com> wrote:
> > I'll try to profile the latter later on my qemu-kvm, with a recent
> > libseccomp with binary tree and docker's profile, probably both direct
> > filter attaches and filter attaches with fork(). I'm guessing if I
> > have fork() the cost of fork() will overshadow seccomp() though.
> 
> I'm surprised. That is not the case as far as I can tell.
> 
> I wrote a benchmark [1] that would fork() and in the child attach a
> seccomp filter, look at the CLOCK_MONOTONIC difference, then add it to
> a struct timespec shared with the parent. It checks the difference
> with the timespec before prctl and before fork. CLOCK_MONOTONIC
> instead of CLOCK_PROCESS_CPUTIME_ID because of fork.
> 
> I ran `./seccomp_emu_bench 100000` in my qemu-kvm and here are the results:
> without emulator:
> Benchmarking 100000 syscalls...
> 19799663603 (19.8s)
> seecomp attach without fork: 197996 ns
> 33911173847 (33.9s)
> seecomp attach with fork: 339111 ns
> 
> with emulator:
> Benchmarking 100000 syscalls...
> 54428289147 (54.4s)
> seecomp attach without fork: 544282 ns
> 69494235408 (69.5s)
> seecomp attach with fork: 694942 ns
> 
> fork seems to take around 150us, seccomp attach takes around 200us,
> and the filter emulation overhead is around 350us. I had no idea that
> fork was this fast. If I wrote my benchmark badly please criticise.

You're calling clock_gettime() inside your loop, so the cost of the
clock reads is part of every sample. That might change the numbers. Why
not just measure outside the loop, or better yet, use "perf" to measure
the time in prctl()?
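
A minimal sketch of what "measure outside the loop" could look like,
assuming a plain CLOCK_MONOTONIC wall-clock measurement. The names
bench_mean_ns(), timespec_diff_ns() and placeholder_op() are
hypothetical and do not come from the benchmark in [1]; the placeholder
operation is just a cheap syscall standing in for the real work.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Nanoseconds elapsed between two timespec samples. */
static int64_t timespec_diff_ns(const struct timespec *start,
                                const struct timespec *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
         + (end->tv_nsec - start->tv_nsec);
}

/* Hypothetical stand-in for the operation under test. */
static void placeholder_op(void)
{
    (void)getppid();
}

/*
 * Run `op` `iterations` times, sampling the clock only twice in total,
 * so the clock reads are not part of each iteration's cost.
 * Returns the mean cost per call in nanoseconds, or -1 on error.
 */
static int64_t bench_mean_ns(void (*op)(void), long iterations)
{
    struct timespec start, end;

    if (iterations <= 0)
        return -1;
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1;
    for (long i = 0; i < iterations; i++)
        op();
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1;

    return timespec_diff_ns(&start, &end) / iterations;
}
```

Calling bench_mean_ns(placeholder_op, 100000) would give a mean cost
per call. This only yields an average over the whole loop, so it cannot
separate the fork() cost from the filter attach cost the way per-call
sampling or "perf" can.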

> Given that we are doubling the time to fork() + seccomp attach filter,
> I think yeah running the emulator on the first instance of a syscall,
> holding a lock, is a much better idea. If I naively divide 350us by
> the number of syscall + arch pairs emulated the overhead is less than
> 1 us and that should be okay since it only happens for the first
> invocation of the particular syscall.
> 
> [1] https://gist.github.com/zhuyifei1999/d7bee62bea14187e150fef59db8e30b1

Regardless, let's take things one step at a time. First, let's do
the simplest version of the feature, and then let's look at further
optimizations.

Can you send a v3 and we can continue from there?

-- 
Kees Cook
