[PATCH net-next 0/3] eBPF Seccomp filters

Kees Cook keescook at chromium.org
Tue Feb 13 20:35:46 UTC 2018


On Tue, Feb 13, 2018 at 12:33 PM, Tom Hromatka <tom.hromatka at oracle.com> wrote:
> On Tue, Feb 13, 2018 at 7:42 AM, Sargun Dhillon <sargun at sargun.me> wrote:
>>
>> This patchset enables seccomp filters to be written in eBPF. Although
>> this patchset doesn't introduce much of the functionality enabled by
>> eBPF, it lays the groundwork for it.
>>
>> It also introduces the capability to dump eBPF filters via the PTRACE
>> API in order to make it so that CHECKPOINT_RESTORE will be satisfied.
>> In the attached samples, there's an example of this. One can then use
>> BPF_OBJ_GET_INFO_BY_FD in order to get the actual code of the program,
>> and use that at reload time.
>>
>> The primary reason for not adding maps support in this patchset is
>> to avoid introducing new complexities around PR_SET_NO_NEW_PRIVS.
>> If we have a map that the BPF program can read, it can potentially
>> "change" privileges after running. It seems like doing writes only
>> is safe, because it can be pure, and side effect free, and therefore
>> not negatively affect PR_SET_NO_NEW_PRIVS. Nonetheless, if we come
>> to an agreement, this can be in a follow-up patchset.
>
>
>
> Coincidentally I also sent an RFC for adding eBPF hash maps to the seccomp
> userspace mailing list just last week:
> https://groups.google.com/forum/#!topic/libseccomp/pX6QkVF0F74
>
> The kernel changes I proposed are in this email:
> https://groups.google.com/d/msg/libseccomp/pX6QkVF0F74/ZUJlwI5qAwAJ
>
> In that email thread, Kees requested that I try out a binary tree in cBPF
> and evaluate its performance.  I just got a rough prototype working, and
> while not as fast as an eBPF hash map, the cBPF binary tree was a
> significant improvement over the linear list of ifs that are currently
> generated.  Also, it only required changing a single function within the
> libseccomp library itself.
>
> https://github.com/drakenclimber/libseccomp/commit/87b36369f17385f5a7a4d95101185577fbf6203b
>
> Here are the results I am currently seeing using an in-house customer's
> seccomp filter and a simplistic test program that runs getppid() thousands
> of times.
>
> Test Case                      minimum TSC ticks to make syscall
> ----------------------------------------------------------------
> seccomp disabled                                             620
> getppid() at the front of 306-syscall seccomp filter         722
> getppid() in middle of 306-syscall seccomp filter           1392
> getppid() at the end of the 306-syscall filter              2452
> seccomp using a 306-syscall-sized eBPF hash map              800
> cBPF filter using a binary tree                              922
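
The gap between the linear rows and the binary-tree row in the table above
can be illustrated with a rough model that only counts comparisons. This is
a sketch, not the libseccomp code generator: the syscall numbers 0..305 are
a hypothetical stand-in for the customer's filter, and the tree is modeled
as a plain binary search over the sorted numbers rather than actual cBPF
jump instructions.

```python
def linear_comparisons(allowed, nr):
    """Count equality tests a linear chain of ifs runs before matching nr."""
    for count, candidate in enumerate(allowed, start=1):
        if candidate == nr:
            return count
    return len(allowed)  # no match: every entry was tested


def tree_comparisons(sorted_allowed, nr):
    """Count nodes visited by a binary search over sorted syscall numbers."""
    lo, hi = 0, len(sorted_allowed) - 1
    steps = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_allowed[mid] == nr:
            return steps
        if sorted_allowed[mid] < nr:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps  # no match: search space exhausted


# Hypothetical 306-entry allow list (assumption: numbers 0..305).
allowed = list(range(306))

worst_linear = max(linear_comparisons(allowed, nr) for nr in allowed)
worst_tree = max(tree_comparisons(allowed, nr) for nr in allowed)
print(worst_linear, worst_tree)
```

In this model the worst case drops from 306 comparisons to 9, which is
consistent in direction with the measured numbers, although TSC ticks also
include fixed syscall overhead that the model ignores.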

I still think that's a crazy filter. :) It should be inverted to just
check the 26 syscalls and a final "greater than" test. I would expect
it to be faster still. :)
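
As a rough illustration of the inversion (a sketch under stated
assumptions, not a real filter): assume the syscall table has 332 entries
numbered 0..331, of which 26 hypothetical numbers are denied and the other
306 are allowed. The inverted filter tests the 26 denied numbers, then
rejects anything above the highest known number, and allows the rest. The
names DENIED, MAX_NR, and the action strings are made up for the example.

```python
MAX_NR = 331                          # assumed highest known syscall number
DENIED = list(range(5, 332, 13))      # 26 hypothetical denied syscalls
ALLOWED = [nr for nr in range(MAX_NR + 1) if nr not in DENIED]


def allowlist_filter(nr):
    """Current shape: one equality test per allowed syscall."""
    checks = 0
    for candidate in ALLOWED:
        checks += 1
        if nr == candidate:
            return ("ALLOW", checks)
    return ("KILL", checks)


def inverted_filter(nr):
    """Inverted shape: test the denied syscalls, then one range check."""
    checks = 0
    for candidate in DENIED:
        checks += 1
        if nr == candidate:
            return ("KILL", checks)
    checks += 1                       # the final "greater than" test
    if nr > MAX_NR:
        return ("KILL", checks)
    return ("ALLOW", checks)


GETPPID = 110                         # getppid on x86_64
print(allowlist_filter(GETPPID), inverted_filter(GETPPID))
print(allowlist_filter(MAX_NR), inverted_filter(MAX_NR))
```

Under these assumptions an allowed syscall never costs more than 27
comparisons in the inverted form, regardless of where it would have sat in
the 306-entry list.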

-Kees

-- 
Kees Cook
Pixel Security

