[PATCH net-next 0/3] eBPF Seccomp filters

Tom Hromatka tom.hromatka at oracle.com
Tue Feb 13 20:38:53 UTC 2018



On 02/13/2018 01:35 PM, Kees Cook wrote:
> On Tue, Feb 13, 2018 at 12:33 PM, Tom Hromatka <tom.hromatka at oracle.com> wrote:
>> On Tue, Feb 13, 2018 at 7:42 AM, Sargun Dhillon <sargun at sargun.me> wrote:
>>> This patchset enables seccomp filters to be written in eBPF. Although
>>> this patchset doesn't introduce much of the functionality enabled by
>>> eBPF, it lays the groundwork for it.
>>>
>>> It also introduces the capability to dump eBPF filters via the PTRACE
>>> API in order to make it so that CHECKPOINT_RESTORE will be satisfied.
>>> In the attached samples, there's an example of this. One can then use
>>> BPF_OBJ_GET_INFO_BY_FD in order to get the actual code of the program,
>>> and use that at reload time.
>>>
>>> The primary reason for not adding maps support in this patchset is
>>> to avoid introducing new complexities around PR_SET_NO_NEW_PRIVS.
>>> If we have a map that the BPF program can read, it can potentially
>>> "change" privileges after running. It seems like doing writes only
>>> is safe, because it can be pure, and side effect free, and therefore
>>> not negatively affect PR_SET_NO_NEW_PRIVS. Nonetheless, if we come
>>> to an agreement, this can be in a follow-up patchset.
>>
>>
>> Coincidentally I also sent an RFC for adding eBPF hash maps to the seccomp
>> userspace mailing list just last week:
>> https://groups.google.com/forum/#!topic/libseccomp/pX6QkVF0F74
>>
>> The kernel changes I proposed are in this email:
>> https://groups.google.com/d/msg/libseccomp/pX6QkVF0F74/ZUJlwI5qAwAJ
>>
>> In that email thread, Kees requested that I try out a binary tree in cBPF
>> and evaluate its performance.  I just got a rough prototype working, and
>> while not as fast as an eBPF hash map, the cBPF binary tree was a
>> significant improvement over the linear list of ifs that are currently
>> generated.  Also, it only required changing a single function within
>> the libseccomp library itself.
>>
>> https://github.com/drakenclimber/libseccomp/commit/87b36369f17385f5a7a4d95101185577fbf6203b
>>
>> Here are the results I am currently seeing using an in-house customer's
>> seccomp filter and a simplistic test program that runs getppid() thousands
>> of times.
>>
>> Test Case                      minimum TSC ticks to make syscall
>> ----------------------------------------------------------------
>> seccomp disabled                                             620
>> getppid() at the front of 306-syscall seccomp filter         722
>> getppid() in middle of 306-syscall seccomp filter           1392
>> getppid() at the end of the 306-syscall filter              2452
>> seccomp using a 306-syscall-sized eBPF hash map              800
>> cBPF filter using a binary tree                              922
> I still think that's a crazy filter. :) It should be inverted to just
> check the 26 syscalls and a final "greater than" test. I would expect
> it to be faster still. :)
>
> -Kees

I completely agree it's a crazy filter, but it seems to be a
common "mistake" our users are making.  It would be nice to
help them out if we can.

Tom


