[PATCH v7 8/9] seccomp: Introduce addfd ioctl to seccomp user notifier

Will Drewry wad at chromium.org
Tue Jul 14 18:20:08 UTC 2020


On Thu, Jul 9, 2020 at 1:26 PM Kees Cook <keescook at chromium.org> wrote:
>
> From: Sargun Dhillon <sargun at sargun.me>
>
> The current SECCOMP_RET_USER_NOTIF API allows for syscall supervision over
> an fd. It is often used in settings where a supervising task emulates
> syscalls on behalf of a supervised task in userspace, either to further
> restrict the supervisee's syscall abilities or to circumvent kernel
> enforced restrictions the supervisor deems safe to lift (e.g. actually
> performing a mount(2) for an unprivileged container).
>
> While SECCOMP_RET_USER_NOTIF allows for the interception of any syscall,
> only a certain subset of syscalls could be correctly emulated. Over the
> last few development cycles, the set of syscalls which can't be emulated
> has been reduced due to the addition of pidfd_getfd(2). With this we are
> now able to, for example, intercept syscalls that require the supervisor
> to operate on file descriptors of the supervisee such as connect(2).
>
> However, syscalls that cause new file descriptors to be installed cannot
> currently be correctly emulated since there is no way for the supervisor
> to inject file descriptors into the supervisee. This patch adds a
> new addfd ioctl to remove this restriction by allowing the supervisor to
> install file descriptors into the intercepted task. By implementing this
> feature via seccomp the supervisor effectively instructs the supervisee
> to install a set of file descriptors into its own file descriptor table
> during the intercepted syscall. This way it is possible to intercept
> syscalls such as open() or accept(), and install (or replace, like
> dup2(2)) the supervisor's resulting fd into the supervisee. One
> replacement use-case would be to redirect the stdout and stderr of a
> supervisee into log file descriptors opened by the supervisor.
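
For readers following along, here is a minimal sketch of how a supervisor
might drive the new ioctl. It assumes the uapi names as they appear in
<linux/seccomp.h> from Linux 5.9 onward (struct seccomp_notif_addfd,
SECCOMP_IOCTL_NOTIF_ADDFD, SECCOMP_ADDFD_FLAG_SETFD); the v7 series under
review may differ in detail. The helper name supervisor_addfd() and its
"target_fd < 0 means let the kernel pick" convention are hypothetical and
exist only for this illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/seccomp.h>

/*
 * Hypothetical helper: install the supervisor's srcfd into the task that
 * triggered notification notif_id on the listener fd notify_fd.
 *
 * target_fd < 0:  let the kernel pick the fd number in the supervisee.
 * target_fd >= 0: replace that fd number in the supervisee, like dup2(2).
 *
 * Returns the ioctl result: on success the fd number as seen by the
 * supervisee, on failure -1 with errno set.
 */
static int supervisor_addfd(int notify_fd, uint64_t notif_id,
                            int srcfd, int target_fd)
{
    struct seccomp_notif_addfd addfd;

    /* Zero everything so unused fields do not carry stack garbage. */
    memset(&addfd, 0, sizeof(addfd));
    addfd.id = notif_id;
    addfd.srcfd = (uint32_t)srcfd;

    if (target_fd >= 0) {
        addfd.flags = SECCOMP_ADDFD_FLAG_SETFD;
        addfd.newfd = (uint32_t)target_fd;
    }

    return ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd);
}
```

In the stdout/stderr redirection use-case described above, a supervisor
would call this with target_fd set to 1 or 2 and srcfd set to its own log
file descriptor, while handling a pending notification.
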
>
> The ioctl handling is based on the discussions[1] of how Extensible
> Arguments should interact with ioctls. Instead of building size into
> the addfd structure, make it a function of the ioctl command (which
> is how sizes are normally passed to ioctls). To support forward and
> backward compatibility, just mask out the direction and size, and match
> everything. The size (and any future direction) checks are done along
> with copy_struct_from_user() logic.
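
The masking idea can be sketched in userspace as follows. This is an
illustration only, not the patch's code: the demo_* names, the 'x' magic
and both struct versions are hypothetical. It assumes the generic Linux
ioctl encoding macros from <linux/ioctl.h> (_IOW, _IOC_SIZE, IOC_INOUT,
IOCSIZE_MASK). Two revisions of an argument struct produce two different
command numbers, but after masking out direction and size they match the
same case, and the size encoded in the command is recovered separately so
it can be handed to a copy_struct_from_user()-style check.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h>

/* Hypothetical first and later revision of an extensible argument. */
struct demo_arg_v1 { uint64_t id; uint32_t flags; uint32_t fd; };
struct demo_arg_v2 { uint64_t id; uint32_t flags; uint32_t fd; uint64_t extra; };

#define DEMO_MAGIC  'x'
#define DEMO_CMD_V1 _IOW(DEMO_MAGIC, 3, struct demo_arg_v1)
#define DEMO_CMD_V2 _IOW(DEMO_MAGIC, 3, struct demo_arg_v2)

/* Keep only the type (magic) and number bits of an ioctl command. */
#define EA_IOCTL(cmd) ((cmd) & ~(IOC_INOUT | IOCSIZE_MASK))

/* The size is part of the command, so the two revisions differ... */
static_assert(DEMO_CMD_V1 != DEMO_CMD_V2, "size is encoded in the command");
/* ...but they are equal once direction and size are masked out. */
static_assert(EA_IOCTL(DEMO_CMD_V1) == EA_IOCTL(DEMO_CMD_V2), "same command");

/*
 * Returns the argument size userspace encoded in cmd if it names the
 * demo command, or -1 for an unknown command.
 */
static long demo_dispatch(unsigned int cmd)
{
    switch (EA_IOCTL(cmd)) {
    case EA_IOCTL(DEMO_CMD_V1):
        return (long)_IOC_SIZE(cmd);
    default:
        return -1;
    }
}
```

An older binary built against the v1 struct and a newer one built against
v2 both reach the same handler; only the reported size differs.
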
>
> As a note, the seccomp_notif_addfd structure is laid out based on 8-byte
> alignment without requiring packing as there have been packing issues
> with uapi highlighted before[2][3]. Although we could overload the
> newfd field and use -1 to indicate that it is not to be used, doing
> so requires changing the size of the fd field, and introduces struct
> packing complexity.
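
To make the layout point concrete, here is a sketch of a struct with one
64-bit field followed by four 32-bit fields. The field names follow the
seccomp_notif_addfd definition as I understand it from the upstream uapi
header, which is an assumption relative to this v7 posting; the struct
name addfd_layout_sketch is hypothetical. With this ordering every field
is naturally aligned, the total size is a multiple of 8, and no packing
attribute is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the addfd argument layout, for illustration. */
struct addfd_layout_sketch {
    uint64_t id;          /* notification cookie */
    uint32_t flags;       /* e.g. a "use newfd" flag instead of newfd == -1 */
    uint32_t srcfd;       /* fd in the supervisor */
    uint32_t newfd;       /* requested fd number in the supervisee */
    uint32_t newfd_flags; /* flags for the installed fd */
};

/* No implicit padding: each field starts where the previous one ends. */
static_assert(offsetof(struct addfd_layout_sketch, flags) == 8, "flags");
static_assert(offsetof(struct addfd_layout_sketch, srcfd) == 12, "srcfd");
static_assert(offsetof(struct addfd_layout_sketch, newfd) == 16, "newfd");
static_assert(offsetof(struct addfd_layout_sketch, newfd_flags) == 20,
              "newfd_flags");
static_assert(sizeof(struct addfd_layout_sketch) == 24, "multiple of 8");
```

Using a separate flag bit rather than a -1 sentinel lets the fd fields
stay unsigned 32-bit values, which is what keeps the layout this simple.
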
>
> [1]: https://lore.kernel.org/lkml/87o8w9bcaf.fsf@mid.deneb.enyo.de/
> [2]: https://lore.kernel.org/lkml/a328b91d-fd8f-4f27-b3c2-91a9c45f18c0@rasmusvillemoes.dk/
> [3]: https://lore.kernel.org/lkml/20200612104629.GA15814@ircssh-2.c.rugged-nimbus-611.internal
>
> Suggested-by: Matt Denton <mpdenton at google.com>
> Link: https://lore.kernel.org/r/20200603011044.7972-4-sargun@sargun.me
> Signed-off-by: Sargun Dhillon <sargun at sargun.me>
> Co-developed-by: Kees Cook <keescook at chromium.org>
> Signed-off-by: Kees Cook <keescook at chromium.org>

Reviewed-by: Will Drewry <wad at chromium.org>

