[PATCH RESEND 2/5] seccomp: Add wait_killable semantic to seccomp user notifier

Rodrigo Campos rodrigo at kinvolk.io
Wed Apr 28 11:10:49 UTC 2021

On Wed, Apr 28, 2021 at 2:22 AM Tycho Andersen <tycho at tycho.pizza> wrote:
> On Tue, Apr 27, 2021 at 04:19:54PM -0700, Andy Lutomirski wrote:
> > User notifiers should allow correct emulation.  Right now, they don't,
> > but there is no reason they can't.
> Thanks for the explanation.
> Consider fsmount, which has:
>         ret = mutex_lock_interruptible(&fc->uapi_mutex);
>         if (ret < 0)
>                 goto err_fsfd;
> If a regular task is interrupted during that wait, it returns -EINTR
> or whatever back to userspace.
> Suppose that we intercept fsmount. The supervisor decides the mount is
> OK, does the fsmount, injects the mount fd into the container, and
> then the tracee receives a signal. At this point, the mount fd is
> visible inside the container. The supervisor gets a notification about
> the signal and revokes the mount fd, but there was some time where it
> was exposed in the container, whereas with the interrupt in the native
> syscall there was never any exposure.

IIUC, this is solved by my patch, patch 4 of the series. The
supervisor should do the addfd with the flag added in that patch
(SECCOMP_ADDFD_FLAG_SEND) for an atomic "addfd + send".

With the atomic "addfd + send", exactly one of two things happens:
either the fd is added _and_ its value is returned to the syscall, or
the fd is not added at all and the container sees the syscall as
interrupted. Therefore, the fd is only visible to the container when
it should be.
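
As a rough sketch of what the supervisor side could look like with that
flag: the helper name addfd_and_send() is made up for illustration, and
the fallback #define assumes the flag ends up as bit 1 in the uapi
header. It also assumes headers that already provide struct
seccomp_notif_addfd and SECCOMP_IOCTL_NOTIF_ADDFD.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/seccomp.h>

/* Fallback for headers that predate the flag. The value (bit 1) is an
 * assumption about what the series ends up defining. */
#ifndef SECCOMP_ADDFD_FLAG_SEND
#define SECCOMP_ADDFD_FLAG_SEND (1UL << 1)
#endif

/*
 * Hypothetical helper: install src_fd into the task blocked on
 * notification `id` and complete its syscall in the same ioctl.
 *
 * Returns the fd number as seen by the target on success, or -1 with
 * errno set. On failure nothing is expected to be installed, so there
 * is no separate revoke step.
 */
static int addfd_and_send(int notify_fd, uint64_t id, int src_fd)
{
	struct seccomp_notif_addfd addfd;

	memset(&addfd, 0, sizeof(addfd));
	addfd.id = id;                          /* id from SECCOMP_IOCTL_NOTIF_RECV */
	addfd.srcfd = (uint32_t)src_fd;         /* fd in the supervisor, e.g. from fsmount */
	addfd.flags = SECCOMP_ADDFD_FLAG_SEND;  /* add the fd and reply atomically */
	addfd.newfd = 0;                        /* ignored without SECCOMP_ADDFD_FLAG_SETFD */
	addfd.newfd_flags = 0;

	return ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd);
}
```

The supervisor would call this in place of a plain addfd followed by
SECCOMP_IOCTL_NOTIF_SEND. If the tracee was interrupted first, I would
expect the ioctl to fail and the supervisor to just close its own copy
of the mount fd.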

