[PATCH RESEND 2/5] seccomp: Add wait_killable semantic to seccomp user notifier

Rodrigo Campos rodrigo at kinvolk.io
Wed Apr 28 13:20:02 UTC 2021


On Wed, Apr 28, 2021 at 1:10 PM Rodrigo Campos <rodrigo at kinvolk.io> wrote:
>
> On Wed, Apr 28, 2021 at 2:22 AM Tycho Andersen <tycho at tycho.pizza> wrote:
> >
> > On Tue, Apr 27, 2021 at 04:19:54PM -0700, Andy Lutomirski wrote:
> > > User notifiers should allow correct emulation.  Right now, they don't,
> > > but there is no reason they can't.
> >
> > Thanks for the explanation.
> >
> > Consider fsmount, which has:
> >
> >         ret = mutex_lock_interruptible(&fc->uapi_mutex);
> >         if (ret < 0)
> >                 goto err_fsfd;
> >
> > If a regular task is interrupted during that wait, it returns -EINTR
> > or whatever back to userspace.
> >
> > Suppose that we intercept fsmount. The supervisor decides the mount is
> > OK, does the fsmount, injects the mount fd into the container, and
> > then the tracee receives a signal. At this point, the mount fd is
> > visible inside the container. The supervisor gets a notification about
> > the signal and revokes the mount fd, but there was some time where it
> > was exposed in the container, whereas with the interrupt in the native
> > syscall there was never any exposure.
>
> IIUC, this is solved by my patch, patch 4 of the series. The
> supervisor should do the addfd with the flag added in that patch
> (SECCOMP_ADDFD_FLAG_SEND) for an atomic "addfd + send".

Well, under Andy's proposal, handling that is even simpler. If the
signal is delivered after we have added the fd (note that, unlike
today, the tracee's syscall does not return when the signal arrives;
it just signals the notifier and keeps waiting), we can simply ignore
the signal and return the fd (if that is the appropriate thing for
that syscall, but I guess after adding an fd there isn't any other
reasonable thing to do).

Best,
Rodrigo


More information about the Containers mailing list