[PATCH 1/2] seccomp: notify user trap about unused filter

Tycho Andersen tycho at tycho.ws
Wed May 27 21:52:03 UTC 2020


On Wed, May 27, 2020 at 02:43:49PM -0700, Kees Cook wrote:
> (While I'm here -- why can there be only one listener per task? The
> notifications are filter-specific, not task-specific?)

Not sure what you mean here?

> > To fix this, we introduce a new "live" reference counter that tracks the
> > live tasks making use of a given filter and when a notifier is
> > registered waiting tasks will be notified that the filter is now empty
> > by receiving a (E)POLLHUP event.
> > The concept in this patch introduces is the same as for signal_struct,
> > i.e. reference counting for life-cycle management is decoupled from
> > reference counting live taks using the object.
> 
> I will need convincing that life-cycle ref-counting needs to be decoupled
> from usage ref-counting.

I think it does, since the refcount is no longer 1:1 with the number
of tasks that have it (a notification fd's struct file has a reference
too).

We could also do it the reverse way, and keep track of how many
notification fds point to a particular file. But somehow we need two
counts.

Maybe it's best to decouple them entirely, and have usage go back to
just being the number of tasks, and introduce a new counter for
notification fds.

> I see what you're saying here and in the other
> reply about where the notification is coming from (release vs put, etc),
> but I think it'd be better if the EPOLLHUP was handled internally to the
> VFS due to the kernel end of the file being closed.
> 
> > There's probably some trickery possible but the second counter is just
> > the correct way of doing this imho and has precedence. The patch also
> > lifts the waitqeue from struct notification into into sruct
> > seccomp_filter. This is cleaner overall and let's us avoid having to
> > take the notifier mutex since we neither need to read nor modify the
> > notifier specific aspects of the seccomp filter. In the exit path I'd
> > very much like to avoid having to take the notifier mutex for each
> > filter in the task's filter hierarchy.
> 
> I guess this is a minor size/speed trade-off (every seccomp_filter
> struct grows by 1 pointer regardless of the presence of USER_NOTIF
> rules attached...). But I think this is an optimization detail, and I
> need to understand why we can't just close the file on filter free.

That seems nicest, agreed.

Tycho


More information about the Containers mailing list