[PATCH v3 1/4] seccomp: add a return code to trap to userspace

Tycho Andersen tycho at tycho.ws
Tue Jun 12 23:16:10 UTC 2018


Hi Matthew,

On Tue, Jun 12, 2018 at 02:39:03PM -0700, Matthew Helsley wrote:
> On Thu, May 31, 2018 at 7:49 AM, Tycho Andersen <tycho at tycho.ws> wrote:
> 
> <snip>
> 
> 
> > +struct seccomp_notif {
> > +       __u64 id;
> > +       pid_t pid;
> > +       struct seccomp_data data;
> > +};
> >
> 
> Since it's part of the UAPI I think it would be good to add documentation
> to this patch series. Part of that documentation should talk about which
> pid namespaces this pid value is relevant in. This is especially important
> since the feature is designed for use by things like container/sandbox
> managers.

Yes, at least Documentation/userspace-api/seccomp_filter.txt should be
updated. I'll add that for the next series.

> > +static void seccomp_do_user_notification(int this_syscall,
> > +                                        struct seccomp_filter *match,
> > +                                        const struct seccomp_data *sd)
> > +{
> > +       int err;
> > +       long ret = 0;
> > +       struct seccomp_knotif n = {};
> > +
> > +       mutex_lock(&match->notify_lock);
> > +       err = -ENOSYS;
> > +       if (!match->has_listener)
> > +               goto out;
> > +
> > +       n.pid = current->pid;
> >
> 
> How have you tested this code for correctness? I don't see many
> combinations being tested below nor here:
> https://github.com/tych0/kernel-utils/blob/master/seccomp/notify_stress.c
> 
> What about processes in different pid namespaces? Make tests that vary key
> parameters like these between the task generating the notifications and the
> task interested in processing the notifications. Make tests that try to
> kill them at interesting times too. etc. Make tests that pass the
> notification fd around and see how they work (or not).
> 
> I ask about testing because you're effectively returning a pid value to
> userspace here but not using the proper macros to access the task's struct
> pid for that purpose. That will work so long as both processes are in the
> same pid namespace but breaks otherwise.
> 
> Furthermore, a pid value is not the best solution for the queueing of these
> notifications because a single pid value is not meaningful/correct outside
> current's pid namespace and the seccomp notification file descriptor could
> be passed on to processes in another pid namespaces; this pid value will
> almost always not be relevant or correct there hence taking a reference to

Well, it *has* to be relevant in some cases: if you want to access
some of the task's memory, e.g. to read the arguments of the syscall,
you need ptrace or /proc/<pid>/map_files to get at that memory. So we
do need to pass it, but,

> the struct pid is useful. Then, during read(), the code would use the
> proper macro to turn the struct pid reference into a pid value relevant in
> the *reader's* pid namespace *or* return something obviously bogus if the
> reader is in a pid namespace that can't see that pid. This could prevent
> management processes from being tricked into clobbering the wrong process,
> feeding the wrong process sensitive information via syscall results, etc.

Yes, this makes sense. Seems like we need to do some magic about
passing the tracee's task struct to the listener, so it can do
task_pid_vnr(). We could perhaps require the listener to be in
init_pid_ns, but I think things like socket() might still be useful
even if you can't read the tracee's memory.
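For reference, the shape I have in mind (an untested sketch, not what the
patch currently does; it implies changing seccomp_knotif's pid field from
pid_t to a struct pid pointer):

```c
/* At notification time, in the tracee's context: hold a counted
 * reference instead of recording a raw pid_t. */
n.pid = get_task_pid(current, PIDTYPE_PID);

/* At read() time, in the listener's context: translate into the
 * reader's pid namespace. pid_vnr() returns 0 if the pid is not
 * visible there, which the listener can treat as "bogus". */
unotif.pid = pid_vnr(n.pid);

/* When the notification is freed: */
put_pid(n.pid);
```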

> Alternately, you could choose to specify that the seccomp manager is
> expected to be in the pid namespace of the process it's managing at all
> times. That's not necessarily trivial either because the process(es) it
> manages could potentially create new child pid namespaces. It also means
> that the processes being managed can "see" the manager process at all times.

Right, I think we don't want to require this.

> Regardless, you still need to use the proper macros to access current's pid
> for export to userspace.

Yes, thanks.

Tycho

