[PATCH v3 seccomp 5/5] seccomp/cache: Report cache data through /proc/pid/seccomp_cache

Jann Horn jannh at google.com
Thu Oct 1 16:05:00 UTC 2020


On Thu, Oct 1, 2020 at 2:06 PM YiFei Zhu <zhuyifei1999 at gmail.com> wrote:
> On Wed, Sep 30, 2020 at 5:01 PM Jann Horn <jannh at google.com> wrote:
> > Hmm, this won't work, because the task could be exiting, and seccomp
> > filters are detached in release_task() (using
> > seccomp_filter_release()). And at the moment, seccomp_filter_release()
> > just locklessly NULLs out the tsk->seccomp.filter pointer and drops
> > the reference.
> >
> > The locking here is kind of gross, but basically I think you can
> > change this code to use lock_task_sighand() / unlock_task_sighand()
> > (see the other examples in fs/proc/base.c), and bail out if
> > lock_task_sighand() returns NULL. And in seccomp_filter_release(), add
> > something like this:
> >
> > /* We are effectively holding the siglock by not having any sighand. */
> > WARN_ON(tsk->sighand != NULL);
>
> Ah thanks. I was thinking about how tasks exit and get freed and that
> sort of stuff, and how this would race against them. The last time I
> worked with procfs there was some magic going on that I could not
> figure out, so I was thinking if some magic will stop the task_struct
> from being released, considering it's an argument here.
>
> I just looked at release_task and related functions; looks like it
> will, at the end, decrease the reference count of the task_struct.
> Does procfs increase the refcount while calling the procfs functions?
> Hence, in procfs functions one can rely on the task_struct still being
> a task_struct, but any direct effects of release_task may happen while
> the procfs functions are running?

Yeah.

The ONE() entry you're adding to tgid_base_stuff is used to help
instantiate a "struct inode" when someone looks up the path
"/proc/$tgid/seccomp_cache"; then when that path is opened, a "struct
file" is created that holds a reference to the inode; and while that
file exists, your proc_pid_seccomp_cache() can be invoked.

proc_pid_seccomp_cache() is invoked from proc_single_show()
("PROC_I(inode)->op.proc_show" is proc_pid_seccomp_cache), and
proc_single_show() obtains a temporary reference to the task_struct
using get_pid_task() on a "struct pid" and drops that reference
afterwards with put_task_struct(). The "struct pid" is obtained from
the "struct proc_inode", which is essentially a subclass of "struct
inode". The "struct pid" is kept refererenced until the inode goes
away, via proc_pid_evict_inode(), called by proc_evict_inode().

By looking at put_task_struct() and its callees, you can figure out
which parts of the "struct task" are kept alive by the reference to
it.

By the way, maybe it'd make sense to add this to tid_base_stuff as
well? That should just be one extra line of code. Seccomp filters are
technically per-thread, so it would make sense to have them visible in
the per-thread subdirectories /proc/$pid/task/$tid/.


More information about the Containers mailing list