For review: seccomp_user_notif(2) manual page

Tycho Andersen tycho at tycho.pizza
Wed Sep 30 15:03:30 UTC 2020


On Wed, Sep 30, 2020 at 01:07:38PM +0200, Michael Kerrisk (man-pages) wrote:
>        2. In order that the supervisor process can obtain  notifications
>           using  the  listening  file  descriptor, (a duplicate of) that
>           file descriptor must be passed from the target process to  the
>           supervisor process.  One way in which this could be done is by
>           passing the file descriptor over a UNIX domain socket  connec‐
>           tion between the two processes (using the SCM_RIGHTS ancillary
>           message type described in unix(7)).   Another  possibility  is
>           that  the  supervisor  might  inherit  the file descriptor via
>           fork(2).

It is technically possible to inherit the fd via fork, but is it
really that useful? The child process wouldn't be able to actually do
the syscall in question, since it would have the same filter.
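For the SCM_RIGHTS route in the quoted text, a minimal sketch of both halves might look like the following. It assumes the target and supervisor already share a connected UNIX domain socket, for example one created with socketpair(2). send_fd() and recv_fd() are hypothetical helper names, not part of any library API.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: send 'fd' over the connected UNIX domain socket
   'sock' as SCM_RIGHTS ancillary data. Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char dummy = 'x';                    /* must send at least one data byte */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {                              /* union guarantees cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    memset(u.buf, 0, sizeof(u.buf));
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive a descriptor sent by send_fd(). Returns the
   new descriptor (a duplicate in this process), or -1 on error. */
static int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;                       /* no descriptor was passed */
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The target would call send_fd() with the listening descriptor returned by seccomp(2), and the supervisor would call recv_fd() on the other end of the socket.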

>           The  information  in  the notification can be used to discover
>           the values of pointer arguments for the target process's  sys‐
>           tem call.  (This is something that can't be done from within a
>           seccomp filter.)  To do this (and  assuming  it  has  suitable

s/To do this/One way to accomplish this/ perhaps, since there are
others.

>           permissions),   the   supervisor   opens   the   corresponding
>           /proc/[pid]/mem file, seeks to the memory location that corre‐
>           sponds to one of the pointer arguments whose value is supplied
>           in the notification event, and reads bytes from that location.
>           (The supervisor must be careful to avoid a race condition that
>           can occur when doing this; see the  description  of  the  SEC‐
>           COMP_IOCTL_NOTIF_ID_VALID ioctl(2) operation below.)  In addi‐
>           tion, the supervisor can access other system information  that
>           is  visible  in  user space but which is not accessible from a
>           seccomp filter.
> 
>           ┌─────────────────────────────────────────────────────┐
>           │FIXME                                                │
>           ├─────────────────────────────────────────────────────┤
>           │Suppose we are reading a pathname from /proc/PID/mem │
>           │for  a system call such as mkdir(). The pathname can │
>           │be an arbitrary length. How do we know how much (how │
>           │many pages) to read from /proc/PID/mem?              │
>           └─────────────────────────────────────────────────────┘

PATH_MAX, I suppose.

>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
>        ├─────────────────────────────────────────────────────┤
>        │From my experiments,  it  appears  that  if  a  SEC‐ │
>        │COMP_IOCTL_NOTIF_RECV   is  done  after  the  target │
>        │process terminates, then the ioctl()  simply  blocks │
>        │(rather than returning an error to indicate that the │
>        │target process no longer exists).                    │

Yeah, I think Christian wanted to fix this at some point, but it's a
bit sticky to do. Note that if you rely on e.g. fork() as above, the
filter is shared with your current process, and this notification
would never be possible. Perhaps another reason to omit that from the
man page.
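Until SECCOMP_IOCTL_NOTIF_RECV behaves differently, a supervisor can avoid blocking indefinitely by poll(2)ing the listening descriptor with a timeout and issuing the ioctl only once POLLIN is reported. The following is a minimal sketch of that pattern. recv_notif_timeout() is a hypothetical name, and the choice of timeout and the handling of a timeout are left to the application. The ioctl may still fail if the target is killed between poll() and the ioctl, so callers should check errno.

```c
#include <linux/seccomp.h>
#include <poll.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: wait up to 'timeout_ms' for a notification on the
   listening descriptor, then receive it. Returns 1 if 'req' was filled in,
   0 on timeout, and -1 on error (including poll() reporting POLLERR, POLLHUP
   or POLLNVAL without POLLIN). */
static int recv_notif_timeout(int listener, struct seccomp_notif *req,
                              int timeout_ms)
{
    struct pollfd pfd = { .fd = listener, .events = POLLIN };
    int r = poll(&pfd, 1, timeout_ms);
    if (r <= 0)
        return r;                      /* 0: timed out; -1: poll() failed */
    if (!(pfd.revents & POLLIN))
        return -1;                     /* no notification pending */
    memset(req, 0, sizeof(*req));      /* the kernel rejects a non-zeroed struct */
    if (ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, req) == -1)
        return -1;
    return 1;
}
```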

>        SECCOMP_IOCTL_NOTIF_ID_VALID
>               This operation can be used to check that a notification ID
>               returned by an earlier SECCOMP_IOCTL_NOTIF_RECV  operation
>               is  still  valid  (i.e.,  that  the  target  process still
>               exists).
> 
>               The third ioctl(2) argument is a  pointer  to  the  cookie
>               (id) returned by the SECCOMP_IOCTL_NOTIF_RECV operation.
> 
>               This  operation is necessary to avoid race conditions that
>               can  occur   when   the   pid   returned   by   the   SEC‐
>               COMP_IOCTL_NOTIF_RECV   operation   terminates,  and  that
>               process ID is reused by another process.   An  example  of
>               this kind of race is the following:
> 
>               1. A  notification  is  generated  on  the  listening file
>                  descriptor.  The returned  seccomp_notif  contains  the
>                  PID of the target process.
> 
>               2. The target process terminates.
> 
>               3. Another process is created on the system that by chance
>                  reuses the PID that was freed when the  target  process
>                  terminated.
> 
>               4. The  supervisor  open(2)s  the /proc/[pid]/mem file for
>                  the PID obtained in step 1, with the intention of (say)
>                  inspecting the memory locations that contain the argu‐
>                  ments of the system call that triggered  the  notifica‐
>                  tion in step 1.
> 
>               In the above scenario, the risk is that the supervisor may
>               try to access the memory of a process other than the  tar‐
>               get.   This  race  can be avoided by following the call to
>               open with a SECCOMP_IOCTL_NOTIF_ID_VALID operation to ver‐
>               ify  that  the  process that generated the notification is
>               still alive.  (Note that  if  the  target  process  subse‐
>               quently  terminates, its PID won't be reused because there
>               remains an open reference to the /proc/[pid]/mem file;  in
>               this  case, a subsequent read(2) from the file will return
>               0, indicating end of file.)
> 
>               On success (i.e., the notification  ID  is  still  valid),
>               this  operation  returns 0 On failure (i.e., the notifica‐
                                          ^ need a period?

>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
>        ├─────────────────────────────────────────────────────┤
>        │Interestingly, after the event  had  been  received, │
>        │the  file descriptor indicates as writable (verified │
>        │from the source code and by experiment). How is this │
>        │useful?                                              │

You're saying it should just do EPOLLOUT and not EPOLLWRNORM? Seems
reasonable.

> 
> EXAMPLES
>        The (somewhat contrived) program shown below demonstrates the use

It may also be worth mentioning the example in
samples/seccomp/user-trap.c.
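For a smaller starting point than either full program, the target-side setup might be sketched as follows. Several assumptions apply. It is x86-64 only, because the architecture check and __NR_mkdir are ABI specific, and it does not filter x32 system calls (numbers with bit 0x40000000 set). It requires kernel and headers with SECCOMP_FILTER_FLAG_NEW_LISTENER (Linux 5.0 or later). install_notify_filter() and notif_filter are hypothetical names.

```c
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical classic-BPF program (x86-64 only, by assumption): send
   mkdir(2) to the supervisor with SECCOMP_RET_USER_NOTIF, and allow
   everything else. x32 system calls are not handled here. */
static struct sock_filter notif_filter[] = {
    /* Kill the process if it is not using the x86-64 ABI. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
    /* Load the system call number and trap mkdir. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mkdir, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_USER_NOTIF),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Hypothetical helper: install the filter in the calling (target) process
   and return the listening file descriptor, or -1 on error. */
static int install_notify_filter(void)
{
    struct sock_fprog prog = {
        .len = sizeof(notif_filter) / sizeof(notif_filter[0]),
        .filter = notif_filter,
    };
    /* Needed unless the caller has CAP_SYS_ADMIN. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
        return -1;
    /* Call seccomp(2) through syscall(2). */
    return (int) syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER,
                         SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog);
}
```

The descriptor returned here is what the target would then pass to the supervisor, for example over a UNIX domain socket with SCM_RIGHTS.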

Tycho

