For review: seccomp_user_notif(2) manual page

Michael Kerrisk (man-pages) mtk.manpages at gmail.com
Wed Sep 30 20:34:51 UTC 2020


Hi Tycho,

Thanks for taking time to look at the page!

On 9/30/20 5:03 PM, Tycho Andersen wrote:
> On Wed, Sep 30, 2020 at 01:07:38PM +0200, Michael Kerrisk (man-pages) wrote:
>>        2. In order that the supervisor process can obtain  notifications
>>           using  the  listening  file  descriptor, (a duplicate of) that
>>           file descriptor must be passed from the target process to  the
>>           supervisor process.  One way in which this could be done is by
>>           passing the file descriptor over a UNIX domain socket  connec‐
>>           tion between the two processes (using the SCM_RIGHTS ancillary
>>           message type described in unix(7)).   Another  possibility  is
>>           that  the  supervisor  might  inherit  the file descriptor via
>>           fork(2).
> 
> It is technically possible to inherit the fd via fork, but is it
> really that useful? The child process wouldn't be able to actually do
> the syscall in question, since it would have the same filter.

D'oh! Yes, of course.

I think I was reaching because in an earlier conversation
you replied:

[[
> 3. The "target process" passes the "listening file descriptor"
>    to the "monitoring process" via the UNIX domain socket.

or some other means, it doesn't have to be with SCM_RIGHTS.
]]

So, what other means?

Anyway, I removed the sentence mentioning fork().
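For the SCM_RIGHTS route that remains in item 2, here is a minimal sketch of both ends. It is illustrative only. send_fd() and recv_fd() are hypothetical helper names, and the code assumes an already connected UNIX domain socket between target and supervisor. The descriptor the supervisor receives is a new file descriptor that refers to the same open file description as the target's listening file descriptor, which is what "(a duplicate of)" in item 2 means.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send 'fd' across the connected UNIX domain socket 'sock' as
   SCM_RIGHTS ancillary data. Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char dummy = 'F';                 /* at least one byte of real data */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {                           /* union guarantees cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    memset(&u, 0, sizeof(u));

    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a file descriptor sent by send_fd(). Returns the new
   descriptor in the receiving process, or -1 on error. */
int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In this arrangement, the target would call send_fd(sock, listener) after installing its filter with SECCOMP_FILTER_FLAG_NEW_LISTENER, and the supervisor would call recv_fd(sock).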

>>           The  information  in  the notification can be used to discover
>>           the values of pointer arguments for the target process's  sys‐
>>           tem call.  (This is something that can't be done from within a
>>           seccomp filter.)  To do this (and  assuming  it  has  suitable
> 
> s/To do this/One way to accomplish this/ perhaps, since there are
> others.

Yes, thanks, done.

>>           permissions),   the   supervisor   opens   the   corresponding
>>           /proc/[pid]/mem file, seeks to the memory location that corre‐
>>           sponds to one of the pointer arguments whose value is supplied
>>           in the notification event, and reads bytes from that location.
>>           (The supervisor must be careful to avoid a race condition that
>>           can occur when doing this; see the  description  of  the  SEC‐
>>           COMP_IOCTL_NOTIF_ID_VALID ioctl(2) operation below.)  In addi‐
>>           tion, the supervisor can access other system information  that
>>           is  visible  in  user space but which is not accessible from a
>>           seccomp filter.
>>
>>           ┌─────────────────────────────────────────────────────┐
>>           │FIXME                                                │
>>           ├─────────────────────────────────────────────────────┤
>>           │Suppose we are reading a pathname from /proc/PID/mem │
>>           │for  a system call such as mkdir(). The pathname can │
>>           │be an arbitrary length. How do we know how much (how │
>>           │many pages) to read from /proc/PID/mem?              │
>>           └─────────────────────────────────────────────────────┘
> 
> PATH_MAX, I suppose.

Yes, I misunderstood a fundamental detail here, as Jann 
also confirmed.

>>        ┌─────────────────────────────────────────────────────┐
>>        │FIXME                                                │
>>        ├─────────────────────────────────────────────────────┤
>>        │From my experiments,  it  appears  that  if  a  SEC‐ │
>>        │COMP_IOCTL_NOTIF_RECV   is  done  after  the  target │
>>        │process terminates, then the ioctl()  simply  blocks │
>>        │(rather than returning an error to indicate that the │
>>        │target process no longer exists).                    │
> 
> Yeah, I think Christian wanted to fix this at some point,

Do you have a pointer to that discussion? I could not find it with a
quick search.

> but it's a
> bit sticky to do.

Can you say a few words about the nature of the problem?

In the meantime, I think this merits a note under BUGS, and
I've added one.
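Until this is fixed, a supervisor that must not hang indefinitely could wait for the listening file descriptor to become readable before issuing SECCOMP_IOCTL_NOTIF_RECV. The sketch below assumes only that the listening file descriptor supports poll(2) and sets POLLIN when a notification is pending. Treating POLLHUP as "no further notifications will arrive" depends on the kernel version and is an assumption here, so the timeout is what actually prevents an indefinite wait. wait_for_notification() is a hypothetical helper name.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to 'timeout_ms' milliseconds (-1 = forever) for a
   notification to become available on the listening file descriptor
   'notify_fd'. Returns
      1  if POLLIN is set, i.e., a notification appears to be pending;
      0  on timeout;
     -1  on error, or if poll(2) reports POLLHUP/POLLERR/POLLNVAL
         without POLLIN (assumed here to mean that no further
         notifications will arrive). */
int wait_for_notification(int notify_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = notify_fd, .events = POLLIN };

    for (;;) {
        int ready = poll(&pfd, 1, timeout_ms);
        if (ready == -1 && errno == EINTR)
            continue;           /* interrupted: retry (restarts timeout) */
        if (ready <= 0)
            return ready;       /* 0 = timeout, -1 = error */
        return (pfd.revents & POLLIN) ? 1 : -1;
    }
}
```

Even when POLLIN is reported, the subsequent SECCOMP_IOCTL_NOTIF_RECV can still fail (for example, with ENOENT if the target was killed in between), so its return value still needs checking.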

> Note that if you e.g. rely on fork() above, the
> filter is shared with your current process, and this notification
> would never be possible. Perhaps another reason to omit that from the
> man page.

(Yes, as noted above, I removed that sentence.)

>>        SECCOMP_IOCTL_NOTIF_ID_VALID
>>               This operation can be used to check that a notification ID
>>               returned by an earlier SECCOMP_IOCTL_NOTIF_RECV  operation
>>               is  still  valid  (i.e.,  that  the  target  process still
>>               exists).
>>
>>               The third ioctl(2) argument is a  pointer  to  the  cookie
>>               (id) returned by the SECCOMP_IOCTL_NOTIF_RECV operation.
>>
>>               This  operation is necessary to avoid race conditions that
>>               can  occur   when   the   pid   returned   by   the   SEC‐
>>               COMP_IOCTL_NOTIF_RECV   operation   terminates,  and  that
>>               process ID is reused by another process.   An  example  of
>>               this kind of race is the following:
>>
>>               1. A  notification  is  generated  on  the  listening file
>>                  descriptor.  The returned  seccomp_notif  contains  the
>>                  PID of the target process.
>>
>>               2. The target process terminates.
>>
>>               3. Another process is created on the system that by chance
>>                  reuses the PID that was freed when the  target  process
>>                  terminated.
>>
>>               4. The  supervisor  open(2)s  the /proc/[pid]/mem file for
>>                  the PID obtained in step 1, with the intention of (say)
>>                  inspecting the memory locations that contain the arguments
>>                  of the system call that triggered the notification in
>>                  step 1.
>>
>>               In the above scenario, the risk is that the supervisor may
>>               try to access the memory of a process other than the  tar‐
>>               get.   This  race  can be avoided by following the call to
>>               open with a SECCOMP_IOCTL_NOTIF_ID_VALID operation to ver‐
>>               ify  that  the  process that generated the notification is
>>               still alive.  (Note that  if  the  target  process  subse‐
>>               quently  terminates, its PID won't be reused because there
>>               remains an open reference to the /proc/[pid]/mem file; in
>>               this  case, a subsequent read(2) from the file will return
>>               0, indicating end of file.)
>>
>>               On success (i.e., the notification  ID  is  still  valid),
>>               this  operation  returns 0 On failure (i.e., the notifica‐
>                                           ^ need a period?
> 
>>        ┌─────────────────────────────────────────────────────┐
>>        │FIXME                                                │
>>        ├─────────────────────────────────────────────────────┤
>>        │Interestingly, after the event  had  been  received, │
>>        │the  file descriptor indicates as writable (verified │
>>        │from the source code and by experiment). How is this │
>>        │useful?                                              │
> 
> You're saying it should just do EPOLLOUT and not EPOLLWRNORM? Seems
> reasonable.

No, I'm saying something more fundamental: why does the FD report itself
as writable? Can you write something to it? If yes, what? If not, then
why do these APIs want to say that the FD is writable?

>> EXAMPLES
>>        The (somewhat contrived) program shown below demonstrates the use
> 
> May also be worth mentioning the example in
> samples/seccomp/user-trap.c as well.

Oh -- I meant to do that! Thanks for reminding me.
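The basic receive-and-respond cycle that such examples are built around can be outlined as follows. This code is not taken from samples/seccomp/user-trap.c or from the page's example program. deny_one_notification() is a hypothetical helper that simply makes the target's system call fail with a caller-chosen errno. It assumes glibc and kernel headers that provide SYS_seccomp and SECCOMP_GET_NOTIF_SIZES.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/seccomp.h>

/* Receive one notification on the listening descriptor 'notify_fd' and
   make the target's system call fail with errno 'err'.
   Returns 0 on success, -1 on error. */
int deny_one_notification(int notify_fd, int err)
{
    struct seccomp_notif_sizes sizes;
    struct seccomp_notif *req = NULL;
    struct seccomp_notif_resp *resp = NULL;
    int ret = -1;

    /* Ask the kernel how big the structures are; it may be newer than
       the headers this code was compiled against. */
    if (syscall(SYS_seccomp, SECCOMP_GET_NOTIF_SIZES, 0, &sizes) == -1)
        return -1;

    size_t req_len = sizes.seccomp_notif > sizeof(*req)
                     ? sizes.seccomp_notif : sizeof(*req);
    size_t resp_len = sizes.seccomp_notif_resp > sizeof(*resp)
                      ? sizes.seccomp_notif_resp : sizeof(*resp);

    /* calloc() zeroes the buffers; the RECV buffer must be zeroed. */
    req = calloc(1, req_len);
    resp = calloc(1, resp_len);
    if (req == NULL || resp == NULL)
        goto out;

    if (ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_RECV, req) == -1)
        goto out;

    resp->id = req->id;         /* echo the notification cookie back */
    resp->val = 0;
    resp->error = -err;         /* nonzero error: the system call fails */
    resp->flags = 0;

    /* Fails with ENOENT if the target has died in the meantime. */
    if (ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_SEND, resp) == -1)
        goto out;
    ret = 0;
out:
    free(req);
    free(resp);
    return ret;
}
```

A real supervisor would typically loop over this, inspecting req->data.nr and req->data.args before deciding how to respond, rather than always denying.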

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


More information about the Containers mailing list