For review: seccomp_user_notif(2) manual page [v2]

Tycho Andersen tycho at tycho.pizza
Thu Oct 29 14:16:28 UTC 2020


On Thu, Oct 29, 2020 at 05:43:35AM +0100, Jann Horn wrote:
> On Thu, Oct 29, 2020 at 3:04 AM Tycho Andersen <tycho at tycho.pizza> wrote:
> > On Thu, Oct 29, 2020 at 02:42:58AM +0100, Jann Horn wrote:
> > > On Mon, Oct 26, 2020 at 10:55 AM Michael Kerrisk (man-pages)
> > > <mtk.manpages at gmail.com> wrote:
> > > >        static bool
> > > >        getTargetPathname(struct seccomp_notif *req, int notifyFd,
> > > >                          char *path, size_t len)
> > > >        {
> > > >            char procMemPath[PATH_MAX];
> > > >
> > > >            snprintf(procMemPath, sizeof(procMemPath), "/proc/%d/mem", req->pid);
> > > >
> > > >            int procMemFd = open(procMemPath, O_RDONLY);
> > > >            if (procMemFd == -1)
> > > >                errExit("\tS: open");
> > > >
> > > >            /* Check that the process whose info we are accessing is still alive.
> > > >               If the SECCOMP_IOCTL_NOTIF_ID_VALID operation (performed
> > > >               in checkNotificationIdIsValid()) succeeds, we know that the
> > > >               /proc/PID/mem file descriptor that we opened corresponds to the
> > > >               process for which we received a notification. If that process
> > > >               subsequently terminates, then read() on that file descriptor
> > > >               will return 0 (EOF). */
> > > >
> > > >            checkNotificationIdIsValid(notifyFd, req->id);
> > > >
> > > >            /* Read bytes at the location containing the pathname argument
> > > >               (i.e., the first argument) of the mkdir(2) call */
> > > >
> > > >            ssize_t nread = pread(procMemFd, path, len, req->data.args[0]);
> > > >            if (nread == -1)
> > > >                errExit("pread");
> > >
> > > As discussed at
> > > <https://lore.kernel.org/r/CAG48ez0m4Y24ZBZCh+Tf4ORMm9_q4n7VOzpGjwGF7_Fe8EQH=Q@mail.gmail.com>,
> > > we need to re-check checkNotificationIdIsValid() after reading remote
> > > memory but before using the read value in any way. Otherwise, the
> > > syscall could in the meantime get interrupted by a signal handler, the
> > > signal handler could return, and then the function that performed the
> > > syscall could free() allocations or return (thereby freeing buffers on
> > > the stack).
> > >
> > > In essence, this pread() is (unavoidably) a potential use-after-free
> > > read; and to make that not have any security impact, we need to check
> > > whether UAF read occurred before using the read value. This should
> > > probably be called out elsewhere in the manpage, too...
> > >
> > > Now, of course, **reading** is the easy case. The difficult case is if
> > > we have to **write** to the remote process... because then we can't
> > > play games like that. If we write data to a freed pointer, we're
> > > screwed, that's it. (And for somewhat unrelated bonus fun, consider
> > > that /proc/$pid/mem is originally intended for process debugging,
> > > including installing breakpoints, and will therefore happily write
> > > over "readonly" private mappings, such as typical mappings of
> > > executable code.)
> > >
> > > So, uuuuh... I guess if anyone wants to actually write memory back to
> > > the target process, we'd better come up with some dedicated API for
> > > that, using an ioctl on the seccomp fd that magically freezes the
> >
> > By freeze here you mean a killable wait instead of an interruptible
> > wait, right?
> 
> Nope, nonkillable.
> 
> Consider the case of vfork(), where a target process does something like this:
> 
> void spawn_executable(char **argv, char **envv) {
>   pid_t child = vfork();
>   if (child == 0) {
>     char path[1000];
>     sprintf(path, ...);
>     execve(path, argv, envv);
>   }
> }
> 
> and the seccomp notifier wants to look at the execve() path (as a
> somewhat silly example). The child process is just borrowing the
> parent's stack, and as soon as the child either gets far enough into
> execve() or dies, the parent continues using that stack.

Ah ha, yes. Thanks for the explanation.

Tycho


More information about the Containers mailing list