For review: seccomp_user_notif(2) manual page [v2]
Tycho Andersen
tycho at tycho.pizza
Thu Oct 29 14:16:28 UTC 2020
On Thu, Oct 29, 2020 at 05:43:35AM +0100, Jann Horn wrote:
> On Thu, Oct 29, 2020 at 3:04 AM Tycho Andersen <tycho at tycho.pizza> wrote:
> > On Thu, Oct 29, 2020 at 02:42:58AM +0100, Jann Horn wrote:
> > > On Mon, Oct 26, 2020 at 10:55 AM Michael Kerrisk (man-pages)
> > > <mtk.manpages at gmail.com> wrote:
> > > > static bool
> > > > getTargetPathname(struct seccomp_notif *req, int notifyFd,
> > > > char *path, size_t len)
> > > > {
> > > > char procMemPath[PATH_MAX];
> > > >
> > > > snprintf(procMemPath, sizeof(procMemPath), "/proc/%d/mem", req->pid);
> > > >
> > > > int procMemFd = open(procMemPath, O_RDONLY);
> > > > if (procMemFd == -1)
> > > > errExit("\tS: open");
> > > >
> > > > /* Check that the process whose info we are accessing is still alive.
> > > > If the SECCOMP_IOCTL_NOTIF_ID_VALID operation (performed
> > > > in checkNotificationIdIsValid()) succeeds, we know that the
> > > > /proc/PID/mem file descriptor that we opened corresponds to the
> > > > process for which we received a notification. If that process
> > > > subsequently terminates, then read() on that file descriptor
> > > > will return 0 (EOF). */
> > > >
> > > > checkNotificationIdIsValid(notifyFd, req->id);
> > > >
> > > > /* Read bytes at the location containing the pathname argument
> > > > (i.e., the first argument) of the mkdir(2) call */
> > > >
> > > > ssize_t nread = pread(procMemFd, path, len, req->data.args[0]);
> > > > if (nread == -1)
> > > > errExit("pread");
> > >
> > > As discussed at
> > > <https://lore.kernel.org/r/CAG48ez0m4Y24ZBZCh+Tf4ORMm9_q4n7VOzpGjwGF7_Fe8EQH=Q@mail.gmail.com>,
> > > we need to re-check checkNotificationIdIsValid() after reading remote
> > > memory but before using the read value in any way. Otherwise, the
> > > syscall could in the meantime get interrupted by a signal handler, the
> > > signal handler could return, and then the function that performed the
> > > syscall could free() allocations or return (thereby freeing buffers on
> > > the stack).
> > >
> > > In essence, this pread() is (unavoidably) a potential use-after-free
> > > read; and to make that not have any security impact, we need to check
> > > whether UAF read occurred before using the read value. This should
> > > probably be called out elsewhere in the manpage, too...
> > >
> > > Now, of course, **reading** is the easy case. The difficult case is if
> > > we have to **write** to the remote process... because then we can't
> > > play games like that. If we write data to a freed pointer, we're
> > > screwed, that's it. (And for somewhat unrelated bonus fun, consider
> > > that /proc/$pid/mem is originally intended for process debugging,
> > > including installing breakpoints, and will therefore happily write
> > > over "readonly" private mappings, such as typical mappings of
> > > executable code.)
> > >
> > > So, uuuuh... I guess if anyone wants to actually write memory back to
> > > the target process, we'd better come up with some dedicated API for
> > > that, using an ioctl on the seccomp fd that magically freezes the
> >
> > By freeze here you mean a killable wait instead of an interruptible
> > wait, right?
>
> Nope, nonkillable.
>
> Consider the case of vfork(), where a target process does something like this:
>
> void spawn_executable(char **argv, char **envv) {
> pid_t child = vfork();
> if (child == 0) {
> char path[1000];
> sprintf(path, ...);
> execve(path, argv, envv);
> }
> }
>
> and the seccomp notifier wants to look at the execve() path (as a
> somewhat silly example). The child process is just borrowing the
> parent's stack, and as soon as the child either gets far enough into
> execve() or dies, the parent continues using that stack.
Ah ha, yes. Thanks for the explanation.
Tycho
More information about the Containers
mailing list