[PATCH v4 1/4] seccomp: add a return code to trap to userspace

Tue Jun 26 02:00:56 UTC 2018

> On Jun 25, 2018, at 6:32 PM, Tycho Andersen <tycho at tycho.ws> wrote:
> 
>> On Sat, Jun 23, 2018 at 12:27:43AM +0200, Jann Horn wrote:
>>> On Fri, Jun 22, 2018 at 11:51 PM Kees Cook <keescook at chromium.org> wrote:
>>> 
>>>> On Fri, Jun 22, 2018 at 11:09 AM, Andy Lutomirski <luto at amacapital.net> wrote:
>>>> One possible extra issue: IIRC /proc/.../mem uses FOLL_FORCE, which is not what we want here.
>> 
>> Uuugh, I forgot about that.
>> 
>>>> How about just adding an explicit “read/write the seccomp-trapped task’s memory” primitive?  That should be easier than a “open mem fd” primitive.
>>> 
>>> Uuugh. Can we avoid adding another "read/write remote process memory"
>>> interface? The point of this series was to provide a lightweight
>>> approach to what should normally be possible via the existing
>>> seccomp+ptrace interface. I do like Jann's context idea, but I agree
>>> with Andy: it can't be a handle to /proc/$pid/mem, since it's
>>> FOLL_FORCE. Is there any other kind of process context id we can use
>>> for this instead of pid? There was once an idea of pid-fd but it never
>>> landed... This would let us get rid of the "id" in the structure too.
>>> And if that existed, we could make process_vm_*v() safer too (taking a
>>> pid-fd instead of a pid).
>> 
>> Or make a duplicate of /proc/$pid/mem that only differs in whether it
>> sets FOLL_FORCE? The code is basically already there... something like
>> this:
> 
> But we want more than just memory access, I think. rootfs access, ns
> fds, etc. all seem like they might be useful, and racy to open.
> 
> I guess I see two options: use the existing id and add something to
> seccomp() to ask if it's still valid or independent of this patchset
> add some kind of pid id :\
> 

I think we should use the existing id / cookie / whatever and ask seccomp (or new syscalls) to do the requested operation, because we know the target task is at a very special stopping point. As a result, a seccomp-specific mechanism can do RCU-less fd modifications against a single-threaded target, muck with things like struct cred, etc., while a more general interface can’t.

It might be nice to add a syscall with flags such that it could be used on ptrace-stopped targets later on. Something like:

access_remote_task(int fd, u64 id, u32 type, ...)

Where type is split into 16 bits meaning “id and fd are from seccomp” and 16 bits selecting the operation, e.g. “write memory” or such.
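
To make the split of type concrete, here is a rough userspace sketch of one way the 32-bit field could be packed. Everything in it is hypothetical: the ART_* names, the helper functions, the choice of the high half for the source and the low half for the operation, and the numeric values are all assumptions for illustration. No such syscall or constants exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: high 16 bits say where (fd, id) came from,
 * low 16 bits say which operation is requested. */
#define ART_SRC_SHIFT 16
#define ART_OP_MASK   0xffffu

/* Hypothetical sources of the (fd, id) pair. */
enum art_source {
	ART_SRC_SECCOMP = 1,	/* id and fd are from seccomp */
	ART_SRC_PTRACE  = 2,	/* possible later extension */
};

/* Hypothetical operations. */
enum art_op {
	ART_OP_READ_MEM  = 1,
	ART_OP_WRITE_MEM = 2,
};

/* Pack a source and an operation into one 32-bit type value. */
static inline uint32_t art_make_type(uint16_t source, uint16_t op)
{
	return ((uint32_t)source << ART_SRC_SHIFT) | (uint32_t)op;
}

/* Extract the source half (high 16 bits). */
static inline uint16_t art_type_source(uint32_t type)
{
	return (uint16_t)(type >> ART_SRC_SHIFT);
}

/* Extract the operation half (low 16 bits). */
static inline uint16_t art_type_op(uint32_t type)
{
	return (uint16_t)(type & ART_OP_MASK);
}

/* Round-trip check: unpacking returns what was packed. */
static inline void art_self_check(void)
{
	uint32_t type = art_make_type(ART_SRC_SECCOMP, ART_OP_WRITE_MEM);

	assert(art_type_source(type) == ART_SRC_SECCOMP);
	assert(art_type_op(type) == ART_OP_WRITE_MEM);
}
```

With this layout a caller would pass something like art_make_type(ART_SRC_SECCOMP, ART_OP_WRITE_MEM) as the type argument, and a ptrace-stopped variant could later be added as a new source value without touching the operation half.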