[PATCH v4 1/4] seccomp: add a return code to trap to userspace

Andy Lutomirski luto at amacapital.net
Tue Jun 26 02:00:56 UTC 2018



> On Jun 25, 2018, at 6:32 PM, Tycho Andersen <tycho at tycho.ws> wrote:
> 
>> On Sat, Jun 23, 2018 at 12:27:43AM +0200, Jann Horn wrote:
>>> On Fri, Jun 22, 2018 at 11:51 PM Kees Cook <keescook at chromium.org> wrote:
>>> 
>>>> On Fri, Jun 22, 2018 at 11:09 AM, Andy Lutomirski <luto at amacapital.net> wrote:
>>>> One possible extra issue: IIRC /proc/.../mem uses FOLL_FORCE, which is not what we want here.
>> 
>> Uuugh, I forgot about that.
>> 
>>>> How about just adding an explicit “read/write the seccomp-trapped task’s memory” primitive?  That should be easier than an “open mem fd” primitive.
>>> 
>>> Uuugh. Can we avoid adding another "read/write remote process memory"
>>> interface? The point of this series was to provide a lightweight
>>> approach to what should normally be possible via the existing
>>> seccomp+ptrace interface. I do like Jann's context idea, but I agree
>>> with Andy: it can't be a handle to /proc/$pid/mem, since it's
>>> FOLL_FORCE. Is there any other kind of process context id we can use
>>> for this instead of pid? There was once an idea of pid-fd but it never
>>> landed... This would let us get rid of the "id" in the structure too.
>>> And if that existed, we could make process_vm_*v() safer too (taking a
>>> pid-fd instead of a pid).
>> 
>> Or make a duplicate of /proc/$pid/mem that only differs in whether it
>> sets FOLL_FORCE? The code is basically already there... something like
>> this:
> 
> But we want more than just memory access, I think. rootfs access, ns
> fds, etc. all seem like they might be useful, and racy to open.
> 
> I guess I see two options: use the existing id and add something to
> seccomp() to ask whether it's still valid, or, independently of this
> patchset, add some kind of pid id :\
> 

I think we should use the existing id / cookie / whatever and ask seccomp, or new syscalls, to do the requested operation. This is because we know the target task is at a very special stopping point. As a result, a seccomp-specific mechanism can do RCU-less fd modifications against a single-threaded target, can muck with things like struct cred, etc., while a more general interface can’t.
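For contrast, the general-purpose interface Kees mentions, process_vm_readv(2), names its target only by pid. A minimal sketch, assuming glibc's wrapper (`_GNU_SOURCE`, `<sys/uio.h>`); `read_remote()` is a hypothetical helper name. Nothing in the call ties the pid to the trapped task, so if that task has exited and its pid was reused, the read may silently hit an unrelated process. That is the gap a pid-fd, or a seccomp-specific call keyed on the notification id, would close.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes starting at remote_addr in process
 * pid into buf. Returns the number of bytes read, or -1 with errno set. */
static ssize_t read_remote(pid_t pid, void *buf, size_t len,
                           const void *remote_addr)
{
    assert(buf != NULL && remote_addr != NULL);

    struct iovec local = { .iov_base = buf, .iov_len = len };
    /* iov_base is non-const in struct iovec; the remote side is only read. */
    struct iovec remote = { .iov_base = (void *)remote_addr, .iov_len = len };

    /* The target is identified solely by pid: if the intended task has
     * exited and the pid was recycled, this reads whatever process owns
     * the pid now (subject to the usual permission checks). */
    return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}
```

Calling it on the caller's own pid, e.g. `read_remote(getpid(), out, n, src)`, is an easy way to exercise the helper without a second process.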

It might be nice to add a syscall with flags such that it could be used on ptrace-stopped targets later on. Something like:

access_remote_task(int fd, u64 id, u32 type, ...)

Where type is 16 bits meaning “the id and fd are from seccomp” and 16 bits meaning “write memory” or similar.
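A sketch of how the proposed `type` argument might be packed. `access_remote_task()` is only a proposal in this mail, so every name and value below (`ART_SRC_*`, `ART_OP_*`, `art_type()`, `art_source()`, `art_op()`), and the choice of putting the source in the upper 16 bits, is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "where did (fd, id) come from" values: upper 16 bits. */
enum {
    ART_SRC_SECCOMP = 0x0001,  /* fd is a seccomp listener, id a notification id */
    ART_SRC_PTRACE  = 0x0002,  /* possible later extension: ptrace-stopped target */
};

/* Hypothetical operation values: lower 16 bits. */
enum {
    ART_OP_READ_MEM  = 0x0001,
    ART_OP_WRITE_MEM = 0x0002,
};

/* Combine a source and an operation into one u32 type word. */
static uint32_t art_type(uint32_t source, uint32_t op)
{
    assert(source <= 0xFFFF && op <= 0xFFFF);
    return (source << 16) | op;
}

/* Split a type word back into its two halves. */
static uint16_t art_source(uint32_t type) { return (uint16_t)(type >> 16); }
static uint16_t art_op(uint32_t type)     { return (uint16_t)(type & 0xFFFF); }
```

Keeping the source in its own half means a later ptrace-based source could reuse the same operation codes, which fits the idea of extending the call to ptrace-stopped targets.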


More information about the Containers mailing list