[RFC] x86: restrict pid namespaces to 32 or 64 bit syscalls

Vasiliy Kulikov segoon at openwall.com
Sun Aug 14 09:08:48 PDT 2011

(CC'ed Will Drewry, the author of the new seccomp version, and the
containers list)

On Sun, Aug 14, 2011 at 17:27 +0200, Andi Kleen wrote:
> > i386 vs x86-64 vs x32 is just one of many axes along which syscalls can be restricted (and for that matter, one axis of backward compatibility), and it does not make sense to burden the code with ad hoc filters.  Designing a general filter facility which can be used to restrict any container to the subset of system calls it actually needs would make more sense, no?
> I believe this is already in the newer versions of seccomp.

The "newer versions of seccomp" are NAK'ed by Ingo.  AFAIU, Ingo wants
more generic filters that filter much more than syscalls.  But that
contradicts the security by simplicity that we're trying to achieve
with this patch.

Compatibility syscalls are much more error prone than common syscalls
because they are poorly tested, and sometimes not tested at all,
unfortunately.  The link I've posted is about a crazy bug: a completely
uninitialized structure was used in a copy_from_user() call.  The
function was not tested _at all_.  I doubt any non-compatibility syscall
(ioctl() handler, etc.) can be completely untested.

Also, we already have CONFIG_IA32_EMULATION.  This patch only moves the
configuration mechanism from the compilation stage to the runtime stage;
it doesn't draw a new line.  It grants permission to use the feature to
some containers but denies it to others, which is a rather expected
property of container separation.
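
As a rough illustration of the runtime check described above, here is a
user-space model in C.  It is only a sketch, not code from the patch:
the names (abi_model, pidns_model, abi_allowed) are made up for the
example, and it assumes the per-container policy is a simple bitmask of
permitted syscall ABIs.

```c
/* Hypothetical user-space model; none of these names come from the patch. */
#include <stdbool.h>

/* One bit per syscall ABI that a container may be allowed to use. */
enum abi_model {
    ABI_MODEL_64   = 1u << 0,  /* native 64-bit syscalls */
    ABI_MODEL_IA32 = 1u << 1,  /* 32-bit compatibility syscalls */
};

/* Stand-in for per-pid-namespace state: the ABIs this container may use. */
struct pidns_model {
    unsigned int allowed_abis;
};

/*
 * Runtime counterpart of a compile-time switch: the decision is made per
 * container at syscall entry instead of once for the whole kernel build.
 */
static bool abi_allowed(const struct pidns_model *ns, enum abi_model abi)
{
    return (ns->allowed_abis & abi) != 0;
}
```

In this model, a container that needs old 32-bit binaries would get
both bits set, while a 64-bit-only container would get ABI_MODEL_64
alone, so its compat entry path is never reachable.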


Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments
