Controlling devices and device namespaces

Sat Sep 15 09:27:51 UTC 2012

"Serge E. Hallyn" <serge at hallyn.com> writes:

> Quoting Aristeu Rozanski (aris at ruivo.org):
>> Tejun,
>> On Thu, Sep 13, 2012 at 01:58:27PM -0700, Tejun Heo wrote:
>> >   memcg can be handled by memcg people and I can handle cgroup_freezer
>> >   and others with help from the authors.  The problematic one is
>> >   blkio.  If anyone is interested in working on blkio, please be my
>> >   guest.  Vivek?  Glauber?
>> 
>> if Serge is not planning to do it already, I can take a look in device_cgroup.
>
> That's fine with me, thanks.
>
>> also, heard about the desire of having a device namespace instead with
>> support for translation ("sda" -> "sdf"). If anyone see immediate use for
>> this please let me know.
>
> Before going down this road, I'd like to discuss this with at least you,
> me, and Eric Biederman (cc:d) as to how it relates to a device
> namespace.

The problem with devices.

- An unrestricted mknod gives you access to effectively any device in
  the system.

- During process migration, if the device number changes, calling
  stat on the same file descriptor before and after can give
  different results.

- Devices coming from preexisting filesystems that we mount
  as unprivileged users are as dangerous as mknod but show
  that the problem is not limited to mknod.

- udev thinks mknod is a system call we can remove from the kernel.
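
To make the mknod point concrete: a device node carries little more
than a type and a major:minor pair, and those numbers are what select
the driver. A minimal Python sketch that decodes them (the helper
names decode_rdev and describe are illustrative, not from any existing
tool):

```python
import os
import stat

def decode_rdev(mode, rdev):
    """Return (kind, major, minor) for a device node, or None otherwise."""
    if stat.S_ISCHR(mode):
        kind = "char"
    elif stat.S_ISBLK(mode):
        kind = "block"
    else:
        return None  # regular file, directory, etc.
    return kind, os.major(rdev), os.minor(rdev)

def describe(path):
    """Decode the device numbers of the node at `path`."""
    st = os.stat(path)
    return decode_rdev(st.st_mode, st.st_rdev)
```

On a Linux system describe("/dev/null") gives ("char", 1, 3). A node
with the same type and numbers, wherever it lives, refers to the same
device, which is why both mknod and device nodes on mounted
filesystems matter.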

---

The use cases seem comparatively simple to enumerate.

- Giving unfiltered access to a device to someone not root.

- Virtual devices that everyone uses and have no real privilege
  requirements: /dev/null /dev/tty /dev/zero etc.

- Dynamically created devices /dev/loopN /dev/tun /dev/macvtapN,
  nbd, iscsi, /dev/ptsN, etc

---

There are a couple of solutions to these problems.

- The classic solution of creating a /dev for a container
  before starting it.

- The devpts filesystem.  This works well for unprivileged access
  to ptys.  Except for the /dev/ptmx silliness I very much like how
  things are handled today with devpts.

- Device control groups.  I am not quite certain what to make
  of them.  The only case I see where they are better than
  a prebuilt static dev is if there is a hotplugged device
  that I want to push into my container.

  I think the only problem with device control groups and
  hierarchies is that removing a device from a whitelist
  does not recurse down the hierarchy.

  Can a process inside of a device control group create
  a child group that has access to a subset of its
  devices?  The actual checks don't need to be hierarchical
  but the presence of device nodes should be.
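
The hierarchy behaviour asked for above can be sketched as a toy
model. This is only an illustration of the desired semantics (a child
holds a subset of its parent's devices, and revocation recurses), not
how the kernel's device control group is implemented. The class name
DevGroup and the (type, major, minor) tuples are assumptions made for
the example:

```python
class DevGroup:
    """Toy model of a hierarchical device whitelist."""

    def __init__(self, parent=None, allowed=()):
        self.parent = parent
        self.children = []
        self.allowed = set()
        if parent is not None:
            parent.children.append(self)
        for dev in allowed:
            self.allow(dev)

    def allow(self, dev):
        # A child may only gain devices its parent already has.
        if self.parent is not None and dev not in self.parent.allowed:
            raise PermissionError(f"{dev} is not allowed in the parent group")
        self.allowed.add(dev)

    def deny(self, dev):
        # Revocation walks down the tree so no descendant keeps access.
        self.allowed.discard(dev)
        for child in self.children:
            child.deny(dev)


root = DevGroup(allowed=[("c", 1, 3), ("b", 8, 0)])
child = DevGroup(parent=root, allowed=[("b", 8, 0)])
root.deny(("b", 8, 0))  # also removes the entry from child
```

The deny method is the part the whitelist removal problem is about:
without the recursive step, child would keep ("b", 8, 0) after the
parent lost it.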

---

I see a couple of holes in the device control picture.

- How do we handle hotplug events?

  I think we can do this by relaying events through userspace,
  updating the device control groups etc.

- Unprivileged processes interacting with all of this.
  (possibly with privilege in their user namespace)
  What I don't know is how to create a couple of different
  subhierarchies, each for different child processes.

- Dynamically created devices.

  My gut feel is that we should replicate the success of devpts
  and give each type of dynamically created device its own
  filesystem and mount point under /dev, and just bend
  the handful of userspace users into that model.

- Sysfs

  My gut says for the container use case we should aim to
  simply not have dynamically created devices in sysfs
  and then we can simply not care.

- Migration

  Either we need block device numbers that can migrate with us,
  (possibly a subset of the entire range ala devpts) or we need to send
  hotplug events to userspace right after a migration so userspace
  processes that care can invalidate their caches of stat data.
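
The second migration option, userspace dropping cached stat data when
told to, might look roughly like this. It is a sketch under
assumptions: the class StatCache and the on_migration_event hook are
hypothetical, and how the notification would actually reach the
process (for example a hotplug event) is left open, as it is above:

```python
import os

class StatCache:
    """Cache of stat results that can be dropped on a notification."""

    def __init__(self, stat_fn=os.stat):
        self._stat = stat_fn   # injectable so the cache can be exercised in tests
        self._cache = {}

    def lookup(self, path):
        # Serve from the cache; only call stat on a miss.
        if path not in self._cache:
            self._cache[path] = self._stat(path)
        return self._cache[path]

    def on_migration_event(self):
        # Hypothetical hook: device numbers may have changed, so
        # everything cached before the migration is suspect.
        self._cache.clear()
```

A process that never hears about the migration keeps returning the old
device numbers from lookup, which is exactly the failure mode described
in the problem list.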

---

With the code in my userns development tree I can create a user
namespace, create a new mount namespace, and then, if I have
access to any block devices, mount filesystems, all without
needing to have any special privileges.  What I haven't
figured out is what it would take to get the device
control group into the middle of that.

It feels like it should be possible to get the checks straight
and use the device control group hooks to control which devices
are usable in a user namespace.  Unfortunately when I try to work
it out, the independence of the user namespace and the device
control group seems to make that impossible.

Shrug.  There is most definitely something missing from our
model of how to handle devices well.  I am hoping we can
sprinkle some devpts-derived pixie dust on the problem,
migrate userspace to some new interfaces, and have life
be good.

Eric