RFC: I/O bandwidth controller (was Re: Too many I/O controller patches)

Andrea Righi righi.andrea at gmail.com
Mon Aug 11 13:52:25 PDT 2008

Fernando Luis Vázquez Cao wrote:
>>> This seems to be the easiest part, but the current cgroups
>>> infrastructure has some limitations when it comes to dealing with block
>>> devices: impossibility of creating/removing certain control structures
>>> dynamically and hardcoding of subsystems (i.e. resource controllers).
>>> This makes it difficult to handle block devices that can be hotplugged
>>> and go away at any time (this applies not only to usb storage but also
>>> to some SATA and SCSI devices). To cope with this situation properly we
>>> would need hotplug support in cgroups, but, as suggested before and
>>> discussed in the past (see (0) below), there are some limitations.
>>> Even in the non-hotplug case it would be nice if we could treat each
>>> block I/O device as an independent resource, which means we could do
>>> things like allocating I/O bandwidth on a per-device basis. As long as
>>> performance is not compromised too much, adding some kind of basic
>>> hotplug support to cgroups is probably worth it.
>>> (0) http://lkml.org/lkml/2008/5/21/12
>> What about using major,minor numbers to identify each device and account
>> IO statistics? If a device is unplugged we could reset IO statistics
>> and/or remove IO limitations for that device from userspace (i.e. by a
>> daemon), but plugging/unplugging the device would not be blocked/affected
>> in any case. Or am I oversimplifying the problem?
> If a resource we want to control (a block device in this case) is
> hot-plugged/unplugged the corresponding cgroup-related structures inside
> the kernel need to be allocated/freed dynamically, respectively. The
> problem is that this is not always possible. For example, with the
> current implementation of cgroups it is not possible to treat each block
> device as a different cgroup subsystem/resource controller, because
> subsystems are created at compile time.

The whole subsystem is created at compile time, but controller data
structures are allocated dynamically (e.g. see struct mem_cgroup for
the memory controller). So, identifying each device with a name, or a
key like major,minor, instead of a reference/pointer to a struct could
help to handle this in userspace. I mean, if a device is unplugged a
userspace daemon can just handle the event and delete the controller
data structures allocated for this device, asynchronously, via a
userspace->kernel interface, and without holding a reference to that
particular block device in the kernel. Anyway, implementing a generic
interface that would allow defining hooks for hot-pluggable devices (or
similar events) in cgroups would be interesting.
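
To make the idea a bit more concrete, here is a rough sketch of the
bookkeeping in Python. It is purely illustrative: the class and method
names are made up, and nothing here corresponds to an existing kernel
or cgroups interface. Limits are stored under a (major, minor) key
rather than a pointer to the block device, so handling an unplug event
only means dropping the matching entries:

```python
from collections import defaultdict


class IoLimitTable:
    """Hypothetical per-cgroup I/O limits keyed by (major, minor)."""

    def __init__(self):
        # cgroup name -> {(major, minor): bytes-per-second limit}
        self._limits = defaultdict(dict)

    def set_limit(self, cgroup, major, minor, bytes_per_sec):
        self._limits[cgroup][(major, minor)] = bytes_per_sec

    def get_limit(self, cgroup, major, minor):
        # None means "no limit configured for this device".
        return self._limits.get(cgroup, {}).get((major, minor))

    def device_removed(self, major, minor):
        """Drop every rule for an unplugged device; return how many were dropped."""
        dropped = 0
        for rules in self._limits.values():
            if rules.pop((major, minor), None) is not None:
                dropped += 1
        return dropped


# Example: two cgroups limited on device 8:16, one of them also on 8:0.
table = IoLimitTable()
table.set_limit("grp_a", 8, 16, 10 * 1024 * 1024)
table.set_limit("grp_b", 8, 16, 5 * 1024 * 1024)
table.set_limit("grp_a", 8, 0, 20 * 1024 * 1024)

# The daemon sees 8:16 go away and cleans up asynchronously.
dropped = table.device_removed(8, 16)
```

Since the table never holds a reference to the device itself, the
unplug is not blocked by the cleanup, and the rules for other devices
are left alone.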

>>> 3. & 4. & 5. - I/O bandwidth shaping & General design aspects
>>> The implementation of an I/O scheduling algorithm is to a certain extent
>>> influenced by what we are trying to achieve in terms of I/O bandwidth
>>> shaping, but, as discussed below, the required accuracy can determine
>>> the layer where the I/O controller has to reside. Off the top of my
>>> head, there are three basic operations we may want to perform:
>>>   - I/O nice prioritization: ionice-like approach.
>>>   - Proportional bandwidth scheduling: each process/group of processes
>>> has a weight that determines the share of bandwidth they receive.
>>>   - I/O limiting: set an upper limit to the bandwidth a group of tasks
>>> can use.
>> Using deadline-based IO scheduling could be an interesting path to
>> explore as well, IMHO, to try to guarantee per-cgroup minimum bandwidth
>> requirements.
> Please note that the only thing we can do is to guarantee minimum
> bandwidth requirement when there is contention for an IO resource, which
> is precisely what a proportional bandwidth scheduler does. Am I missing
> something?

Correct. Proportional bandwidth automatically guarantees minimum
requirements (unlike the IO limiting approach, which needs additional
mechanisms to achieve this).

In any case there's no guarantee that a cgroup/application can sustain,
e.g., 10MB/s on a certain device, but this is a hard problem anyway, and
the best we can do is to try to satisfy "soft" constraints.
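
A small worked example of the proportional case, again only as a
sketch in Python (the function name and the weights are hypothetical,
and device_bw stands for whatever throughput the device actually
delivers at that moment, which is not a constant in practice):

```python
def proportional_shares(weights, active, device_bw):
    """Split device_bw among the active cgroups in proportion to their weights."""
    total = sum(weights[g] for g in active)
    return {g: device_bw * weights[g] / total for g in active}


weights = {"a": 1, "b": 3}

# Both cgroups are doing I/O: "a" gets 1/4 and "b" gets 3/4 of the device.
contended = proportional_shares(weights, ["a", "b"], 100)

# Only "a" is doing I/O: it can use the whole device.
uncontended = proportional_shares(weights, ["a"], 100)
```

Under contention a cgroup never gets less than its weight divided by
the sum of all weights, which is the minimum guarantee. But that
minimum is a fraction of device_bw, not an absolute MB/s figure, so it
is still only a "soft" constraint.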

