I/O bandwidth controller (was Re: Too many I/O controller patches)

Caitlin Bestler Caitlin.Bestler at neterion.com
Wed Aug 6 11:01:13 PDT 2008


Fernando Luis Vázquez Cao wrote:

> 
> *** Goals
>   1. Cgroups-aware I/O scheduling (being able to define arbitrary
> groupings of processes and treat each group as a single scheduling
> entity).
>   2. Being able to perform I/O bandwidth control independently on each
> device.
>   3. I/O bandwidth shaping.
>   4. Scheduler-independent I/O bandwidth control.
>   5. Usable with stacking devices (md, dm and other devices of that
> ilk).
>   6. I/O tracking (handle buffered and asynchronous I/O properly).
> 
> The list of goals above is not exhaustive, and it is also likely to
> contain some not-so-nice-to-have features, so your feedback would be
> appreciated.
> 

I'm following this discussion on the virtualization mailing list,
and I'm limiting my reply to virtualization-related mailing lists.

What strikes me about Fernando's RFC is that it is quite readable.
So much so that I understood most of it, and I really shouldn't have.
That is because it deals mostly with the internals of a kernel I/O
scheduler, something I have no reason to be involved with.

What is missing from the discussion is anything on how I/O bandwidth
could be regulated in a virtualized environment, beyond the objective
that cgroups should be definable at the scope of a container.
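
For concreteness, such a per-container setting would presumably be
driven from userspace through the cgroup filesystem. Here is a
minimal sketch in C; the mount point, the "io.bandwidth" file name
and its "<major>:<minor> <bytes/sec>" format are all invented for
illustration, since the RFC has not settled on an interface:

#include <stdio.h>

/* Hypothetical: cap one container's cgroup to a bytes-per-second
 * rate on a single device. The "io.bandwidth" file is an assumption,
 * not an existing kernel interface. */
static int set_io_limit(const char *cgroup, int major, int minor,
                        unsigned long long bps)
{
    char path[512];
    FILE *f;

    snprintf(path, sizeof(path), "%s/io.bandwidth", cgroup);
    f = fopen(path, "w");
    if (!f)
        return -1;
    /* One rule per device, matching goal 2 of the RFC. */
    fprintf(f, "%d:%d %llu\n", major, minor, bps);
    return fclose(f);
}

int main(void)
{
    /* Cap the "guest1" container to 10 MB/s on device 8:0 (sda). */
    if (set_io_limit("/cgroup/guest1", 8, 0, 10ULL << 20))
        perror("set_io_limit");
    return 0;
}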

For a virtualized environment, it strikes me that there are a number
of additional objectives that need to be considered:

- A Backend (Dom0/DomD/whatever) that provides block storage services
  to Guests needs to be able to enforce QoS controls with or without
  the co-operation of the Guest kernels. Guest OS involvement is
  valuable nonetheless, because on its own the backend would treat all
  I/O from a given Guest as an undifferentiated glop. In particular,
  to the extent that the improved algorithms under discussion in the
  RFC are implemented in the Linux kernel, how do their decisions
  become hints when requests are passed to the backend? (One possible
  encoding is sketched after this list.)
- In virtualized environments it is *more* likely that the underlying
  storage is networked. This means that the backend scheduler probably
  only has crude control over when I/O is actually done. It can only
  control when requests are issued. The target has more control over
  when the I/O actually occurs, especially for writes.
- When the storage is networked, the traffic will also be QoS-regulated
  at the network layer. The two QoS scheduling algorithms should not
  work at cross purposes, nor should system administrators be forced
  to configure two sets of policies to achieve a single result. (One
  way of tying the two layers together is sketched after this list.)
- Does it make sense to extend storage-related network bandwidth controls
  to deal with sub-division by device or Guest? Network bandwidth control
  is likely to give a rate/quota for networked storage traffic. Would it
  make sense to sub-divide that control based on cgroups/whatever?
- Should the Guest I/O scheduler be given a hint that although it
  *looks* like a SCSI device it is really a virtualized storage
  device, and that the Guest therefore probably should not spend that
  many CPU cycles trying to optimize the movements of a head that
  does not really exist? (A sketch of acting on such a hint follows
  this list.)
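
On the first point, one can imagine the Guest's block frontend
tagging each request with the scheduling class its own elevator
assigned, so that the backend has something better to work with than
one undifferentiated glop per Guest. A minimal sketch of such a hint;
this is purely illustrative and not part of any existing blkif or
virtio ring format:

#include <stdint.h>

/* Hypothetical per-request hint a paravirtual block frontend could
 * carry to the backend scheduler. Invented for illustration. */
enum pv_io_class {
    PV_IO_CLASS_NONE = 0,   /* no hint: backend sees one glop */
    PV_IO_CLASS_RT,         /* latency-sensitive */
    PV_IO_CLASS_BE,         /* best effort */
    PV_IO_CLASS_IDLE        /* background */
};

struct pv_blk_request {
    uint64_t sector;        /* start sector on the virtual disk */
    uint32_t nr_sectors;    /* request length */
    uint8_t  operation;     /* read, write, barrier, ... */
    uint8_t  io_class;      /* enum pv_io_class: the Guest's hint */
    uint16_t io_prio;       /* finer-grained priority within class */
    /* grant references / data descriptors elided */
};

The backend remains free to ignore the hint, which preserves the
requirement that QoS work without Guest co-operation.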
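
On the network QoS point, one crude way to keep the two layers from
working at cross purposes is to derive the network marking from the
block-layer class, so that a single policy drives both. The sketch
below uses the standard IP_TOS socket option; the class-to-DSCP
mapping is invented here and would really be administrator policy:

#include <netinet/in.h>
#include <netinet/ip.h>
#include <sys/socket.h>

/* Map the hypothetical I/O classes above onto DSCP code points. */
static int io_class_to_tos(int io_class)
{
    switch (io_class) {
    case 1:  return 46 << 2;   /* RT   -> DSCP EF  */
    case 3:  return 8 << 2;    /* IDLE -> DSCP CS1 */
    default: return 0;         /* BE   -> default  */
    }
}

/* Tag an iSCSI/NBD-style transport socket so the network scheduler
 * applies a policy consistent with the block-layer one. */
static int tag_storage_socket(int fd, int io_class)
{
    int tos = io_class_to_tos(io_class);

    return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}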
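
And on the last point, an administrator can already tell a Guest not
to waste cycles on seek optimization by switching the virtual disk to
the noop elevator through sysfs; a platform-provided hint could do
the same automatically. A sketch of the manual version ("xvda" is
just an example device name):

#include <stdio.h>

/* Equivalent to: echo noop > /sys/block/<disk>/queue/scheduler
 * Stops the Guest elevator from optimizing the movements of a disk
 * head that does not really exist. */
static int use_noop_elevator(const char *disk)
{
    char path[256];
    FILE *f;

    snprintf(path, sizeof(path), "/sys/block/%s/queue/scheduler", disk);
    f = fopen(path, "w");
    if (!f)
        return -1;
    fprintf(f, "noop\n");
    return fclose(f);
}

int main(void)
{
    if (use_noop_elevator("xvda"))
        perror("use_noop_elevator");
    return 0;
}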


