[PATCH 01/10] Documentation
ryov at valinux.co.jp
Mon Mar 16 01:40:43 PDT 2009
> I have briefly looked at dm-ioband also and following were some of the
> concerns I had raised in the past.
> - Need of a dm device for every device we want to control
> - This requirement looks odd. It forces everybody to use dm-tools
> and if there are lots of disks in the system, configuration is
I don't think it's a pain. Could it be easily done by writing a small
> - It does not support hierarchical grouping.
I can implement hierarchical grouping to dm-ioband if it's really
necessary, but at this point, I don't think it's really necessary
and I want to keep the code simple.
> - Possibly can break the assumptions of underlying IO schedulers.
> - There is no notion of task classes. So tasks of all the classes
> are at same level from resource contention point of view.
> The only thing which differentiates them is cgroup weight. Which
> does not answer the question that an RT task or RT cgroup should
> starve the peer cgroup if need be as RT cgroup should get priority
> - Because of FIFO release of buffered bios, it is possible that
> task of lower priority gets more IO done than the task of higher
> - Buffering at multiple levels and FIFO dispatch can have more
> interesting hard to solve issues.
> - Assume there is sequential reader and an aggressive
> writer in the cgroup. It might happen that writer
> pushed lot of write requests in the FIFO queue first
> and then a read request from reader comes. Now it might
> happen that cfq does not see this read request for a long
> time (if cgroup weight is less) and this writer will
> starve the reader in this cgroup.
> Even cfq anticipation logic will not help here because
> when that first read request actually gets to cfq, cfq might
> choose to idle for more read requests to come, but the
> aggressive writer might have again flooded the FIFO queue
> in the group and cfq will not see subsequent read request
> for a long time and will unnecessarily idle for read.
I think it's just a matter of which you prioritize, bandwidth or
io-class. What do you do when the RT task issues a lot of I/O?
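
The reader/writer scenario quoted above can be illustrated with a toy model. This is only a sketch under stated assumptions, not dm-ioband code: the function name and the `bios_per_round` parameter are hypothetical, with `bios_per_round` standing in for however many buffered bios the group's weight lets it release per scheduling round.

```python
from collections import deque


def rounds_until_read_released(writes_ahead, bios_per_round):
    """Toy model of a per-group FIFO of buffered bios (hypothetical names).

    The writer has already queued `writes_ahead` writes when one read
    arrives. Each round the group may release `bios_per_round` bios in
    strict FIFO order. Returns the round in which the read is released
    to the underlying IO scheduler.
    """
    if bios_per_round < 1:
        raise ValueError("bios_per_round must be at least 1")
    fifo = deque(["W"] * writes_ahead + ["R"])
    rounds = 0
    while fifo:
        rounds += 1
        # Release up to bios_per_round bios from the head of the queue.
        for _ in range(min(bios_per_round, len(fifo))):
            if fifo.popleft() == "R":
                return rounds
    raise AssertionError("unreachable: the read is always in the queue")


# A low-weight group waits far longer before its read is even visible
# to the scheduler below than a high-weight group does.
print(rounds_until_read_released(100, 4))
print(rounds_until_read_released(100, 16))
```

In this model the delay seen by the read grows with the number of writes queued ahead of it and shrinks with the group's weight, which is the effect described in the quoted text.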
> - Task grouping logic
> - We already have the notion of cgroup where tasks can be grouped
> in hierarchical manner. dm-ioband does not make full use of that
> and comes up with own mechanism of grouping tasks (apart from
> cgroup). And there are odd ways of specifying cgroup id while
> configuring the dm-ioband device.
> IMHO, once somebody has created the cgroup hierarchy, any IO
> controller logic should be able to internally read that hierarchy
> and provide control. There should not be need of any other
> configuration utility on top of cgroup.
> My RFC patches had tried to get rid of this external
> configuration requirement.
The reason is that it makes bio-cgroup easy to use for dm-ioband.
But it's not the final design of the interface between dm-ioband and
> - Task and Groups can not be treated at same level.
> - Because at any second level solution we are controlling bio
> per cgroup and don't have any notion of which task queue bio
> belongs to, one can not treat task and group at same level.
> What I meant is following.
>            / | \
>           1  2  A
>                / \
>               3   4
> In dm-ioband approach, at top level tasks 1 and 2 will get 50%
> of BW together and group A will get 50%. Ideally along the lines
> of cpu controller, I would expect it to be 33% each for task 1
> task 2 and group A.
> This can create interesting scenarios where, assuming task1 is
> an RT class task. Now one would expect task 1 get all the BW
> possible starving task 2 and group A, but that will not be the
> case and task1 will get 50% of BW.
> Not that it is critically important but it would probably be
> nice if we can maintain same semantics as cpu controller. In
> elevator layer solution we can do it at least for CFQ scheduler
> as it maintains separate io queue per io context.
I will consider following the CPU controller's manner when dm-ioband
supports hierarchical grouping.
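
The share arithmetic in the quoted example (tasks 1 and 2 plus group A at the top level) can be worked through as follows. This is a minimal sketch assuming equal weights everywhere; the function names are hypothetical and neither function reflects actual dm-ioband or cpu controller code.

```python
from fractions import Fraction


def group_only_shares(tasks, groups):
    """Second-level model from the quoted text (hypothetical helper).

    Top-level tasks are lumped into one implicit group that competes
    with each child group, then split that group's share evenly.
    Equal weights are assumed.
    """
    entities = len(groups) + (1 if tasks else 0)
    per_entity = Fraction(1, entities)
    shares = {g: per_entity for g in groups}
    for t in tasks:
        shares[t] = per_entity / len(tasks)
    return shares


def cpu_controller_style_shares(tasks, groups):
    """Tasks and groups are peers at the same level (equal weights assumed)."""
    names = list(tasks) + list(groups)
    return {name: Fraction(1, len(names)) for name in names}


tasks, groups = ["task1", "task2"], ["A"]
print(group_only_shares(tasks, groups))            # A: 1/2, task1: 1/4, task2: 1/4
print(cpu_controller_style_shares(tasks, groups))  # 1/3 each
```

The first function reproduces the 50%/50% split between the two tasks together and group A; the second gives the 33% each that the quoted text expects along the lines of the cpu controller.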
> This is in general an issue for any 2nd level IO controller which
> only accounts for io groups and not for io queues per process.
> - We will end up copying a lot of code/logic from cfq
> - To address many of the concerns like multi class scheduler
> we will end up duplicating code of IO scheduler. Why can't
> we have a one point hierarchical IO scheduling (This patchset).