[PATCH 03/24] io-controller: bfq support of in-class preemption

Vivek Goyal vgoyal at redhat.com
Tue Jul 28 08:03:10 PDT 2009


On Tue, Jul 28, 2009 at 04:29:06PM +0200, Jerome Marchand wrote:
> Vivek Goyal wrote:
> > On Tue, Jul 28, 2009 at 01:44:32PM +0200, Jerome Marchand wrote:
> >> Vivek Goyal wrote:
> >>> Hi Jerome,
> >>>
> >>> Thanks for testing it out. I could also reproduce the issue.
> >>>
> >>> I had assumed that an RT queue will always preempt a non-RT queue and
> >>> hence if there is an RT ioq/request pending, sd->next_entity will point
> >>> to it and any queue which is preempting it has to be on the same service
> >>> tree.
> >>>
> >>> But in your test case it looks like an RT async queue is pending and
> >>> there is some sync BE class IO going on. It looks like CFQ allows a
> >>> sync queue to preempt an async queue irrespective of class, so in this
> >>> case the sync BE class reader will preempt the async RT queue. That's
> >>> where my assumption is broken and we see the BUG_ON() hitting.
> >>>
> >>> Can you please try out the following patch? It is a quick patch and
> >>> requires more testing. It solves the crash but still does not solve the
> >>> issue of sync queues always preempting async queues irrespective of
> >>> class. In the current scheduler we always schedule the RT queue first
> >>> (whether it be sync or async). This problem requires a little more thought.
> >> I've tried it: I can't reproduce the issue anymore and I haven't seen any
> >> other problem so far.
> >> By the way, what is the expected result regarding fairness among different
> >> groups when IO from different classes is run in each group? For instance,
> >> if we have RT IO going on in one group, BE IO in another and Idle IO in a
> >> third group, what is the expected result: should the IO time be shared
> >> fairly between the groups or should RT IO have priority? As it is now, the
> >> time is shared fairly between the BE and RT groups, and the last group,
> >> running Idle IO, hardly gets any time.
> >>
> > 
> > Hi Jerome,
> > 
> > If there are two groups RT and BE, I would expect RT group to get all the
> > bandwidth as long as it is backlogged and starve the BE group.
> 
> I wasn't clear enough. I meant the class of the process as set by ionice, not
> the class of the cgroup. That is, of course, only an issue when using CFQ.
> 
> > 
> > I ran a quick test of two dd readers. One reader is in an RT group and the
> > other is in a BE group. I do see that the RT group runs away with almost
> > all the BW.
> > 
> > group1 time=8:16 2479 group1 sectors=8:16 457848
> > group2 time=8:16 103  group2 sectors=8:16 18936
> > 
> > Note that when group1 (RT) finished it had got 2479 ms of disk time while
> > group2 (BE) got only 103 ms.
> > 
> > Can you send details of your test? It should not be fair sharing between
> > the RT and BE groups.
> 
> Setup:
> 
> $ mount -t cgroup -o io,blkio none /cgroup
> $ mkdir /cgroup/test1 /cgroup/test2 /cgroup/test3
> $ echo 1000 > /cgroup/test1/io.weight
> $ echo 1000 > /cgroup/test2/io.weight
> $ echo 1000 > /cgroup/test3/io.weight
> 
> Test:
> $ echo 3 > /proc/sys/vm/drop_caches
> 
> $ ionice -c 1 dd if=/tmp/io-controller-test3 of=/dev/null &
> $ echo $! > /cgroup/test1/tasks
> 
> $ ionice -c 2 dd if=/tmp/io-controller-test1 of=/dev/null &
> $ echo $! > /cgroup/test2/tasks
> 
> $ ionice -c 3 dd if=/tmp/io-controller-test2 of=/dev/null &
> $ echo $! > /cgroup/test3/tasks
> 

Ok, got it. So you have created three BE class groups and within those
groups you are running jobs of RT, BE and IDLE type.
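
To compare runs of this kind, it may help to turn per-group disk times into
percentage shares. The following is only a rough sketch: disk_time_share is
a hypothetical helper, not part of the patchset, and it assumes the per-group
times (in ms) are typed in by hand rather than read from any cgroup file.
Here it is fed the two values from my RT/BE group test quoted above.

```shell
# Hypothetical helper (an assumption for illustration, not part of the patches):
# print each group's percentage of the total disk time.
# Arguments are per-group disk times in ms, in group order.
disk_time_share() {
    awk 'BEGIN {
        total = 0
        # Sum all times passed on the command line.
        for (i = 1; i < ARGC; i++) total += ARGV[i]
        # Avoid dividing by zero when no time was recorded.
        if (total == 0) exit 1
        for (i = 1; i < ARGC; i++)
            printf "group%d %.1f%%\n", i, 100 * ARGV[i] / total
    }' "$@"
}

# Times from the RT vs BE group test quoted above (2479 ms and 103 ms).
disk_time_share 2479 103
```

With three equal-weight BE groups, as in your setup, roughly equal shares
would point at fair sharing between the groups.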


