[PATCH 26/23] io-controller: fix writer preemption within a group

Rik van Riel riel at redhat.com
Tue Sep 8 21:59:30 PDT 2009

Vivek Goyal wrote:
> o Found another issue during testing. Consider the following hierarchy.
> 			root
> 			/ \
> 		       R1  G1
> 			  /\
> 			 R2 W
>   Generally in CFQ, when readers and writers are running, a reader immediately
>   preempts writers and hence the reader gets better bandwidth. In a
>   hierarchical setup, it becomes a little more tricky. In the above diagram,
>   G1 is a group, R1 and R2 are readers, and W is a writer task.
>   Now assume W runs, then R1 runs, and then R2 runs. After R2 has used its
>   time slice, if R1 is scheduled in, then after a couple of ms R2 will get
>   backlogged again in group G1 (streaming reader). But it will not preempt R1,
>   as R1 is also a reader and also because preemption across groups is not
>   allowed for isolation reasons. Hence R2 will get backlogged in G1 and will
>   get a vdisktime much higher than W. So when G1 gets scheduled again, W will
>   get to run its full slice length despite the fact that R2 is queued on the
>   same service tree.
>   The core issue here is that apart from regular preemptions (preemption
>   across classes), CFQ also has this special notion of preemption within a
>   class, and that can lead to issues when the active task is running in a
>   different group than the one where the new queue gets backlogged.
>   To solve the issue, keep track of this event (I am calling it late
>   preemption). When a group becomes eligible to run again, if late_preemption
>   is set, check if there are sync readers backlogged, and if yes, expire the
>   writer after one round of dispatch.
>   This solves the issue of readers not getting enough bandwidth in
>   hierarchical setups.
> Signed-off-by: Vivek Goyal <vgoyal at redhat.com>
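
For readers following the thread, the late-preemption idea described above
can be sketched roughly as below. This is a toy user-space model, not the
code from the patch: apart from the late_preemption flag named in the
description, every type and function name here (io_queue, io_group,
note_backlog, expire_after_one_round) is hypothetical, and the exact
conditions the real patch checks may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; these types are assumptions, not kernel structures. */
struct io_queue {
	bool sync_reader;	/* true for a sync reader, false for a writer */
	bool backlogged;	/* has pending requests */
};

struct io_group {
	struct io_queue *queues;
	size_t nr_queues;
	bool late_preemption;	/* a preemption was missed while inactive */
};

/*
 * A queue in @grp gets backlogged while @active_grp is being served.
 * Preemption across groups is not allowed, so if a sync reader shows up
 * in a group that is not active, just remember the missed preemption.
 */
static void note_backlog(struct io_group *grp, struct io_queue *q,
			 const struct io_group *active_grp)
{
	q->backlogged = true;
	if (q->sync_reader && active_grp != grp)
		grp->late_preemption = true;
}

/* Is any sync reader waiting in this group? */
static bool has_backlogged_reader(const struct io_group *grp)
{
	for (size_t i = 0; i < grp->nr_queues; i++)
		if (grp->queues[i].sync_reader && grp->queues[i].backlogged)
			return true;
	return false;
}

/*
 * Called when @grp is scheduled again and has selected @running.
 * Returns true if @running is a writer that should be expired after one
 * round of dispatch because a sync reader is waiting in the same group.
 * The flag is consumed either way.
 */
static bool expire_after_one_round(struct io_group *grp,
				   const struct io_queue *running)
{
	bool expire = grp->late_preemption && !running->sync_reader &&
		      has_backlogged_reader(grp);

	grp->late_preemption = false;
	return expire;
}
```

In the scenario from the description, R2 getting backlogged in G1 while R1
is active would set the flag on G1, and the next time G1 runs and picks W,
the check would ask for W to be expired after one dispatch round instead of
letting it run its full slice.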

Conceptually a nice solution.  The code gets a little tricky,
but I guess any code dealing with these situations would end
up that way :)

Acked-by: Rik van Riel <riel at redhat.com>

