[patch 0/4] [RFC] Another proportional weight IO controller

Fabio Checconi fchecconi at gmail.com
Wed Nov 19 22:56:32 PST 2008


> From: Aaron Carroll <aaronc at gelato.unsw.edu.au>
> Date: Thu, Nov 20, 2008 03:45:02PM +1100
>
> Fabio Checconi wrote:
> >>> Fabio Checconi wrote:
> >>>>   - To detect hw tagging in BFQ we consider a sample valid iff the
> >>>>     number of requests that the scheduler could have dispatched (given
> >>>>     by cfqd->rb_queued + cfqd->rq_in_driver, i.e., the ones still in
> >>>>     the scheduler plus the ones in the driver) is higher than the
> >>>>     CFQ_HW_QUEUE_MIN threshold.  This obviously caused no problems
> >>>>     during testing, but the way CFQ uses it now seems a little bit
> >>>>     strange.
> >>> BFQ's tag detection logic is broken in the same way that CFQ's used to
> >>> be.  Explanation is in this patch:
> >>>
> >> If you look at bfq_update_hw_tag(), the logic introduced by the patch
> >> you mention is still there; BFQ starts with ->hw_tag = 1, and updates it
> 
> Yes, I missed that.  So which part of CFQ's hw_tag detection is strange?
> 

I just think it is rather counterintuitive to consider a sample
invalid when you have, say, rq_in_driver = 1, 2, 3 or 4 and another
4 queued requests.  Considering the actual number of requests that
could have been dispatched seemed more straightforward than
considering the two values separately.
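
For illustration, this is roughly how the two checks compare (just a
minimal sketch; the helper names are made up and this is not the actual
scheduler code, only the field names and the CFQ_HW_QUEUE_MIN threshold
come from the snippets quoted above):

  #define CFQ_HW_QUEUE_MIN  5

  /*
   * BFQ-style check: a sample is valid iff the scheduler could have
   * dispatched at least CFQ_HW_QUEUE_MIN requests, i.e., the sum of
   * what is still queued and what is already in the driver.
   */
  static int bfq_sample_valid(int queued, int rq_in_driver)
  {
          return queued + rq_in_driver >= CFQ_HW_QUEUE_MIN;
  }

  /*
   * CFQ-style check: each value is compared to the threshold on its
   * own, so e.g. queued = 4 and rq_in_driver = 4 does not count as a
   * valid sample even though 8 requests could have been dispatched.
   */
  static int cfq_sample_valid(int queued, int rq_in_driver)
  {
          return queued > CFQ_HW_QUEUE_MIN || rq_in_driver > CFQ_HW_QUEUE_MIN;
  }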

Anyway, I think the validity of the samples is a minor issue, while the
throughput loss you experienced was a more serious one.


> >> every 32 valid samples.  What changed WRT your patch, apart from the
> >> number of samples, is that the condition for a sample to be valid is:
> >>
> >>   bfqd->rq_in_driver + bfqd->queued >= 5
> >>
> >> while in your patch it is:
> >>
> >>   cfqd->rq_queued > 5 || cfqd->rq_in_driver > 5
> >>
> >> We preferred the first one because that sum better reflects the number
> >> of requests that could have been dispatched, and I don't think that this
> >> is wrong.
> 
> I think it's fine too.  CFQ's condition accounts for a few rare situations,
> such as the device stalling or hw_tag being updated right after a bunch of
> requests are queued.  They are probably irrelevant, but can't hurt.
> 
> >> There is a problem, but it's not within the tag detection logic itself.
> >> From some quick experiments, what happens is that when a process starts,
> >> CFQ considers it seeky (*), BFQ doesn't.  As a side effect BFQ does not
> >> always dispatch enough requests to correctly detect tagging.
> >>
> >> At the first seek you cannot tell if the process is going to be seeky
> >> or not, and we have chosen to consider it sequential because it improved
> >> fairness in some sequential workloads (the CIC_SEEKY heuristic is used
> >> also to determine the idle_window length in [bc]fq_arm_slice_timer()).
> >>
> >> Anyway, we're dealing with heuristics, and they tend to favor some
> >> workloads over others.  If recovering this throughput loss is more
> >> important than a transient unfairness due to short idling windows assigned
> >> to sequential processes when they start, I've no problems in switching
> >> the CIC_SEEKY logic to consider a process seeky when it starts.
> >>
> >> Thank you for testing and for pointing out this issue, we missed it
> >> in our testing.
> >>
> >>
> >> (*) to be precise, the initial classification depends on the position
> >>     of the first accessed sector.
> > 
> > Sorry, I forgot the patch...  This seems to solve the problem with
> > your workload here, does it work for you?
> 
> Yes, it works fine now :)
> 

Thank you very much for trying it.


> However, hw_tag detection (in CFQ and BFQ) is still broken in a few ways:
>   * If you go from queue_depth=1 to queue_depth=large, it's possible that
>     the detection logic fails.  This could happen when setting queue_depth
>     to a larger value at boot, which seems a reasonable situation.

I think the transition of hw_tag from 1 to 0 can happen quite easily, and
may depend only on the workload, while getting back to 1 is more difficult,
because when hw_tag is 0 there may be too few dispatches to detect queueing...
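
Just to make that asymmetry concrete, here is a rough sketch of the kind
of update loop involved (the struct, its fields and the helper are made
up for illustration; this is not the real bfq_update_hw_tag()/CFQ code,
only the sum >= 5 validity check and the 32-sample window come from the
discussion above):

  /* Minimal illustrative state; the real scheduler data has more fields. */
  struct hw_tag_state {
          int queued;            /* requests still in the scheduler */
          int rq_in_driver;      /* requests dispatched to the driver */
          int max_rq_in_driver;  /* max driver depth seen in this window */
          int samples;           /* valid samples collected so far */
          int hw_tag;            /* current guess: does the device queue? */
  };

  static void update_hw_tag(struct hw_tag_state *s)
  {
          /*
           * Not a valid sample: too few requests could have been
           * dispatched.  Once hw_tag is 0 the scheduler idles and keeps
           * the queue shallow, so this branch is taken almost always
           * and hw_tag hardly ever gets a chance to go back to 1.
           */
          if (s->queued + s->rq_in_driver < 5)
                  return;

          if (s->rq_in_driver > s->max_rq_in_driver)
                  s->max_rq_in_driver = s->rq_in_driver;

          /* decide once every 32 valid samples */
          if (++s->samples < 32)
                  return;

          s->hw_tag = s->max_rq_in_driver > 5;
          s->samples = 0;
          s->max_rq_in_driver = 0;
  }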


>   * It depends too much on the hardware.  If you have a seeky load on a
>     fast disk with a unit queue depth, idling sucks for performance (I
>     imagine this is particularly bad on SSDs).  If you have any disk with
>     a deep queue, not idling sucks for fairness.

Agreed.  This fairness vs. throughput conflict is very
workload-dependent too.


> I suppose CFQ's slice_resid is supposed to help here, but as far as I can
> tell, it doesn't do a thing.
> 

