[PATCH RFC - TAKE TWO - 00/12] New version of the BFQ I/O Scheduler

Paolo Valente paolo.valente at unimore.it
Tue Jun 17 15:55:57 UTC 2014


Il giorno 02/giu/2014, alle ore 16:29, Jens Axboe <axboe at kernel.dk> ha scritto:

> On 2014-05-30 23:16, Tejun Heo wrote:
>>> for turning patch #2 into a series of changes for CFQ instead. We need to
>>> end up with something where we can potentially bisect our way down to
>>> whatever caused any given regression. The worst possible situation is "CFQ
>>> works fine for this workload, but BFQ does not" or vice versa. Or hangs, or
>>> whatever it might be.
>> 
>> It's likely that there will be some workloads out there which may be
>> affected adversely, which is true for any change really but with both
>> the core scheduling and heuristics properly characterized at least
>> finding a reasonable trade-off should be much less of a crapshoot and
>> the expected benefits seem to easily outweigh the risks as long as we
>> can properly sequence the changes.
> 
> Exactly, I think we are pretty much on the same page here. As I said in the previous email, the biggest thing I care about is not adding a new IO scheduler wholesale. If Paolo can turn the "add BFQ" patch into a series of patches against CFQ, then I would have no issue merging it for testing (and inclusion, when it's stable enough).

We have finished analyzing possible ways to turn cfq into bfq, and unfortunately
I think I need some help in this respect. In fact, we have found several, apparently
non-trivial issues. To describe them, I will start from some concrete examples, and
then try to discuss the overall problem in general terms, instead of providing a list
of all the issues we have found. I am sorry for providing however many details, but
I hope they will help me sync with you, and then make other boring emails needless.

First, supposing to start the transformation from adding the low-latency heuristics of bfq
to cfq's engine, one of the main issues is that cfq chooses the next queue to serve,
within a group and class, in a different way than bfq does. In fact, cfq first chooses the
sub-group of queues to serve according to a workload-based priority scheme, and then
performs a round-robin scheduling among the queues in the sub-group. This priority scheme
not only has nothing to do with the logic of the low-latency heuristics of bfq (and actually with
bfq altogether), but also conflicts with the freedom in choosing the next queue that these
heuristics need to succeed in guaranteeing a low latency.

If, on the opposite end, we assume, because of the above issue, to proceed the other
way round, i.e., to start from replacing the engine of cfq with that of bfq, then similar, if not worse,
issues arise:
- the internal scheduler of bfq is hierarchical, whereas the internal, round-robin-based scheduler
  of cfq is not
- the hierarchy-flattening scheme adopted in cfq has no counterpart in the hierarchical scheduling
  algorithm of bfq
- preemption is not trivial to implement in bfq in such a way that service guarantees are preserved,
  but, in the first place, would however be needed to keep a high throughput with interleaved I/O
- cfq uses the workload-based queue selection scheme I mentioned above, and this has no match
  with any mechanism in bfq
...

Instead of bothering you with the full list of issues, I want to try to describe the problem, in general
terms, through the following rough simplification (in which I am neglecting trivial common code
between cfq and bfq, such as handling of I/O contexts). On one side, bfq is made by 80% of a
hierarchical fair-queueing scheduler, and by the remaining 20% of a set of heuristics to improve
some performance indexes. On the other side, cfq is made, roughly, by 40% of a simple round-robin
flat scheduler, and by the remaining 60%, of: an extension to support hierarchical scheduling,
workload-based improvements, preemption, virtual-time extensions, further low-latency mechanisms,
and so on. This remaining 60% of cfq has practically very little in common with the above 20% of
heuristics in bfq (although many of the goals of these parts are the same). Probably, commonalities
amount to at most a 10%. The problem is then the remaining, almost completely incompatible, 90%
of non-common mechanisms.

To make a long story short, to implement a smooth transition from cfq to bfq, this 90% should of
course be progressively transformed along the way. This would apparently imply that:
- the intermediate versions would not be partial versions of either cfq or bfq;
- the performance of these intermediate versions would most certainly be worse than that of both
  cfq and bfq, as the mechanisms of the latter schedulers have been fine-tuned over the years,
  whereas the hybrid mechanisms in the intermediate versions would just be an attempt to avoid
  abrupt changes;
- these hybrid mechanisms would likely be more complex than the original ones;
- in the final steps of the transformation, these hybrid mechanisms will all have to be further
  changed to become those of bfq, or just thrown away.

In the end, a smooth transition seems messy and confusing. On the opposite end, I thought about
a cleaner but sharper solution, which probably better matches the one proposed by Tejun:
1) removing the 60% of extra code of cfq from around the round-robin engine of cfq, 2) turning the
remaining core into a flat version of bfq-v0, 3) turning this flat scheduler into the actual, hierarchical
bfq-v0, 4) applying the remaining bfq patches.

In general, with both a smooth but messy and a sharp but clean transformation, there seems to be
the following common problems:
1) The main benefits highlighted by Jens, i.e., being able to move back and forth and easily
understand what works and what does not, seem to be lost, because, with both solutions,
intermediate versions would likely have a worse performance than the current version of cfq.
2) bfq, on one side, does not export some of the sysfs parameters of cfq, such as slice_sync, and,
on the other side, uses other common parameters in a different way. For example, bfq turns I/O priorities
into throughput shares in a different way than cfq does. As a consequence, existing configurations may
break or behave in unexpected ways.

I’m sorry for the long list of (only) problems, but, because of the extent at which cfq and bfq have diverged
over the years, we are having a really hard time finding a sensible way to turn the former into the latter.
Of course, we are willing to do our best once we find a viable solution.

Thanks,
Paolo



More information about the Containers mailing list