[Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O

Fri Sep 16 08:59:54 UTC 2016

On Fri, Sep 16, 2016 at 10:24 AM, Greg KH <gregkh at linuxfoundation.org> wrote:
> On Fri, Sep 16, 2016 at 09:55:45AM +0200, Paolo Valente wrote:
>> Linux systems suffer from long-standing high-latency problems, at
>> system and application level, related to I/O.  For example, they
>> usually suffer from poor responsiveness--or even starvation, depending
>> on the workload--while, e.g., one or more files are being
>> read/written/copied.  On a similar note, background workloads may
>> cause audio/video playback/streaming to stutter, even with long gaps.
>> A lot of test results on this problem can be found here [1] (I'm
>> citing only this resource just because I'm familiar with it, but
>> evidence can be found in countless technical reports, scientific
>> papers, forum discussions, and so on).
>
> <snip>
>
> Isn't this a better topic for the Vault conference, or the storage mini
> conference?

Paolo was invited to the kernel summit and I guess so are the
core block maintainers: Jens, Tejun, Christoph. The right people are
there so why not take the opportunity.

If for nothing else just have a formal chat.

Overall I personally think the most KS-related discussion would be
to address the problems Paolo has had breaking into the block layer
development community and the conflicting responses to the patch
sets, which generated a few flak comments under the last LWN
article:
http://lwn.net/Articles/674308/

The main problem is that, unlike some random driver, this cannot
be put into staging, and adding it as a secondary (or tertiary or
whatever) scheduling policy in block/* was explicitly nixed.

AFAICT there is no clear answer from the block maintainers
regarding:

- Is the old blk layer deprecated or not? Christoph seems to
  say "yes, forget it, work on mq", but I am still unsure about Jens
  and Tejun's positions here. Would be nice with some consensus.
  If it is deprecated it would make sense not to merge any new
  code using it, right?

- When is an all-out transition to mq really going to happen?
  "When it's ready and all blk consumers are migrated" is a good
  answer, but pretty unhelpful for developers like Paolo.
  Can we get a clearer picture?

- What will subsystems (especially my pet peeve, MMC/SD,
  which is single-queue by nature) that experience a performance
  regression with a switch to mq do? Not switch until mq has a
  scheduling policy? Switch and suck up the performance regression,
  multiplied by the number of Android handheld devices on the
  planet?

I only have handwavy arguments about the latter being the
case which is why I'm working on a patch to MMC/SD to
switch to mq as an RFT. It's taking some time though; alas,
I'm not very smart.

Yours,
Linus Walleij