[Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O

Linus Walleij linus.walleij at linaro.org
Fri Sep 16 11:24:07 UTC 2016


On Fri, Sep 16, 2016 at 11:10 AM, Bart Van Assche
<bart.vanassche at sandisk.com> wrote:

> What was your reference when comparing blk-mq MMC/SD performance with the
> current implementation?

I have *NOT* compared the performance, since I have not
managed to replace blk with blk-mq in MMC/SD yet.

If someone else has more experience and can do this in
five minutes to get a rough measure, I would appreciate
seeing it.

I am working on it from the bottom up, trying to make
a not-too-stupid search-and-substitute replacement. Since MMC
does a lot of request stacking and looking ahead and behind
in the queue, this needs to be done thoroughly.

But these are the reference tests I have used for CFQ vs BFQ
comparisons so far:

Hardware:
- ARM Integrator/AP IM-PD1 SD-card at 300kHz (!)
- Ux500 with 7.18GiB eMMC
- Ux500 with SanDisk 4GiB uSD card
- ARM Juno with 2GiB Kingston uSD card
- ARM Juno with SanDisk 4GiB uSD card
- Marvell Kirkwood Feroceon ARM with 2GiB SD card

First, the standard dd write/read test of course, because if you
have performance issues there you can just forget about everything
else. The read side looks something like:
time dd if=/dev/mmcblk0 of=/dev/null bs=1M count=1024 iflag=direct

That is with busybox dd/time.
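
The write side can be done along the same lines with a reasonably
complete busybox dd (the scratch file path is just an example;
writing a file on a mounted filesystem keeps the card contents
intact, and conv=fsync makes sure the data has hit the media before
the time is reported):
time dd if=/dev/zero of=/mnt/ddwrite.img bs=1M count=1024 conv=fsync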

Then I used iozone, which is what the mobile industry has
traditionally used to provide some figures on storage throughput,
since many just want a number to put on their whitepaper. It writes
and reads a number of blocks of varying sizes, re-reads and re-writes
them, and also performs reads and writes at random offsets:
http://www.iozone.org/

I usually just use it like so:
mount /dev/mmcblk0p1 /mnt
iozone -az -i0 -i1 -i2 -s 20m -I -f /mnt/foo.test
(-i0/-i1/-i2 select the write, read and random I/O tests, -s 20m sets
the file size and -I makes it use O_DIRECT.)

Both of these are simple to cross compile and run from an
initramfs on ARM targets.

Then I use Jens Axboe's fio. This is a more complicated beast
intended to generate real-world workloads to emulate the load
on your random Google or Facebook database server or image
cluster or I don't know what.
https://github.com/axboe/fio

It is not super-useful on MMC/SD cards, because the load
will simply bog down everything and your typical embedded
system will start to behave like an Android phone that is
"optimizing applications" after an update, which is a known
issue caused by the slowness of eMMC. It also eats memory
quickly, and that just kills any embedded system with an
OOM before you can run any meaningful tests. But it can
spawn any number of readers and writers and stress your
device very efficiently if you have enough memory and CPU.
(It is apparently designed to test systems with lots of
memory and CPU power.)
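
A small, memory-bounded run that still spawns a handful of jobs
against a mounted card could look something like this (the directory,
sizes and job count are just examples, not something I am quoting
results for; fio creates its own scratch files under /mnt):

fio --directory=/mnt --rw=randrw --bs=4k --size=64M --direct=1 \
    --numjobs=4 --iodepth=1 --group_reporting --name=mmc-randrw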

I mainly used fio on NAS-type devices.
For example, on a Marvell Kirkwood Pogoplug 4 with SATA, I
can do a test like this on a dm-crypt device-mapper target:

fio --filename=/dev/dm-0 --direct=1 --iodepth=1 --rw=read --bs=64K \
--size=1G --group_reporting --numjobs=1 --name=test_read
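
A random read variant with a deeper queue, just as a sketch of how
the parameters can be varied (the exact numbers are arbitrary):

fio --filename=/dev/dm-0 --direct=1 --ioengine=libaio --iodepth=16 \
    --rw=randread --bs=4k --size=1G --group_reporting --numjobs=4 \
    --name=test_randread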

> Which I/O scheduler was used when measuring
> performance with the traditional block layer?

I used CFQ, deadline, noop, and of course the BFQ patches.
With BFQ I reproduced the figures reported by Paolo on a
laptop, but since his test cases use fio to stress the system
and eMMC/SD are so slow, I couldn't come up with any good
use case using fio.
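
Switching between them is just the usual sysfs knob, something
like this (the device name is an example, and bfq of course only
shows up with the BFQ patches applied):
cat /sys/block/mmcblk0/queue/scheduler
echo bfq > /sys/block/mmcblk0/queue/scheduler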

Any hints on better tests are welcome!
In the kernel logs I only see people doing a lot of dd
tests, which I think is silly; you need more serious
test cases, so it would be good if we could build some
consensus there.

What do you guys at SanDisk use?

Yours,
Linus Walleij

