[Bugme-new] [Bug 10781] New: unresponsive system (unfair io scheduling) when using dm-crypt

bugme-daemon at bugzilla.kernel.org
Fri May 23 08:29:12 PDT 2008


http://bugzilla.kernel.org/show_bug.cgi?id=10781

           Summary: unresponsive system (unfair io scheduling) when using
                    dm-crypt
           Product: IO/Storage
           Version: 2.5
     KernelVersion: 2.6.22.19 (kernel.org), 2.6.24-1-amd64 (Debian)
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: LVM2/DM
        AssignedTo: agk at redhat.com
        ReportedBy: christian-bko at jaeger.mine.nu


Latest working kernel version: -
Earliest failing kernel version: -
Distribution: Debian testing
Hardware Environment: Lenovo Thinkpad T61, 2.5 GHz Core 2 Duo T9300,
    Intel chipset, SATA disk, 2 GB RAM, NVIDIA video
Software Environment: Gnome / console
Problem Description:
    All tasks accessing dm-crypt'ed disk space become unresponsive for
    long periods of time while a single I/O-intensive (linear access)
    task is running on dm-crypt.

Steps to reproduce:

    (obviously replace sda9 with a partition where you don't have any
    valuable data)

     # cryptsetup create sda9_crypt /dev/sda9
     # time nice nice cat /dev/zero >/dev/mapper/sda9_crypt

    or:

     # cryptsetup remove sda9_crypt  # if necessary
     # cryptsetup luksFormat /dev/sda9
     # cryptsetup luksOpen /dev/sda9 sda9_crypt
     # time nice nice cat /dev/zero >/dev/mapper/sda9_crypt

    Then, after waiting ten seconds or so (until most binaries have been
    dropped from the disk caches), try to start any program. Even a
    simple "killall -STOP cat" will take 3-6 minutes.


A more complete/wordy description follows:

I installed the system with the root fs (reiserfs) and swap on LVM
on dm-crypt, using the Debian installer's support for that setup.

Quite soon I discovered that when I ran a make -j3 compilation of
software that requires hundreds of MB of RAM per gcc instance, thus
touching swap during the build, xorg (then using the open "nv"
driver) almost froze. The vesa driver behaved far better, so I
reported it as a bug against the nv driver (but soon found out that
the closed-source nvidia driver behaved the same) here:

 http://bugs.freedesktop.org/show_bug.cgi?id=15716

But the longer I use this machine, the more I suspect that something
bad is really going on in the I/O layer (probably dm-crypt related)
and that it is not the fault of xorg at all. One thing I noticed some
time ago is that ionice -c 3 does not help at all in reducing the
impact of a "cat /dev/sdaX" run on the responsiveness of the machine
(experimenting with the different I/O schedulers did not seem to help
either). I have also felt the need to set up resource limits with
ulimit -v to keep casual runaway processes (I am a user-space
developer) from swapping and costing me minutes each time to regain
control.
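
For reference, these are the kinds of commands I mean (a sketch: the
scheduler name and the 512 MB limit are only examples, and ionice -c 3
only has an effect under the CFQ scheduler):

# ionice -c 3 cat /dev/sda9 >/dev/null            # run the reader as "idle" I/O class
# cat /sys/block/sda/queue/scheduler              # list the available schedulers
# echo deadline > /sys/block/sda/queue/scheduler  # switch the scheduler at runtime
$ ulimit -v 524288                                # cap a shell's virtual memory at 512 MB (KB units)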

Currently I'm running a pristine 2.6.22.19 from kernel.org (in 64-bit
mode; I haven't tried 32-bit kernels so far).

Today I noticed that when running the above tests (writing zeroes to
sda9_crypt), cat runs along merrily: the "System Monitor" GNOME panel
applet shows it using about half the CPU power of one core (bright
blue), with the rest shown as wait time (dark blue), which is, I
think, expected (some read benchmarks with cat from the root
partition device also showed about 50% usage of one core at about
40 MByte/sec throughput, which is the native disk throughput).

*But* if I try to open, for example, a new gnome-terminal, or even
just run "killall -STOP cat" (even at the console, after ctrl-alt-f1
to an existing root login), it takes ages, more precisely about 3-6
minutes. If I instead hit ctrl-z in the gnome-terminal where I
started the above cat instance, it stops more or less instantly
(which is expected, as the shell should not have to access the disk
for that), and all the other pending actions are then run
immediately.

So my impression is that any 'fairness' in I/O scheduling is
completely broken when using dm-crypt. I suspect there might be a
problem with multiple I/O jobs going on at once *all using dm-crypt*,
as if dm-crypt had its own purely FIFO-order queue (with a huge
backlog) or something. This is consistent with reports from people
whose root partition is not on dm-crypt: they do not see the problem
when trying the above cat tests.
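
A minimal A/B comparison that should isolate dm-crypt (a sketch: it
assumes a second scratch partition, here called /dev/sda10, with no
valuable data on it; the expected timings reflect the reports above
rather than a run of exactly these commands):

# cat /dev/zero >/dev/sda10 &               # same write load on a raw partition
# time killall -STOP cat                    # expected to return promptly
# killall -CONT cat; killall cat; sync
# cat /dev/zero >/dev/mapper/sda9_crypt &   # same load through dm-crypt
# time killall -STOP cat                    # expected to take minutes here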

I've tried renicing the kcryptd processes to priorities 0, 10 and 19
(the default is -5), but only priority 10 seemed to make it any
better, if at all. Switching off the second core did not help in this
case either.
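
For completeness, this is roughly how I did that (a sketch: the exact
kcryptd thread names vary between kernel versions, and taking a core
offline requires CPU-hotplug support in the kernel):

# ps -eo pid,ni,comm | grep kcryptd               # find the dm-crypt worker threads
# renice 10 -p $(pgrep kcryptd)                   # lower their scheduling priority
# echo 0 > /sys/devices/system/cpu/cpu1/online    # take the second core offline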

(I'm now considering moving everything off dm-crypt to get decent
system behaviour.)

Thanks,
Christian.



Some further data:

- Someone asked whether I have DMA enabled, and whether my SATA disk is in
AHCI or compat mode. I don't know how to enable DMA on SATA disks, but I
assumed there is no need to do this manually; I've looked at the kernel
logs, which say:

 May 10 04:50:09 novo kernel: scsi0 : ahci
 May 10 04:50:09 novo kernel: ata1: SATA max UDMA/133 cmd 0xffffc20000068100 ctl 0x0000000000000000 bmdma 0x0000000000000000 irq 313

and

 May 10 04:50:09 novo kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 May 10 04:50:09 novo kernel: ata1.00: ATA-8: FUJITSU MHY2250BH, 0084000D, max UDMA/100
 May 10 04:50:09 novo kernel: ata1.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 31/32)
 May 10 04:50:09 novo kernel: ata1.00: configured for UDMA/100
 May 10 04:50:09 novo kernel: scsi 0:0:0:0: Direct-Access     ATA      FUJITSU MHY2250B 0084 PQ: 0 ANSI: 5

- Also, I've been asked for vmstat data:

$ vmstat 1

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0 643308 744080   5756  95944    0    0     0     0  404  227  0  1 99  0
 1  4 643308 350156 202312  95968    0    0    64 118212  797 86879  0 44 39 17
 0  3 643308 256780 229636  95936    0    0     0 105692 1098  316  0 42  0 58
 1  3 643308 205284 266660  95964    0    0     0 36976  891  259  0 35  0 65
 0  4 643308 167772 301256  95964    0    0     0 34596  948  318  0 34  0 66

The second row is from just after I started the cat. Then:

 0  3 643308 134736 346312  96008    0    0     0 12288 1018  359  0 31  0 69
 1  2 643308 148028 346312  96008    0    0     0     0  928  251  0 30  0 70
 2  2 643308 182980 346312  96008    0    0     0     0 1049  301  1 32  0 68
 0  3 643308 200612 346312  96008    0    0     0     0  948  272  1 32  0 67
 0  0 643308 330444 346312  96008    0    0     0    64  895  301  1 16 40 44
 0  0 643308 330476 346312  96008    0    0     0     0  398  195  1  1 97  0

The second row there is from just after I stopped it again. This is
without triggering the load of another program (which would again
make me wait minutes to get back control).


- I've run

# smartctl -t short /dev/sda
# smartctl -a /dev/sda 2>&1|less
..
Device Model:     FUJITSU MHY2250BH
..
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       725         -

