[Bugme-janitors] [Bug 10756] many pre-mature anticipation timeouts in anticipatory I/O scheduler

bugme-daemon at bugzilla.kernel.org bugme-daemon at bugzilla.kernel.org
Mon May 19 23:46:13 PDT 2008


http://bugzilla.kernel.org/show_bug.cgi?id=10756





------- Comment #1 from anonymous at kernel-bugs.osdl.org  2008-05-19 23:46 -------
Reply-To: akpm at linux-foundation.org

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Mon, 19 May 2008 23:29:41 -0700 (PDT) bugme-daemon at bugzilla.kernel.org
wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=10756
> 
>            Summary: many pre-mature anticipation timeouts in anticipatory
>                     I/O scheduler
>            Product: IO/Storage
>            Version: 2.5
>      KernelVersion: 2.6.23
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Block Layer
>         AssignedTo: axboe at kernel.dk
>         ReportedBy: chuanpengli at yahoo.com
>                 CC: io_other at kernel-bugs.osdl.org
> 
> 
> Latest working kernel version: N/A
> Earliest failing kernel version: 2.6.13
> Distribution: www.kernel.org
> Hardware Environment: IBM eServer: dual 2G Xeon processors;IBM 36GB SCSI drive
> Software Environment: Redhat 9: gcc 3.2.2 
> Problem Description:
>   Starting from 2.6.13, the switch of the kernel timer frequency HZ from
> 1000 to 250 results in "default_antic_expire = 1 tick". One tick is 4 ms,
> BUT the anticipation timeout can occur anywhere from 0 to 4 ms, because the
> timer may be started anytime from 0 to 4 ms before the next system timer
> interrupt. In practice, I have observed anticipation timeouts as short as
> 100 microseconds using the LTT trace tool. Compared with HZ=1000, the new
> frequency (HZ=250) causes frequent premature anticipation timeouts and
> degraded I/O throughput under concurrent I/O workloads. I suggest setting
> "default_antic_expire" to 2 when its value is calculated as 1. (see source
> "block/as-iosched.c")
> Steps to reproduce: 
>   (1) run a concurrent server with an I/O-bound workload, such as a
> micro-benchmark that sequentially reads 256 KB from random locations in
> randomly chosen files. 
>   (2) I/O throughput at HZ=250 is 10-15% lower than at HZ=1000.
>   (3) At HZ=250, many anticipation timeouts can be observed using trace
> tools such as LTT.

Interesting.

It's often a bug to do mod_timer(timer, jiffies+1) for this very reason:
the timer can expire anywhere from one full jiffy down to zero seconds
hence. That is an unbounded ratio between the longest and shortest
possible waits, which can have unpredictable effects.

A probably-suitable-but-dopey fix might be

--- a/block/as-iosched.c~a
+++ a/block/as-iosched.c
@@ -416,6 +416,9 @@ static void as_antic_waitnext(struct as_

        timeout = ad->antic_start + ad->antic_expire;

+       if (ad->antic_expire == 1)
+               timeout++;              /* comment goes here */
+
        mod_timer(&ad->antic_timer, timeout);

        ad->antic_status = ANTIC_WAIT_NEXT;
_

but a) it is unclear what in there prevents `timeout' from referring to
a time which has already passed (say, after a storm of slow-running
interrupts on this CPU), and b) I bet other I/O schedulers have the same
issue.


-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

