[Bugme-janitors] [Bug 10756] many pre-mature anticipation timeouts in anticipatory I/O scheduler
bugme-daemon at bugzilla.kernel.org
Mon May 19 23:46:13 PDT 2008
http://bugzilla.kernel.org/show_bug.cgi?id=10756
------- Comment #1 from anonymous at kernel-bugs.osdl.org 2008-05-19 23:46 -------
Reply-To: akpm at linux-foundation.org
(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).
On Mon, 19 May 2008 23:29:41 -0700 (PDT) bugme-daemon at bugzilla.kernel.org
wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=10756
>
> Summary: many pre-mature anticipation timeouts in anticipatory
> I/O scheduler
> Product: IO/Storage
> Version: 2.5
> KernelVersion: 2.6.23
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Block Layer
> AssignedTo: axboe at kernel.dk
> ReportedBy: chuanpengli at yahoo.com
> CC: io_other at kernel-bugs.osdl.org
>
>
> Latest working kernel version: N/A
> Earliest failing kernel version: 2.6.13
> Distribution: www.kernel.org
> Hardware Environment: IBM eServer: dual 2G Xeon processors; IBM 36GB SCSI drive
> Software Environment: Redhat 9: gcc 3.2.2
> Problem Description:
> Starting from 2.6.13, the switch of the kernel timer frequency HZ from 1000 to
> 250 results in "default_antic_expire = 1 tick". One tick is 4 ms, BUT the
> anticipation timeout can occur anywhere from 0 to 4 ms after it is armed,
> because the timer may be started anytime from 0 to 4 ms before the next system
> timer interrupt. In practice, I observe anticipation timeouts as short as 100
> microseconds using the LTT trace tool. Compared with HZ=1000, the new
> frequency (HZ=250) causes frequent premature anticipation timeouts and
> degraded I/O throughput under concurrent I/O workloads. I suggest setting
> "default_antic_expire" to 2 when its value is calculated as 1 (see
> "block/as-iosched.c").
> Steps to reproduce:
> (1) run a concurrent server with I/O-bound workload, such as a
> micro-benchmark that sequentially reads 256 KB from random locations in
> randomly chosen files.
> (2) I/O throughput at HZ=250 is 10-15% lower than at HZ=1000
> (3) At HZ=250, a lot of anticipation timeouts can be observed using trace
> tools such as LTT.
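To make the arithmetic in the report concrete, here is a small userspace sketch (not kernel code). It assumes a timer armed for "now + N ticks" fires exactly on the tick boundary N ticks ahead, and that it was armed offset_us microseconds into the current tick. The helper names tick_period_us() and actual_wait_us() are made up for illustration.

```c
#include <assert.h>

/* Length of one timer tick in microseconds for a given HZ. */
static unsigned long tick_period_us(unsigned int hz)
{
    assert(hz > 0);
    return 1000000UL / hz;
}

/*
 * Real time (in microseconds) until a timer armed for "now + ticks"
 * fires, when it is armed offset_us microseconds after the most
 * recent tick.  Assumption: the timer fires exactly on the tick
 * boundary, with no extra latency.
 */
static unsigned long actual_wait_us(unsigned int hz, unsigned long ticks,
                                    unsigned long offset_us)
{
    unsigned long period = tick_period_us(hz);

    assert(ticks >= 1 && offset_us < period);
    return ticks * period - offset_us;
}
```

Under these assumptions, actual_wait_us(250, 1, 3900) is 100, in line with the 100 microsecond timeouts in the report, while a 2-tick expiry armed at the same offset waits 4100 microseconds.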
Interesting.
It's often a bug to do mod_timer(timer, jiffies+1) for this very reason
- the timer can expire anywhere between one full jiffy and zero seconds
from now, which is a large (infinite) ratio and can have unpredictable
effects.
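The "infinite ratio" remark can be put in numbers with a rough sketch. It assumes a timer armed for jiffies+N fires somewhere between N-1 and N tick periods later, so the ratio of longest to shortest possible delay is N/(N-1). The function name delay_ratio() is hypothetical and this is plain userspace C, not anything from the kernel.

```c
#include <assert.h>
#include <math.h>

/*
 * Ratio of the longest to the shortest possible delay for a timer
 * armed "ticks" jiffies ahead.  Assumption: the delay lies between
 * (ticks - 1) and ticks tick periods.  For ticks == 1 the shortest
 * delay approaches zero, so the ratio is unbounded.
 */
static double delay_ratio(unsigned long ticks)
{
    assert(ticks >= 1);
    if (ticks == 1)
        return INFINITY;
    return (double)ticks / (double)(ticks - 1);
}
```

With this model a 1-tick timeout has an unbounded ratio, a 2-tick timeout varies by at most a factor of 2, and longer timeouts get progressively tighter.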
A probably-suitable-but-dopey fix might be
--- a/block/as-iosched.c~a
+++ a/block/as-iosched.c
@@ -416,6 +416,9 @@ static void as_antic_waitnext(struct as_
 
 	timeout = ad->antic_start + ad->antic_expire;
 
+	if (ad->antic_expire == 1)
+		timeout++;	/* comment goes here */
+
 	mod_timer(&ad->antic_timer, timeout);
 
 	ad->antic_status = ANTIC_WAIT_NEXT;
_
but a) It is unclear what in there prevents `timeout' from referring to
a time which has already passed (say, there was a storm of slow-running
interrupts on this CPU) and b) I bet other IO schedulers have the same
issue.
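For point a), one possible guard is to push a stale or too-near deadline forward before arming the timer. The sketch below is userspace C with made-up names (tick_after(), safe_deadline(), min_ticks). It only illustrates the idea and is not a proposed patch. The comparison is written the same wraparound-safe way as the kernel's time_after().

```c
#include <assert.h>

/* Wraparound-safe "a is later than b" for free-running tick counters. */
static int tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * Hypothetical helper: compute start + expire, but never return a
 * deadline earlier than now + min_ticks.  This covers both a deadline
 * that has already passed and one that is only a fraction of a tick
 * away.
 */
static unsigned long safe_deadline(unsigned long now, unsigned long start,
                                   unsigned long expire,
                                   unsigned long min_ticks)
{
    unsigned long timeout = start + expire;
    unsigned long earliest = now + min_ticks;

    assert(min_ticks >= 1);
    if (tick_after(earliest, timeout))
        timeout = earliest;
    return timeout;
}
```

With min_ticks set to 2, a 1-tick expiry is stretched to two ticks from now, and a deadline computed from an old start value is moved into the future rather than left in the past.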
--
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.