[Bugme-janitors] [Bug 10756] many pre-mature anticipation timeouts in anticipatory I/O scheduler
bugme-daemon at bugzilla.kernel.org
Tue May 20 02:13:18 PDT 2008
http://bugzilla.kernel.org/show_bug.cgi?id=10756
------- Comment #2 from anonymous at kernel-bugs.osdl.org 2008-05-20 02:13 -------
Reply-To: jens.axboe at oracle.com
On Mon, May 19 2008, Andrew Morton wrote:
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Mon, 19 May 2008 23:29:41 -0700 (PDT) bugme-daemon at bugzilla.kernel.org wrote:
>
> > http://bugzilla.kernel.org/show_bug.cgi?id=10756
> >
> > Summary: many pre-mature anticipation timeouts in anticipatory
> > I/O scheduler
> > Product: IO/Storage
> > Version: 2.5
> > KernelVersion: 2.6.23
> > Platform: All
> > OS/Version: Linux
> > Tree: Mainline
> > Status: NEW
> > Severity: normal
> > Priority: P1
> > Component: Block Layer
> > AssignedTo: axboe at kernel.dk
> > ReportedBy: chuanpengli at yahoo.com
> > CC: io_other at kernel-bugs.osdl.org
> >
> >
> > Latest working kernel version: N/A
> > Earliest failing kernel version: 2.6.13
> > Distribution: www.kernel.org
> > Hardware Environment: IBM eServer: dual 2G Xeon processors; IBM 36GB SCSI drive
> > Software Environment: Redhat 9: gcc 3.2.2
> > Problem Description:
> > Starting from 2.6.13, the switch of the kernel timer frequency HZ from
> > 1000 to 250 results in "default_antic_expire = 1 tick". One tick is 4 ms,
> > BUT the anticipation timeout can occur anywhere from 0 to 4 ms, because
> > the timer may be started anytime from 0 to 4 ms before the next system
> > timer interrupt. In practice, I observe anticipation timeouts as short as
> > 100 microseconds using the LTT trace tool. Compared with HZ=1000, the new
> > frequency (HZ=250) causes frequent premature anticipation timeouts and
> > degraded I/O throughput under concurrent I/O workload. I suggest setting
> > "default_antic_expire" to 2 when its value is calculated as 1. (see
> > source "block/as-iosched.c")
> > Steps to reproduce:
> > (1) run a concurrent server with I/O-bound workload, such as a
> > micro-benchmark that sequentially reads 256 KB from random locations in
> > randomly chosen files.
> > (2) I/O throughput at HZ=250 is 10-15% lower than HZ=1000
> > (3) At HZ=250, a lot of anticipation timeouts can be observed using trace
> > tools such as LTT.
>
> Interesting.
>
> It's often a bug to do mod_timer(timer, jiffies+1) for this very reason
> - the timer can expire anywhere from one jiffy down to almost zero
> seconds hence, an unbounded ratio, which can have unpredictable
> effects.
>
> A probably-suitable-but-dopey fix might be
>
> --- a/block/as-iosched.c~a
> +++ a/block/as-iosched.c
> @@ -416,6 +416,9 @@ static void as_antic_waitnext(struct as_
>
> timeout = ad->antic_start + ad->antic_expire;
>
> + if (ad->antic_expire == 1)
> + timeout++; /* a 1-jiffy timer can expire almost at once */
> +
> mod_timer(&ad->antic_timer, timeout);
>
> ad->antic_status = ANTIC_WAIT_NEXT;
> _
>
> but a) It is unclear what in there prevents `timeout' from referring to
> a time which has already passed (say, there was a storm of slow-running
> interrupts on this CPU) and b) I bet other IO schedulers have the same
> issue.
I have another patch pending that just makes sure that the timer
addition is always at least 2 for this very reason. CFQ needs a similar
patch; it currently makes sure it's at least 1 (but it should be 2).
--