[Bugme-janitors] [Bug 10756] many pre-mature anticipation timeouts in anticipatory I/O scheduler

bugme-daemon at bugzilla.kernel.org bugme-daemon at bugzilla.kernel.org
Tue May 20 02:13:18 PDT 2008


http://bugzilla.kernel.org/show_bug.cgi?id=10756





------- Comment #2 from anonymous at kernel-bugs.osdl.org  2008-05-20 02:13 -------
Reply-To: jens.axboe at oracle.com

On Mon, May 19 2008, Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Mon, 19 May 2008 23:29:41 -0700 (PDT) bugme-daemon at bugzilla.kernel.org wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=10756
> > 
> >            Summary: many pre-mature anticipation timeouts in anticipatory
> >                     I/O scheduler
> >            Product: IO/Storage
> >            Version: 2.5
> >      KernelVersion: 2.6.23
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Block Layer
> >         AssignedTo: axboe at kernel.dk
> >         ReportedBy: chuanpengli at yahoo.com
> >                 CC: io_other at kernel-bugs.osdl.org
> > 
> > 
> > Latest working kernel version: N/A
> > Earliest failing kernel version: 2.6.13
> > Distribution: www.kernel.org
> > Hardware Environment: IBM eServer: dual 2G Xeon processors;IBM 36GB SCSI drive
> > Software Environment: Redhat 9: gcc 3.2.2 
> > Problem Description:
> >   Starting from 2.6.13, the switch of the kernel timer frequency HZ from 1000
> > to 250 results in "default_antic_expire = 1 tick". One tick is 4 ms, BUT the
> > anticipation timeout can occur anywhere from 0 to 4 ms, because the timer may
> > be started anytime from 0 to 4 ms before the next system timer interrupt. In
> > practice, I observe anticipation timeouts as short as 100 microseconds using
> > the LTT trace tool. Compared with HZ=1000, the new frequency (HZ=250) causes
> > frequent premature anticipation timeouts and degraded I/O throughput under
> > concurrent I/O workloads. I suggest setting "default_antic_expire" to 2 when
> > its value is calculated as 1. (see source "block/as-iosched.c")
> > Steps to reproduce: 
> >   (1) Run a concurrent server with an I/O-bound workload, such as a
> > micro-benchmark that sequentially reads 256 KB from random locations in
> > randomly chosen files. 
> >   (2) I/O throughput at HZ=250 is 10-15% lower than at HZ=1000.
> >   (3) At HZ=250, many anticipation timeouts can be observed using trace
> > tools such as LTT.
> 
> Interesting.
> 
> It's often a bug to do mod_timer(timer, jiffies+1) for this very reason
> - the timer can expire anywhere from one jiffy down to zero seconds
> hence, which is an arbitrarily large (infinite) ratio and can have
> unpredictable effects.
> 
> A probably-suitable-but-dopey fix might be
> 
> --- a/block/as-iosched.c~a
> +++ a/block/as-iosched.c
> @@ -416,6 +416,9 @@ static void as_antic_waitnext(struct as_
>  
>  	timeout = ad->antic_start + ad->antic_expire;
>  
> +	if (ad->antic_expire == 1)
> +		timeout++;		/* comment goes here */
> +
>  	mod_timer(&ad->antic_timer, timeout);
>  
>  	ad->antic_status = ANTIC_WAIT_NEXT;
> _
> 
> but a) It is unclear what in there prevents `timeout' from referring to
> a time which has already passed (say, there was a storm of slow-running
> interrupts on this CPU) and b) I bet other IO schedulers have the same
> issue.

I have another patch pending that just makes sure that the timer
addition is always at least 2 for this very reason. CFQ needs a similar
patch; it currently makes sure it's at least 1 (but should be 2).


-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

