[Bugme-new] [Bug 10585] New: System Timer can come up bad

bugme-daemon at bugzilla.kernel.org bugme-daemon at bugzilla.kernel.org
Thu May 1 09:28:57 PDT 2008


http://bugzilla.kernel.org/show_bug.cgi?id=10585

           Summary: System Timer can come up bad
           Product: Timers
           Version: 2.5
     KernelVersion: 2.6.25
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
        AssignedTo: johnstul at us.ibm.com
        ReportedBy: charles.mitchell at solacesystems.com


Hardware: Intel S5000PAL/PSL motherboard, dual 5160 and dual E5450. 
          PCIe plugin card to IDT Switch

Software Environment:  

Problem Description:  We noticed that some of our systems were coming up with
bad system time: cpu_khz (shown as "cpu MHz" in /proc/cpuinfo) was wrong, and
time was not being kept correctly (losing several seconds per minute).  This
happens more often on some machines than on others.  The cause is that the
number of CPU cycles counted over the 30 ms calibration window is occasionally
too large, so the cpu_khz calculated in tsc.c ends up too large.  I suspect
that an NMI is sneaking in and delaying the point at which the OUT pin of PIT
channel 2 is sampled high (inb_p(0x61) & 0x20).

The 2.6.25 code is the same as 2.6.18 for the sampling of the OUT pin.  

Test results:

kernel            result
==============    ============================
2.6.18.solace     Fails approx. 1 in 3 reboots
2.6.18.8          Failed after 16 reboots
2.6.25            2 failures in 244 reboots

The interesting thing is that 2.6.25 had an extremely rare "CPU looks slow"
event.  Normally the CPU looks too fast, because extra cycles are counted
while the CPU is distracted by an NMI.  The CPU can only look too slow if the
NMI hits at a very specific time: between programming the timer and reading
the 'start' cycle count.  In that case the shortfall in counted cycles tells
us exactly how long the NMI was: 5.84 ms.

We plan on fixing by repeating the measurement some number of times and
selecting the best results.

BogoMIPS is also calculated incorrectly, independently of this; that is a
separate problem which I believe is of little consequence.

Steps to reproduce:   reboot and check /proc/cpuinfo for an incorrect "cpu MHz" value.

