[Bugme-new] [Bug 17491] New: Reproducible crash on large 64bit write to sata device

bugzilla-daemon at bugzilla.kernel.org
Mon Aug 30 11:27:32 PDT 2010


https://bugzilla.kernel.org/show_bug.cgi?id=17491

           Summary: Reproducible crash on large 64bit write to sata device
           Product: IO/Storage
           Version: 2.5
    Kernel Version: 2.6.31 64bit, 2.6.35 64bit
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: Serial ATA
        AssignedTo: jgarzik at pobox.com
        ReportedBy: carl.janzen at gmail.com
        Regression: No


Created an attachment (id=28461)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=28461)
Ubuntu 9.10 livecd dmesg

This bug affects recent 64-bit kernels, including the kernel in Ubuntu 10.10
alpha 3 (kernel 2.6.35). The SMART data from the brand-new 2TB Western Digital
hard drive involved shows no errors (the motherboard is a brand-new Asus P5Q
Pro Turbo). I also tried an older hard drive (also a 2TB Western Digital) on an
earlier motherboard (Asus P5B), with the same results.

This error likely affects other hard drives as well, or more likely has nothing
to do with the hard drives at all. I had a problem with corruption of my
4-drive array of 500MB Western Digital drives. Before rebuilding the array I
wanted to copy the data from those drives across to a 2TB backup, and that is
when I started seeing this reproducible crash. Leading up to this point the
system crashed every other day or so, which suggests to me that this bug
probably caused the file system corruption as well.

The kernel on the Fedora 10 live DVD (2.6.27) does not crash, but I did not
confirm whether it produces the messages described below.

The kernel on the Ubuntu 9.10 live CD (2.6.31-14-generic) does not crash
either, but it produced the enclosed dmesg, messages, lspci and smartctl files.
Judging by the attempts to access blocks past the end of the device, it looks
like a 64-bit-specific problem. Convert the offending block number to hex and
it stands out conspicuously.
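
The hex check mentioned above can be sketched in shell as follows. This is
only an illustration: the helper names to_hex and past_end are made up here,
and the numbers in the usage note are hypothetical, not taken from the
attached dmesg.

```shell
# Hypothetical helpers; the names are illustrative, not from any tool.

# Print a decimal block number in hexadecimal so that stray high bits
# are easy to spot.
to_hex() {
    printf '0x%x\n' "$1"
}

# Succeed (exit status 0) if block number $1 lies at or beyond the end
# of a device that is $2 blocks long. Both values must use the same units.
past_end() {
    [ "$1" -ge "$2" ]
}
```

For example, `to_hex 4294967296` prints 0x100000000, a value with a bit set
above the low 32 bits. On a live system the device size in 512-byte sectors
could be read with `blockdev --getsz /dev/sdb` and passed as the second
argument to past_end.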

The latest Ubuntu release freezes with the keyboard LEDs flashing. I tried to
reproduce the problem in text mode so I could take a picture of the
trace/panic; that is what the two JPGs are.

The way I have been triggering the bug is with the following command:

dd if=/dev/zero of=/dev/sdb1 bs=2048

There does not seem to be a predictable delay between starting that command
and the point where it crashes/freezes or produces the errors in the log file.
Sometimes I can transfer 40GB before the error hits; once it happened
immediately. I also noticed that upon detection of the device there is a
complaint: "device reported invalid CHS sector 0".
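
Because the failure point is unpredictable, one way to narrow it down might be
to issue the same write in fixed-size chunks and log progress after each one,
so that the last line printed before a freeze shows roughly how far the write
got. The following is a minimal sketch, assuming a hypothetical helper named
write_chunks. It keeps the bs=2048 of the command above but adds count=, seek=
and conv=notrunc, which the original command did not use, so it may not
trigger the bug in the same way. Like the original command, it destroys any
data on the target.

```shell
# write_chunks TARGET BLOCKS_PER_CHUNK NCHUNKS
# Hypothetical helper: writes NCHUNKS chunks of zeros to TARGET, each
# BLOCKS_PER_CHUNK blocks of 2048 bytes, and prints a progress line
# after every chunk. Stops with a non-zero status if dd fails.
write_chunks() {
    target=$1
    blocks=$2
    nchunks=$3
    i=0
    while [ "$i" -lt "$nchunks" ]; do
        # seek= is counted in output blocks (2048 bytes here), so each
        # chunk starts where the previous one ended.
        dd if=/dev/zero of="$target" bs=2048 count="$blocks" \
            seek=$((i * blocks)) conv=notrunc 2>/dev/null || return 1
        i=$((i + 1))
        echo "chunk $i done, $((i * blocks * 2048)) bytes written"
    done
}
```

For instance, `write_chunks /dev/sdb1 524288 100` would write 100 chunks of
1GiB each. Redirecting the output to a file on another disk, or to a serial
console, gives it a better chance of surviving a freeze.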


