[Bugme-janitors] [Bug 24012] Commit g18733b0 hangs my LVM+raid10 system

Mon Nov 29 15:27:48 PST 2010

https://bugzilla.kernel.org/show_bug.cgi?id=24012

--- Comment #4 from Darrick J. Wong <djwong at us.ibm.com>  2010-11-29 23:27:45 ---
Just out of curiosity, was this setup working fine with 2.6.36?

The reason I ask is, it looks like various filesystem processes (jbd2,
fsck.ext4) are hanging up in md_flush_request, which is where md issues flush
requests to the underlying devices.  It looks as though the flush is issued
and the drives then fail to respond, which hangs the system.

As I recall, in the 2.6.36 days, a flush request (then known as a barrier)
would cause the device's IO queue to be emptied out before the flush would be
sent to the drive.  In 2.6.37, that emptying behavior is gone, which means that
flushes and writes are intermixed in the command stream, which improves
performance.  The patch you tracked down enables md to send these flush
requests.  On your system, I see that NCQ is enabled, which permits the kernel
to issue multiple commands simultaneously.  However, some drives are known
to mishandle simultaneous flush and write requests[1].  Perhaps these
Samsung drives of yours behave similarly.
There might be a way to tell:

Does the hang behavior go away if you boot with libata.force=noncq ?
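
To confirm whether NCQ is actually in effect before and after booting with
that option, one could read each disk's queue_depth attribute from sysfs.
The following is only a rough sketch: it assumes the usual Linux layout of
/sys/block/<dev>/device/queue_depth, and treats a depth greater than 1 as
"NCQ probably active", which is a heuristic rather than a definitive test.
The function name ncq_queue_depths is made up for this example.

```python
from pathlib import Path


def ncq_queue_depths(sysfs_root="/sys/block"):
    """Return {device_name: queue_depth} for block devices that expose one.

    Assumes the conventional sysfs layout; devices without a queue_depth
    attribute (md, dm, loop, ...) are skipped.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return {}
    depths = {}
    for dev in sorted(root.iterdir()):
        depth_file = dev / "device" / "queue_depth"
        try:
            depths[dev.name] = int(depth_file.read_text().strip())
        except (OSError, ValueError):
            continue  # attribute missing or unreadable for this device
    return depths


if __name__ == "__main__":
    for name, depth in ncq_queue_depths().items():
        # Heuristic: a queue depth of 1 means commands are not being queued.
        state = "NCQ probably active" if depth > 1 else "NCQ off or unsupported"
        print(f"{name}: queue_depth={depth} ({state})")
```

With libata.force=noncq in effect, the member disks would be expected to
report a queue depth of 1.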

It might also be interesting to see the smartctl -a output of all four drives.
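
A small script along these lines could gather that output in one go.  This
is a sketch only: the device names in DRIVES are placeholders (substitute
the four RAID members), smartctl must be installed and usually needs root,
and the helper names are invented for this example.

```python
import subprocess

# Hypothetical device names; replace with the actual RAID member disks.
DRIVES = ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"]


def smartctl_command(device):
    """Build the argument list for `smartctl -a <device>`."""
    return ["smartctl", "-a", device]


def collect_smart_reports(devices, run=subprocess.run):
    """Run smartctl -a on each device and return {device: stdout text}.

    The exit status is deliberately not checked: smartctl uses it as a
    bitmask and returns nonzero for many conditions that still produce
    useful output.
    """
    reports = {}
    for dev in devices:
        result = run(smartctl_command(dev), capture_output=True, text=True)
        reports[dev] = result.stdout
    return reports


if __name__ == "__main__":
    for dev, text in collect_smart_reports(DRIVES).items():
        print(f"===== {dev} =====")
        print(text)
```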

[1]
https://ata.wiki.kernel.org/index.php/Known_issues#Seagate_harddrives_which_time_out_FLUSH_CACHE_when_NCQ_is_being_used
