[Bugme-new] [Bug 6757] New: repeated slight XFS corruption

bugme-daemon at bugzilla.kernel.org bugme-daemon at bugzilla.kernel.org
Tue Jun 27 15:14:20 PDT 2006


http://bugzilla.kernel.org/show_bug.cgi?id=6757

           Summary: repeated slight XFS corruption
    Kernel Version: 2.6.17.1
            Status: NEW
          Severity: high
             Owner: xfs-masters at oss.sgi.com
         Submitter: Martin at Lichtvoll.de


Most recent kernel where this bug did not occur: 
Not easy to say, because I have had lots of XFS corruption problems in recent 
months, most prominently kernel bug #6380. But I also had an XFS crash with 
2.6.15.7, with *disabled write caches*.

This slight kind of corruption, however, seems to be new with 2.6.17.1. I am 
not absolutely sure, but I think I did not have it with 2.6.16.11, 2.6.16.4 and 
2.6.15.7. Well, better a slight corruption without lost+found files than what I 
had before ;)


Distribution: Debian Etch / Sid / Experimental (almost nothing from experimental)
Hardware Environment: IBM ThinkPad T23, Pentium III 1.13 GHz, 384 MB RAM (lspci 
stuff attached)
Software Environment: 2.6.17.1 + sws2 patches...
Problem Description:
XFS seems to get slightly corrupted after some days. So far I have only had it 
twice: once last Friday and once today.


Steps to reproduce:
Frankly, I have no idea. It just happens. So far it has only happened on the 
root partition; /home has fortunately been unaffected.


Details:

It first happened last Friday, see my post on the linux-xfs mailing list:

Re: xfs crash with linux 2.6.15.7 and disabled write caches (long)
Message-Id: <200606232201.29440.Martin at lichtvoll.de>

It happened again today and I have quite some diagnostic data at hand:

I halted the computer regularly after some suspend / resume cycles (software 
suspend 2), because it seems I can detect filesystem corruption more easily 
this way (for example, KDE crashing upon shutdown has been a good indicator).

I got some kernel messages on tty0, which I found in /var/log/syslog later on. 
As a sample, here is the first occurrence I found in the log. I am attaching 
the whole relevant portion of the log file to this bug report:

---------------------------------------------------------------
Jun 27 23:34:00 deepdance shutdown[18694]: shutting down for system halt
Jun 27 23:34:02 deepdance kernel: 0x0: 00 00 00 7e 1f 69 00 00 17 62 03 00 ff ff 07 00 
Jun 27 23:34:02 deepdance kernel: Filesystem "hda5": XFS internal error xfs_da_do_buf(2) at line 2212 of file fs/xfs/xfs_da_btree.c.  Caller 0xc020b60d
Jun 27 23:34:02 deepdance kernel: <c021ea5b> xfs_corruption_error+0x10b/0x140  <c020b60d> xfs_da_read_buf+0x3d/0x50
Jun 27 23:34:02 deepdance kernel: <c024f7c1> kmem_zone_alloc+0x61/0xe0  <c020a869> xfs_da_buf_make+0x159/0x160
Jun 27 23:34:02 deepdance kernel: <c020b4bb> xfs_da_do_buf+0x8bb/0x960  <c020b60d> xfs_da_read_buf+0x3d/0x50
Jun 27 23:34:02 deepdance kernel: <c020b60d> xfs_da_read_buf+0x3d/0x50  <c0214ebc> xfs_dir2_leaf_lookup_int+0x6c/0x2d0
Jun 27 23:34:02 deepdance kernel: <c0214ebc> xfs_dir2_leaf_lookup_int+0x6c/0x2d0  <c01f8193> xfs_bmap_last_offset+0x133/0x160
Jun 27 23:34:02 deepdance kernel: <c021564d> xfs_dir2_leaf_lookup+0x2d/0xc0  <c021098a> xfs_dir2_lookup+0x13a/0x160
Jun 27 23:34:02 deepdance kernel: <c0148a16> generic_file_buffered_write+0x3b6/0x6e0  <c02435ac> xfs_dir_lookup_int+0x4c/0x150
Jun 27 23:34:02 deepdance kernel: <c017843f> do_lookup+0x5f/0x180  <c0247c9e> xfs_lookup+0x7e/0xc0
Jun 27 23:34:02 deepdance kernel: <c02576bc> xfs_vn_lookup+0x4c/0xa0  <c0178533> do_lookup+0x153/0x180
Jun 27 23:34:02 deepdance kernel: <c0178dfd> __link_path_walk+0x89d/0xfa0  <c0256e86> xfs_vn_permission+0x26/0x30
Jun 27 23:34:02 deepdance kernel: <c017955c> link_path_walk+0x5c/0x100  <c0105d5a> do_gettimeofday+0x1a/0xd0
Jun 27 23:34:02 deepdance kernel: <c011d1ec> sys_gettimeofday+0x3c/0xb0  <c0179a27> do_path_lookup+0xa7/0x270
Jun 27 23:34:02 deepdance kernel: <c017712f> getname+0xdf/0x110  <c017a26c> __user_walk_fd+0x3c/0x70
Jun 27 23:34:02 deepdance kernel: <c0166daa> sys_faccessat+0xfa/0x180  <c0105d5a> do_gettimeofday+0x1a/0xd0
Jun 27 23:34:02 deepdance kernel: <c011d1ec> sys_gettimeofday+0x3c/0xb0  <c0166e4f> sys_access+0x1f/0x30
Jun 27 23:34:02 deepdance kernel: <c0103027> syscall_call+0x7/0xb 
---------------------------------------------------------------
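
For anyone who wants to pull every such report out of a long syslog, here is a 
minimal sketch in Python. It assumes the message format shown in the excerpt 
above; the function name find_xfs_internal_errors is hypothetical and not part 
of any XFS tool. Whitespace is normalised first so that log lines wrapped by a 
mail client still match.

```python
import re

# Pattern for the "XFS internal error" message as it appears in the excerpt.
# Assumption: other kernel versions may word this message differently.
XFS_ERROR = re.compile(
    r'Filesystem "(?P<device>[^"]+)": XFS internal error '
    r'(?P<function>\S+) at line (?P<line>\d+) of file (?P<file>\S+)\. '
    r'Caller (?P<caller>0x[0-9a-fA-F]+)'
)


def find_xfs_internal_errors(log_text):
    """Return one dict per XFS internal error report found in log_text."""
    # Collapse all runs of whitespace (including line breaks) to one space,
    # so that messages broken across lines are still recognised.
    normalised = " ".join(log_text.split())
    return [match.groupdict() for match in XFS_ERROR.finditer(normalised)]


if __name__ == "__main__":
    sample = (
        'Jun 27 23:34:02 deepdance kernel: Filesystem "hda5": XFS internal error \n'
        'xfs_da_do_buf(2) at line 2212 of file fs/xfs/xfs_da_btree.c.  '
        'Caller 0xc020b60d\n'
    )
    for error in find_xfs_internal_errors(sample):
        print(error)
```

Counting the returned entries per device and function gives a quick overview of 
how often each check fired between two boots.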

Write caches have been disabled all the time:

---------------------------------------------------------------
root@deepdance:~ -> hdparm -I /dev/hda | grep -i "write cache"
                Write cache
(No asterisk in front of it means disabled.)
---------------------------------------------------------------
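
The asterisk check can be automated, for example from an init script. The 
following is a minimal sketch in Python that only interprets text already 
produced by hdparm -I; the function name write_cache_enabled is hypothetical, 
and the sketch assumes the feature list format shown above, where enabled 
features are marked with a leading asterisk.

```python
def write_cache_enabled(hdparm_identify_output):
    """Interpret the 'Write cache' line of `hdparm -I` output.

    Returns True if the feature line is marked with an asterisk (enabled),
    False if the line is present without one (disabled), and None if no
    such line is found at all.
    """
    for line in hdparm_identify_output.splitlines():
        if "write cache" in line.lower():
            return line.lstrip().startswith("*")
    return None


if __name__ == "__main__":
    import subprocess

    # Assumption: hdparm is installed and this runs with root privileges.
    output = subprocess.run(
        ["hdparm", "-I", "/dev/hda"], capture_output=True, text=True
    ).stdout
    print(write_cache_enabled(output))
```

Returning None for missing output keeps "could not tell" separate from 
"disabled", which matters when the drive does not report the feature.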

Mount options are: defaults,barrier,logbufs=8

I booted into a SUSE 10.1 installation with SUSE kernel 2.6.16.13-4-default. 
Mount options for the Debian partition are the same as above. Write caches 
should have been disabled by an init script I wrote. The only strange thing is 
the output of hdparm:

---------------------------------------------------------------
deepdance:~ # hdparm -W0 /dev/hda

/dev/hda:
 setting drive write-caching to 0 (off)
 HDIO_SET_WCACHE(wcache) failed: Success
---------------------------------------------------------------

(This doesn't happen under Debian. Maybe I should try blktool wcache off. I can 
also check with hdparm -I /dev/hda whether the write cache has really been 
disabled under SUSE, but I believe it is. The barrier mount option is used as 
well, so things should be safe anyway.)

xfs_check reported errors like this (full output attached):

---------------------------------------------------------------
deepdance:~ # xfs_check /dev/hda5
bad free block nvalid/nused 6/-1 for dir ino 5012689 block 16777216
missing free index for data block 0 in dir ino 5012689
missing free index for data block 1 in dir ino 5012689
missing free index for data block 2 in dir ino 5012689
missing free index for data block 3 in dir ino 5012689
missing free index for data block 4 in dir ino 5012689
missing free index for data block 5 in dir ino 5012689
bad free block nvalid/nused 21/-1 for dir ino 33641428 block 16777216
---------------------------------------------------------------
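
To see at a glance which directories are affected in the full xfs_check output, 
the messages can be grouped by directory inode. This is a minimal sketch in 
Python that assumes every relevant message contains "dir ino <number>" as in the 
excerpt above; the function name problems_per_directory is hypothetical.

```python
import re
from collections import Counter

# Matches the directory inode number in messages such as
# "missing free index for data block 0 in dir ino 5012689".
DIR_INODE = re.compile(r"\bdir ino (\d+)")


def problems_per_directory(xfs_check_output):
    """Count xfs_check messages per directory inode number."""
    counts = Counter()
    for line in xfs_check_output.splitlines():
        match = DIR_INODE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: xfs_check /dev/hda5 | python this_script.py
    for inode, count in problems_per_directory(sys.stdin.read()).most_common():
        print(inode, count)
```

The inode numbers can then be mapped back to paths, for example with 
find / -xdev -inum <number>, to see which directories were hit.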

xfs_repair was able to repair it and printed messages like this:

---------------------------------------------------------------
empty data block 53 in directory inode 55176185: junking block
empty data block 54 in directory inode 55176185: junking block
empty data block 56 in directory inode 55176185: junking block
empty data block 58 in directory inode 55176185: junking block
empty data block 60 in directory inode 55176185: junking block
empty data block 63 in directory inode 55176185: junking block
empty data block 64 in directory inode 55176185: junking block
free block 16777216 entry 52 for directory ino 55176185 bad
rebuilding directory inode 55176185
free block 16777216 for directory inode 48409589 bad nused
rebuilding directory inode 48409589
---------------------------------------------------------------
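
A note on the block number 16777216 that shows up in both outputs: it looks like 
the first block of the free-space index section of a version 2 directory. The 
sketch below works through the arithmetic in Python. It rests on two 
assumptions that are not stated in this report: that the filesystem uses 
4096-byte blocks with the directory block size equal to the filesystem block 
size, and that the constants are as defined in the kernel's XFS directory 
headers. The function name first_free_index_block is hypothetical.

```python
# Constants of the XFS version 2 directory layout, reproduced here as an
# assumption from the kernel headers; verify against your kernel source.
XFS_DIR2_DATA_ALIGN_LOG = 3
XFS_DIR2_SPACE_SIZE = 1 << (32 + XFS_DIR2_DATA_ALIGN_LOG)  # 32 GiB per section
XFS_DIR2_FREE_SPACE = 2  # sections: 0 = data, 1 = leaf, 2 = free index
XFS_DIR2_FREE_OFFSET = XFS_DIR2_FREE_SPACE * XFS_DIR2_SPACE_SIZE  # byte offset


def first_free_index_block(block_size=4096):
    """Directory block number at which the free-index section starts."""
    return XFS_DIR2_FREE_OFFSET // block_size


if __name__ == "__main__":
    print(first_free_index_block())
```

Under these assumptions the result equals 16777216, which would mean that all 
the reported problems concern the free-space index of the directories rather 
than their data blocks.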

I already thought about hardware problems and tried the following:

1) badblocks -s -v -n -o /home/martin/XFS-Probleme/badblocks.txt /dev/hda5

It found no bad blocks.

2) smartctl -t long /dev/hda

It completed successfully.

3) memtest86 over night

It found 0 errors.

So I am pretty sure that the hardware is fine.

Regards, Martin
