[Bugme-new] [Bug 6757] New: repeated slight XFS corruption
bugme-daemon at bugzilla.kernel.org
Tue Jun 27 15:14:20 PDT 2006
http://bugzilla.kernel.org/show_bug.cgi?id=6757
Summary: repeated slight XFS corruption
Kernel Version: 2.6.17.1
Status: NEW
Severity: high
Owner: xfs-masters at oss.sgi.com
Submitter: Martin at Lichtvoll.de
Most recent kernel where this bug did not occur:
Not easy to say, because I have had lots of XFS corruption problems in the last
months, most prominently kernel bug #6380. But I also had an XFS crash with
2.6.15.7 with *disabled write caches*.
This slight kind of corruption, however, seems to be new with 2.6.17.1. I am not
absolutely sure, but I think I did not have it with 2.6.16.11, 2.6.16.4 and
2.6.15.7. Well, better a slight corruption without lost+found files than what I
had before ;)
Distribution: Debian Etch / Sid / Experimental (almost nothing from experimental)
Hardware Environment: IBM ThinkPad T23, Pentium III 1.13 GHz, 384 MB RAM (lspci
stuff attached)
Software Environment: 2.6.17.1 + sws2 patches...
Problem Description:
XFS seems to get slightly corrupted after some days. So far it has happened
only twice: once last Friday and once today.
Steps to reproduce:
Frankly I have no idea. It just happens. But it only happened in the root
partition. /home fortunately has been unaffected by this.
Details:
It first happened last Friday, see my post on the linux-xfs mailing list:
Re: xfs crash with linux 2.6.15.7 and disabled write caches (long)
Message-Id: <200606232201.29440.Martin at lichtvoll.de>
It happened again today and I have quite some diagnostic data at hand:
I halted the computer regularly after some suspend / resume cycles (software
suspend 2), because it seems this way I can detect filesystem corruption more
easily (for example, KDE crashing upon shutdown has been a good indicator).
I got some kernel messages on tty0, which I found in /var/log/syslog later on.
As a sample, here is the first occurrence I found in the log. I attach the whole
portion of the log file to this bug report:
---------------------------------------------------------------
Jun 27 23:34:00 deepdance shutdown[18694]: shutting down for system halt
Jun 27 23:34:02 deepdance kernel: 0x0: 00 00 00 7e 1f 69 00 00 17 62 03 00 ff
ff 07 00
Jun 27 23:34:02 deepdance kernel: Filesystem "hda5": XFS internal error
xfs_da_do_buf(2) at line 2212 of file fs/xfs/xfs_da_btree.c. Caller 0xc020b60d
Jun 27 23:34:02 deepdance kernel: <c021ea5b> xfs_corruption_error+0x10b/0x140
<c020b60d> xfs_da_read_buf+0x3d/0x50
Jun 27 23:34:02 deepdance kernel: <c024f7c1> kmem_zone_alloc+0x61/0xe0
<c020a869> xfs_da_buf_make+0x159/0x160
Jun 27 23:34:02 deepdance kernel: <c020b4bb> xfs_da_do_buf+0x8bb/0x960
<c020b60d> xfs_da_read_buf+0x3d/0x50
Jun 27 23:34:02 deepdance kernel: <c020b60d> xfs_da_read_buf+0x3d/0x50
<c0214ebc> xfs_dir2_leaf_lookup_int+0x6c/0x2d0
Jun 27 23:34:02 deepdance kernel: <c0214ebc>
xfs_dir2_leaf_lookup_int+0x6c/0x2d0 <c01f8193>
xfs_bmap_last_offset+0x133/0x160
Jun 27 23:34:02 deepdance kernel: <c021564d> xfs_dir2_leaf_lookup+0x2d/0xc0
<c021098a> xfs_dir2_lookup+0x13a/0x160
Jun 27 23:34:02 deepdance kernel: <c0148a16>
generic_file_buffered_write+0x3b6/0x6e0 <c02435ac>
xfs_dir_lookup_int+0x4c/0x150
Jun 27 23:34:02 deepdance kernel: <c017843f> do_lookup+0x5f/0x180 <c0247c9e>
xfs_lookup+0x7e/0xc0
Jun 27 23:34:02 deepdance kernel: <c02576bc> xfs_vn_lookup+0x4c/0xa0
<c0178533> do_lookup+0x153/0x180
Jun 27 23:34:02 deepdance kernel: <c0178dfd> __link_path_walk+0x89d/0xfa0
<c0256e86> xfs_vn_permission+0x26/0x30
Jun 27 23:34:02 deepdance kernel: <c017955c> link_path_walk+0x5c/0x100
<c0105d5a> do_gettimeofday+0x1a/0xd0
Jun 27 23:34:02 deepdance kernel: <c011d1ec> sys_gettimeofday+0x3c/0xb0
<c0179a27> do_path_lookup+0xa7/0x270
Jun 27 23:34:02 deepdance kernel: <c017712f> getname+0xdf/0x110 <c017a26c>
__user_walk_fd+0x3c/0x70
Jun 27 23:34:02 deepdance kernel: <c0166daa> sys_faccessat+0xfa/0x180
<c0105d5a> do_gettimeofday+0x1a/0xd0
Jun 27 23:34:02 deepdance kernel: <c011d1ec> sys_gettimeofday+0x3c/0xb0
<c0166e4f> sys_access+0x1f/0x30
Jun 27 23:34:02 deepdance kernel: <c0103027> syscall_call+0x7/0xb
---------------------------------------------------------------
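To pull the relevant entries out of a long syslog, something like the following could be used. It is only a sketch: the regular expression is modelled on the single message format quoted above, the function name `find_xfs_errors` is made up for illustration, and wrapped lines (as in this mail) are handled simply by collapsing all whitespace before matching.

```python
import re

# Pattern modelled on the one message quoted in this report; other kernel
# versions may format the line differently (assumption).
XFS_ERROR = re.compile(
    r'Filesystem "(?P<dev>[^"]+)": XFS internal error\s+'
    r'(?P<where>\S+) at line (?P<line>\d+) of file (?P<file>\S+?)\.\s+'
    r'Caller (?P<caller>0x[0-9a-fA-F]+)'
)


def find_xfs_errors(log_text):
    """Return one dict per XFS internal error message found in log_text."""
    # Collapse all whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [m.groupdict() for m in XFS_ERROR.finditer(flat)]


# The message quoted above, including the line wrap introduced by the mail.
sample = '''Jun 27 23:34:02 deepdance kernel: Filesystem "hda5": XFS internal error
xfs_da_do_buf(2) at line 2212 of file fs/xfs/xfs_da_btree.c. Caller 0xc020b60d'''

for err in find_xfs_errors(sample):
    print(err["dev"], err["where"], err["file"], err["line"], err["caller"])
```

Run over the whole attached log, this would give a quick list of which functions and source lines reported corruption, without the stack traces in between.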
Write caches have been disabled all the time:
---------------------------------------------------------------
root at deepdance:~ -> hdparm -I /dev/hda | grep -i "write cache"
Write cache
(No asterisk in front of it means disabled...)
---------------------------------------------------------------
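The asterisk check described above could be automated roughly as follows. This is a sketch under assumptions: the helper name `write_cache_enabled` is hypothetical, and the "enabled" sample line is only an illustration of the layout (a `*` in front of the feature name), not output captured from this machine.

```python
def write_cache_enabled(hdparm_output):
    """Interpret the 'Write cache' line of `hdparm -I` output.

    Returns True if the line is marked with an asterisk, False if the line
    is present without one, and None if no such line is found.
    """
    for line in hdparm_output.splitlines():
        stripped = line.strip()
        enabled = stripped.startswith("*")
        # Drop the marker (if any) and compare the bare feature name.
        name = stripped.lstrip("*").strip()
        if name.lower() == "write cache":
            return enabled
    return None


# Line as quoted in this report (no asterisk, so the cache is disabled).
print(write_cache_enabled("Write cache"))
# Illustrative line for the enabled case (assumed layout).
print(write_cache_enabled("   *\tWrite cache"))
```

The text to feed in would be the output of `hdparm -I /dev/hda`, for example captured in an init script right after the write cache has been switched off.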
Mount options are: defaults,barrier,logbufs=8
I booted into a SUSE 10.1 installation with SUSE kernel 2.6.16.13-4-default.
Mount options for the Debian partition are the same as above. Write caches
should have been disabled by an init script I wrote... the only strange thing
is the output of hdparm:
---------------------------------------------------------------
deepdance:~ # hdparm -W0 /dev/hda
/dev/hda:
setting drive write-caching to 0 (off)
HDIO_SET_WCACHE(wcache) failed: Success
---------------------------------------------------------------
(This doesn't happen under Debian. Maybe I should try blktool wcache off. I can
also test whether the write cache has been successfully disabled under SUSE
using hdparm -I /dev/hda, but I believe it is. The barrier mount option is used
as well, so things should be safe anyway.)
xfs_check reported errors like this (full output attached):
---------------------------------------------------------------
deepdance:~ # xfs_check /dev/hda5
bad free block nvalid/nused 6/-1 for dir ino 5012689 block 16777216
missing free index for data block 0 in dir ino 5012689
missing free index for data block 1 in dir ino 5012689
missing free index for data block 2 in dir ino 5012689
missing free index for data block 3 in dir ino 5012689
missing free index for data block 4 in dir ino 5012689
missing free index for data block 5 in dir ino 5012689
bad free block nvalid/nused 21/-1 for dir ino 33641428 block 16777216
---------------------------------------------------------------
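Since the full xfs_check output is long, a small script can summarize which directory inodes are affected. The following is a sketch that relies only on the `dir ino <number>` wording seen in the lines above; the function name `complaints_per_directory` is hypothetical, and lines with a different wording would simply be ignored.

```python
import re
from collections import Counter

# Matches the "dir ino <number>" part of the xfs_check lines quoted above.
DIR_INO = re.compile(r"dir ino (\d+)")


def complaints_per_directory(xfs_check_output):
    """Count xfs_check complaint lines per directory inode number."""
    counts = Counter()
    for line in xfs_check_output.splitlines():
        match = DIR_INO.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# The excerpt quoted in this report.
sample = """\
bad free block nvalid/nused 6/-1 for dir ino 5012689 block 16777216
missing free index for data block 0 in dir ino 5012689
missing free index for data block 1 in dir ino 5012689
missing free index for data block 2 in dir ino 5012689
missing free index for data block 3 in dir ino 5012689
missing free index for data block 4 in dir ino 5012689
missing free index for data block 5 in dir ino 5012689
bad free block nvalid/nused 21/-1 for dir ino 33641428 block 16777216
"""

for ino, count in complaints_per_directory(sample).most_common():
    print(ino, count)
```

The inode numbers could then be mapped back to paths (for example with `find / -xdev -inum <number>`) to see which directories on the root partition were hit.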
xfs_repair was able to repair it and printed messages like this:
---------------------------------------------------------------
empty data block 53 in directory inode 55176185: junking block
empty data block 54 in directory inode 55176185: junking block
empty data block 56 in directory inode 55176185: junking block
empty data block 58 in directory inode 55176185: junking block
empty data block 60 in directory inode 55176185: junking block
empty data block 63 in directory inode 55176185: junking block
empty data block 64 in directory inode 55176185: junking block
free block 16777216 entry 52 for directory ino 55176185 bad
rebuilding directory inode 55176185
free block 16777216 for directory inode 48409589 bad nused
rebuilding directory inode 48409589
---------------------------------------------------------------
I already thought about hardware problems and tried
1) badblocks -s -v -n -o /home/martin/XFS-Probleme/badblocks.txt /dev/hda5
It found no bad blocks
2) smartctl -t long /dev/hda
It completed successfully
3) memtest86 over night
It found 0 errors.
So I am pretty sure that the hardware is fine.
Regards, Martin