[cgl_discussion] [cgl_valid] Simulating a system failure to force a filesystem recovery

Howell, David P david.p.howell at intel.com
Wed Aug 7 11:34:42 PDT 2002

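A while back, when I was doing disk driver work, we used to do this with
'dd if=/dev/zero of=/dev/raw_disk_device bs=block_size seek=offset_to_metadata_to_nail count=number_of_blocks'.
If you know the offset and size of the metadata item (e.g. an inode),
it's easy to do the damage this way. Note that dd counts seek in units of
bs, and without count it keeps writing zeros to the end of the device.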
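A minimal Python equivalent of that dd invocation, assuming you already know the byte offset and length of the metadata item you want to nail. The function name `zero_region` is hypothetical; point it at a scratch disk image or a raw device you are prepared to damage, never at something you care about.

```python
import os


def zero_region(path: str, offset: int, length: int) -> None:
    """Overwrite `length` bytes at byte `offset` of `path` with zeros.

    Roughly what
      dd if=/dev/zero of=PATH bs=1 seek=OFFSET count=LENGTH conv=notrunc
    does: only the named region is touched, the rest of the file or
    device is left alone.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length > 0")
    # Open read/write WITHOUT O_TRUNC so everything outside the region survives.
    fd = os.open(path, os.O_RDWR)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        remaining = length
        while remaining > 0:
            # os.write may write less than asked; loop until the region is done.
            written = os.write(fd, b"\0" * min(remaining, 65536))
            remaining -= written
        os.fsync(fd)  # push the damage out to the media before any crash
    finally:
        os.close(fd)
```

Working on byte offsets rather than dd's bs-sized blocks avoids the most common mistake with the dd form, which is getting the seek units wrong.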

Actually, for what you are doing, a script that creates many hundreds of files
and runs long enough for you to do the unsynced reboot should work, and it is
more representative of a real system crash. One way to do this is to extract
a tar archive into a directory, delete it, and repeat in a loop. Increase
the metadata flux by running several of these against different directories, so
there is a mix of creates/deletes/copies going on at the time of the reboot.
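A sketch of that extract/copy/delete churn, assuming you supply a tarball with a few hundred files in it. The names `churn` and `run_parallel` are hypothetical, and threads are used only for brevity; separate processes or several shell loops would do the same job.

```python
import os
import shutil
import tarfile
from concurrent.futures import ThreadPoolExecutor


def _extract(tar_path: str, dest: str) -> None:
    with tarfile.open(tar_path) as tf:
        # Use the safer "data" extraction filter where this Python has it.
        if hasattr(tarfile, "data_filter"):
            tf.extractall(dest, filter="data")
        else:
            tf.extractall(dest)


def churn(tar_path: str, workdir: str, iterations: int) -> int:
    """Extract, copy and delete the archive `iterations` times in `workdir`.

    Returns the number of completed cycles.
    """
    os.makedirs(workdir, exist_ok=True)
    tree = os.path.join(workdir, "tree")
    copy = os.path.join(workdir, "copy")
    # Clear leftovers from a previous (crashed) run.
    shutil.rmtree(tree, ignore_errors=True)
    shutil.rmtree(copy, ignore_errors=True)
    for _ in range(iterations):
        _extract(tar_path, tree)     # many creates
        shutil.copytree(tree, copy)  # more creates plus data writes
        shutil.rmtree(tree)          # many deletes
        shutil.rmtree(copy)
    return iterations


def run_parallel(tar_path: str, base_dir: str, workers: int, iterations: int) -> list:
    """Run `workers` churn loops at once, each in its own subdirectory."""
    dirs = [os.path.join(base_dir, f"w{i}") for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(churn, tar_path, d, iterations) for d in dirs]
        return [f.result() for f in futures]
```

For the crash test itself, pick an iteration count large enough that every loop is still busy when you pull the plug.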

Another way to do the above is to run several kernel (or other big application)
builds simultaneously, possibly mixed with the tar extracts/deletes, etc.
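A sketch of running several builds at once and then forcing an unsynced reboot while they are still writing. It assumes a current Linux kernel with CONFIG_MAGIC_SYSRQ and root access: writing "b" to /proc/sysrq-trigger reboots immediately without syncing or unmounting, though that interface may not match the 2.4-era kernels under discussion. The build commands, e.g. `["make", "-C", "/path/to/tree1"]`, and all function names are hypothetical.

```python
import subprocess
import time


def start_jobs(commands, cwd=None):
    """Launch every command at once and return the Popen handles."""
    return [subprocess.Popen(cmd, cwd=cwd) for cmd in commands]


def unsynced_reboot(trigger_path="/proc/sysrq-trigger"):
    """Ask Linux to reboot at once, without syncing or unmounting disks.

    Needs root and CONFIG_MAGIC_SYSRQ; on a real system this call does not
    return. `trigger_path` is a parameter only so the function can be
    exercised against an ordinary file.
    """
    with open(trigger_path, "w") as trigger:
        trigger.write("b")


def crash_under_load(commands, delay_s, trigger_path="/proc/sysrq-trigger"):
    """Start the jobs, let them run for `delay_s` seconds, then pull the plug."""
    jobs = start_jobs(commands)
    time.sleep(delay_s)
    unsynced_reboot(trigger_path)
    return jobs
```

The delay is the knob for landing the reboot in the middle of heavy metadata traffic; varying it between runs gives fsck different states to repair.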

Dave Howell

-----Original Message-----
From: Craig Thomas [mailto:craiger at osdl.org]
Sent: Wednesday, August 07, 2002 2:00 PM
To: Julie N Fleischer
Cc: 'cgl_discussion at osdl.org'
Subject: Re: [cgl_discussion] [cgl_valid] Simulating a system failure to
force a filesystem recovery

Could you just forcibly corrupt an inode object or a superblock of a
particular file system through the use of a quick and dirty corruption
program that is executed just before you simulate the failure? 
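A quick and dirty corruption program along these lines might look like the sketch below. It assumes an ext2/ext3 target: the guard reads the 16-bit magic 0xEF53 at offset 56 of the primary superblock, which starts 1024 bytes into the device, so the tool refuses to scribble on the wrong device. Which offset to hit (an inode, a field of the superblock) is up to you; `looks_like_ext2` and `scribble` are hypothetical names.

```python
import os
import random
import struct

EXT2_SUPERBLOCK_OFFSET = 1024  # primary superblock starts 1 KiB into the device
EXT2_MAGIC_OFFSET = 56         # s_magic field within the superblock
EXT2_MAGIC = 0xEF53


def looks_like_ext2(path: str) -> bool:
    """Check for the ext2/ext3 magic number before doing any damage."""
    with open(path, "rb") as dev:
        dev.seek(EXT2_SUPERBLOCK_OFFSET + EXT2_MAGIC_OFFSET)
        raw = dev.read(2)
    return len(raw) == 2 and struct.unpack("<H", raw)[0] == EXT2_MAGIC


def scribble(path: str, offset: int, length: int, seed: int = 0) -> bytes:
    """Overwrite `length` bytes at `offset` with seeded random garbage.

    Returns the original bytes so the damage can be logged or undone.
    """
    if not looks_like_ext2(path):
        raise RuntimeError(f"{path}: no ext2/ext3 superblock magic, refusing")
    rng = random.Random(seed)  # seeded, so a failing run can be reproduced
    garbage = bytes(rng.randrange(256) for _ in range(length))
    with open(path, "r+b") as dev:
        dev.seek(offset)
        original = dev.read(length)
        dev.seek(offset)
        dev.write(garbage)
        dev.flush()
        os.fsync(dev.fileno())  # make sure the garbage is on disk before the crash
    return original
```

Returning the original bytes makes it easy to record exactly what was damaged, so you can check afterwards whether fsck put it back.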

On Wed, 2002-08-07 at 09:55, Fleischer, Julie N wrote:
> Validation -
> As part of testing a resilient file system, I want a test case where I am
> sure that I have simulated a system failure so that on startup fsck (I
> believe) must be performed.  In addition, it would be even better if that
> fsck had to repair something (i.e., the system failure happened in
> the middle of a logical write).
> Does anyone know how I can do this reliably?
> Some possible solutions are:
> - Use something like the watchdog timer to reset the hardware. ==> Do you
> know if this will cause the system to think there was a system failure and
> an fsck must be performed?  Is there a way I can get this to happen in the
> middle of a logical write?
> - Cause a system crash as is done by things like kernel dump testing (e.g.,
> insert a module that crashes the system). ==> It seems like this can cause
> the system to run fsck.  Again, not sure how to guarantee this happens in
> the middle of a logical write, though.
> If anyone has more information on how I could do this, it would be greatly
> appreciated.
> Thanks.
> - Julie
> ----------------------
> Julie Fleischer
> Intel Corporation
> * 503-677-5700
> * julie.n.fleischer at intel.com
> _______________________________________________
> cgl_discussion mailing list
> cgl_discussion at lists.osdl.org
> http://lists.osdl.org/mailman/listinfo/cgl_discussion
Craig Thomas                         phone: 503-626-2455  ext. 33
Open Source Development Labs         email: craiger at osdl.org
15275 SW Koll Pkwy, Suite H
Beaverton, OR  97006
