[cgl_discussion] USE CASE - Block Device Removal

Steven Dake sdake at mvista.com
Mon Apr 18 14:11:23 PDT 2005


This is the block device removal use case.  Comments welcome.

Thanks
-steve


Description
Many new architectures, such as Advanced TCA, are being developed with
block devices that can be physically removed from the system. This
reduces MTTR, which improves system availability. Block device removal
makes it possible for the Linux operating system to continue operating
without expensive error recovery or, worse, complete system failure.

With ATCA or other modern embedded architectures, it is possible for the
operating system to be notified ahead of time that the device is to be
removed. This is achieved through a latch which takes at least 100
msec to activate. After the latch is activated, but before the block
device board is removed, the operating system accessing the block device
should remove all references to the device to avoid expensive error
recovery or, worse, complete system failure.
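
As a rough illustration of "remove all references to the device", the
sketch below builds a list of steps from /proc/mounts-style text. It is
only a sketch under stated assumptions: the function names, the sample
mount table, and the final "release" step are hypothetical, and partitions
are assumed to be named by appending digits to the disk name.

```python
def uses_device(source, disk):
    """True if a mount source is the disk itself or one of its partitions.

    Assumes partitions are named <disk><digits>, e.g. /dev/sdb1.
    """
    rest = source[len(disk):]
    return source.startswith(disk) and (rest == "" or rest.isdigit())


def mount_points_for(disk, mounts_text):
    """Return mount points backed by `disk`, in the order they are listed."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: source, mount point, fs type, options, dump, pass
        if len(fields) >= 2 and uses_device(fields[0], disk):
            points.append(fields[1])
    return points


def removal_plan(disk, mounts_text):
    """List the steps to drop references to `disk` before it is pulled."""
    # Unmount in reverse order so nested mount points go first.
    steps = ["umount " + mp for mp in reversed(mount_points_for(disk, mounts_text))]
    # Hypothetical final step: detach the device from the operating system.
    steps.append("release " + disk)
    return steps


# Hypothetical mount table used only for illustration.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw 0 0
/dev/sdb1 /data ext3 rw 0 0
/dev/sdb2 /data/logs ext3 rw 0 0
"""
```

Calling removal_plan("/dev/sdb", SAMPLE_MOUNTS) lists the two unmounts
followed by the release step. A real implementation would also have to
deal with open file handles, swap, and device-mapper or RAID users of the
device, none of which this sketch covers.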


Desired Outcome
When the block device remove operation is executed for a block device,
the Linux operating system will continue to operate without faults and
without attempting error recovery on a device that no longer exists.
Further, inserting a device at the location from which the block device
was removed allows the device to be re-added to the system.

An OSDL Special Interest Group (SIG) has been established for ongoing
discussions regarding common open source storage services. It is likely
that this group will define the common cluster services and drive
implementations into the kernel (where needed). See
http://maryedie/STORAGE_NETWORKING.


Participants/Roles
      * Linux Developers: Open source implementations of block
        device removal exist.
        
      * Application developers: Applications that require increased
        availability, with no service interruption during replacement
        of a block device.
        
        
Applications and services that can benefit from block device removal
are those where reducing MTTR through other means is not possible.
These systems, then, must stay operational for as long as possible
without faulting in a situation that can be foreseen.


Scenarios
An overheat warning is detected on a block device. The operators
determine that the device has operated outside its specifications for
too long and decide to replace it. The operators execute block device
removal on the operating systems using the block device. The block
device is then physically removed from the system. During and after the
block device removal period, the operating system remains operational,
although it is possible that some applications could fail if they are
using the block device.


Implementation Notes
The specification developers believe that, to fully support block
device removal, the operations must be executed by the kernel. However,
there is no specific requirement that the operation be executed in the
kernel.
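
For illustration only, one way user space can ask the kernel to drop a
device is through sysfs. Linux SCSI disks expose a "delete" attribute
under /sys/block/<disk>/device; writing "1" to it asks the kernel to
remove the device. Whether this interface satisfies the use case is an
assumption, not something stated above, and the function names and the
dry_run parameter below are hypothetical.

```python
import os


def delete_attribute_path(disk_name, sysfs_root="/sys"):
    """Build the path of the SCSI 'delete' attribute for a disk such as 'sdb'."""
    return os.path.join(sysfs_root, "block", disk_name, "device", "delete")


def request_removal(disk_name, sysfs_root="/sys", dry_run=True):
    """Ask the kernel to remove the device; the kernel does the actual work.

    With dry_run=True nothing is written and only the path is returned.
    """
    path = delete_attribute_path(disk_name, sysfs_root)
    if not dry_run:
        with open(path, "w") as attr:
            attr.write("1")
    return path
```

Here the decision to remove is made in user space, while the teardown
itself is executed by the kernel, which is consistent with both
statements above. Writing to the attribute requires root privileges, and
all file systems on the device should be unmounted first.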





