[cgl_discussion] ATCA Requirements Take 2
sdake at mvista.com
Thu Sep 11 10:42:11 PDT 2003
On Thu, 2003-09-11 at 09:21, Demke, Torsten wrote:
> Hello Steve,
> we (my company) are currently developing ATCA systems
> with IPMI... I normally follow the CGL discussion
> passively, but I have a question about your
> requirements (please see below).
> > Requirement: shutdown systemcall integrated with ATCA system
> > management
> > The Linux kernel shall ensure that the shutdown system call uses the
> > ATCA system management IPMI interface to power down the
> > system and light
> > the blue led.
> What do you mean by "system" here (chassis or one node/blade)?
> AFAIK a normal node board running Linux cannot send IPMI
> requests directly to its IPMI controller (IPMC).
> That means the shutdown process cannot poweroff the board
> and light the blue LED.
> OK - the node board could send a message to the ShMC (over RMCP).
> The ShMC then could poweroff the board.
The system is the board on which the shutdown call is running. I'll
clarify in take 3.
Every IPMI board I have ever seen, with a few exceptions, includes a KCS
interface which allows IPMI messages to be sent directly to the system
management controller on the node blade. I assume Force does the same
thing with their ATCA hardware (at least the Force CPCI 735/736 has a KCS
interface).
It is critical that the OS be able to send IPMI commands directly to the
node blades and have those messages forwarded to the shelf controller
over IPMB for reasons explained below. If this is not the case, hotswap
really cannot work properly for shared devices.
> > Requirement: Multiple Host Synchronized Device Hotswap
> > When multiple hosts are using the same block or character
> > device, and a
> > user requests to remove the device, the device's blade won't be powered
> > off (and its blue LED, if available, won't be lit) until all systems in
> > the collection of machines using the device have removed the device from
> > their respective Linux kernel data structures.
> Do you think that the IPMI controller (ShMC), that can poweroff
> the blade, has to communicate with the OS (Linux) running on
> all nodes/machines that use this blade?
While there may not be shared block/character devices that are used
today, I envision a time when ATCA will include FibreChannel, 3GIO, and
Infiniband in the backplane. Given these technologies, it makes sense
to assume that eventually these devices could provide shared access from
the node boards in the system (FibreChannel is a good example).
Because devices can be shared, the OS must absolutely be in control of
the hotswap operation after the user requests hotswap removal, because
it has references to the hardware to be hotswapped. If the device is
hotswapped without the OS on each node board using the device
controlling the hotswap operation (removing references in the kernel to
the device, synchronizing removal of the device across all nodes before
power off), kernel crashes (and at a minimum pointless error recovery)
are likely to occur.
The IPMI controller doesn't have to be involved in this synchronization,
in which each blade sends a message stating that it is done with the
device to be removed. The synchronization can be done using some network
protocol over TCP or UDP.
The system could be implemented as follows (node = processor blade):
One node (the master, with a backup) holds a registration from each other
node that has a device in use.
User requests hotswap removal of a non-processor shared device in use by
several processor blades, each running its own OS instance (and holding
its own references to the device).
Each node using that device receives the hotswap removal request:
  kernel quiesces I/O to the device
  kernel removes the device from the system
  node tells the master node and backup node that its registration is revoked
Master node and backup node keep track of registrations:
  if a hotswap request is pending and registrations = 0
    power off the device and light its blue LED
  if the timeout occurs before all revocations are received
    power off the device and light its blue LED
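To make the bookkeeping above concrete, here is a minimal sketch in Python. It is only an illustration of the scheme as described, not MontaVista's implementation: the names HotswapCoordinator, handle_removal_on_node, and the power_off callback are hypothetical, the network transport between nodes is left out, and the clock is injectable so the timeout path can be exercised without waiting.

```python
import time


class HotswapCoordinator:
    """Master (or backup) node bookkeeping for one shared device.

    Hypothetical sketch: 'power_off' stands in for whatever mechanism
    powers off the device and lights its blue LED.
    """

    def __init__(self, timeout_s, power_off, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.power_off = power_off
        self.clock = clock
        self.registrations = set()   # nodes that currently use the device
        self.deadline = None         # set once a removal request is pending
        self.powered_off = False

    def register(self, node):
        """A node reports that it has the device in use."""
        self.registrations.add(node)

    def request_removal(self):
        """The user requested hotswap removal; start the timeout."""
        self.deadline = self.clock() + self.timeout_s
        self._check()

    def revoke(self, node):
        """A node reports that it is done with the device."""
        self.registrations.discard(node)
        self._check()

    def tick(self):
        """Called periodically so the timeout can fire."""
        self._check()

    def _check(self):
        if self.deadline is None or self.powered_off:
            return
        # Power off when nobody uses the device any more, or on timeout.
        if not self.registrations or self.clock() >= self.deadline:
            self.powered_off = True
            self.power_off()


def handle_removal_on_node(node, quiesce_io, remove_device, coordinators):
    """Node-side steps: quiesce I/O, drop the device, then notify
    the master and backup coordinators that the registration is revoked."""
    quiesce_io()
    remove_device()
    for coordinator in coordinators:
        coordinator.revoke(node)
```

Running the same coordinator object on the master and on the backup and passing both in the coordinators list mirrors the "master node and backup node" notification step. A real system would also need to decide which of the two actually issues the power-off.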
As you can see, the firmware of the IPMI subsystem cannot be in total
control of the hotswap, because it has no way to track which devices are
in use. This could be changed by adding messages to the IPMI ATCA
extensions. (I would definitely recommend such a thing!)
For removal of a blade with a processor, this is a much simpler process
because the processor blade cannot be "shared". Since it can't be
shared, the blade itself can be in control of the hotswap operation.
The hotswap operation still must be managed by the OS, because the OS
knows how to properly shut down the system (kill processes in an orderly
fashion, sync filesystems, unmount filesystems, etc.) and can then request
that the IPMI controller light the blue LED and power off the blade once
an orderly shutdown is complete.
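The ordering described here can be sketched as follows. This is an assumption-laden illustration only: orderly_blade_shutdown and the callables passed to it are hypothetical placeholders, and the actual IPMI request (for example over the KCS interface) is not shown.

```python
def orderly_blade_shutdown(os_steps, request_ipmi_power_off):
    """Run the OS shutdown steps in order, then ask the IPMI controller
    to light the blue LED and power off the blade.

    os_steps is a list of (name, callable) pairs; all names are
    hypothetical. If a step raises, the power-off request is never sent.
    """
    completed = []
    for name, step in os_steps:
        step()
        completed.append(name)
    # Only reached after every OS step has finished.
    request_ipmi_power_off()
    return completed


log = []
done = orderly_blade_shutdown(
    [
        ("kill processes", lambda: log.append("kill")),
        ("sync filesystems", lambda: log.append("sync")),
        ("unmount filesystems", lambda: log.append("unmount")),
    ],
    lambda: log.append("ipmi power off + blue LED"),
)
```

The point of the sketch is only that the request to the IPMI controller comes last, after the OS has finished its own cleanup.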
I'd appreciate any comments on this scheme. This is how MontaVista has
implemented a system similar to ATCA in the past.
> I'm just curious ;-)