[cgl_discussion] RE: [Fwd: hotswap and CGL (fwd)]

Steven Dake sdake at mvista.com
Thu May 1 10:07:03 PDT 2003

Rusty Lynch wrote:

>On Thu, 2003-05-01 at 06:32, John J Grana wrote:
>>Not to put too fine a point on this, but some of the subtle problems with
>>the present implementation are starting to show themselves.
>>Last I read the code, the present CompactPCI hotswap still relied on a PCI
>>bus being present on the device being swapped. In the discussion below, it
>>mentions a daemon that retrieves hotswap events (good) but then talks about
>>a vendor/device id.
>>In a cPCI backplane with the actual PCIbus supported, this is not a
>>problem. Every board has its PCI bridge that will respond with a
>>vendor/device id.
>>Now, fast forward to what is happening today... PICMG 2.16 Packet Switched
>>Backplane introduced switched ethernet to the backplane, but "optionally"
>>allowed the PCIbus to stay in place. But, the next generation of this
>>class of chassis (CompactTCA) has NO PCIbus at all. Ditto for AdvancedTCA.
>>I had tried to kick start a thread a while back, asking the question
>>"Assuming no PCIbus, what is the best standards method of identifying
>>devices in a backplane?". This includes not only hotswapping but boot time
>>discovery as well. Two ways were proposed: IPMI or Ethernet. There were
>>many pros and cons to each method.
IPMI is the best way, IMHO.  With IPMI, a Get Device ID command can 
be sent to each slot to identify the device in that slot.  Then the FRU 
data can be read to determine the exact unique identifier of the device 
(such as a Fibre Channel IEEE ID).
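
As a rough illustration of the slot scan, here is a minimal sketch in 
Python.  The command numbers (NetFn App 0x06, command 0x01) and the 
response layout follow the IPMI specification; the send() transport and 
the list of slot addresses are hypothetical stand-ins, since how the 
request reaches each slot depends on the chassis.

```python
from dataclasses import dataclass

NETFN_APP = 0x06          # IPMI Application network function
CMD_GET_DEVICE_ID = 0x01  # Get Device ID command


@dataclass
class DeviceId:
    device_id: int
    device_revision: int
    firmware: str
    ipmi_version: str
    manufacturer_id: int
    product_id: int


def parse_get_device_id(resp: bytes) -> DeviceId:
    """Decode a Get Device ID response; resp[0] is the completion code."""
    if len(resp) < 12:
        raise ValueError("response too short")
    if resp[0] != 0x00:
        raise ValueError(f"completion code 0x{resp[0]:02x}")
    fw_major = resp[3] & 0x7F                  # bit 7 is 'device available'
    fw_minor = f"{resp[4]:02x}"                # minor revision is BCD encoded
    ipmi = f"{resp[5] & 0x0F}.{resp[5] >> 4}"  # BCD, low nibble is the major digit
    return DeviceId(
        device_id=resp[1],
        device_revision=resp[2] & 0x0F,
        firmware=f"{fw_major}.{fw_minor}",
        ipmi_version=ipmi,
        # 20-bit IANA enterprise number, least significant byte first
        manufacturer_id=int.from_bytes(resp[7:10], "little") & 0x0FFFFF,
        product_id=int.from_bytes(resp[10:12], "little"),
    )


def scan_slots(send, addresses):
    """Query every slot address and keep the ones that answer.

    send(addr, netfn, cmd, data) -> bytes is a hypothetical transport.
    """
    found = {}
    for addr in addresses:
        try:
            resp = send(addr, NETFN_APP, CMD_GET_DEVICE_ID, b"")
            found[addr] = parse_get_device_id(resp)
        except (ValueError, TimeoutError):
            continue  # empty slot, bad response, or no answer
    return found
```

Reading the FRU data would be a second step per discovered slot and is 
not shown here.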

>>The present 2.5 cpci_hotplug code has come a long way, looks good. But, in
>>its present form will only support cPCI systems that have a PCIbus.
>>Assuming we (CGL) leave it this way, it is safe to say that both 2.6 Linux
>>and CGL do NOT support hotswap in "some" 2.16 systems and not at all on
>>CompactTCA and AdvancedTCA.
>>John Grana
>>jjg at pt.com
>Exactly what kind of hotswap is not enabled on non-PCI CompactTCA and
>From what I have seen, these systems either have:
>1. a complete compute node plugged into a slot, so as far as the kernel
>running on this blade is concerned, nothing is different than if the
>kernel was running on a normal rack mounted server.  What does hotswap
>mean in this case?
The operating system must accept the hotswap request (button press) by 
lighting the blue LED and turning off power to the blade.  This requires 
additions to the kernel and can be solved by modifying the power 
management functionality to control power through IPMI.  Then the 
poweroff command will call the reboot() syscall (with the power-off 
command), which will call the power management power-down API.
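
The ordering I have in mind can be sketched as a small state machine.  
This is only an illustration in Python: the class name, the state names 
and the three callables (quiesce, set_blue_led, power_off) are all 
hypothetical placeholders for whatever the kernel and IPMI layers would 
actually provide.

```python
from enum import Enum, auto


class State(Enum):
    ACTIVE = auto()
    READY_TO_REMOVE = auto()  # blue LED lit, waiting for power-down
    POWERED_OFF = auto()


class BladeHotSwap:
    """Hypothetical handler; the three callables stand in for real
    kernel or IPMI operations that are not specified in this thread."""

    def __init__(self, quiesce, set_blue_led, power_off):
        self.quiesce = quiesce            # returns True when safe to remove
        self.set_blue_led = set_blue_led  # takes a bool
        self.power_off = power_off
        self.state = State.ACTIVE

    def on_button_press(self):
        if self.state is not State.ACTIVE:
            return self.state             # ignore repeated presses
        if not self.quiesce():
            return self.state             # request refused, blade stays up
        self.set_blue_led(True)           # tell the operator removal is safe
        self.state = State.READY_TO_REMOVE
        self.power_off()
        self.state = State.POWERED_OFF
        return self.state
```

The point of the sketch is only the order: the blade is quiesced first, 
the LED is lit second, and power goes away last.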

>2. some kind of special purpose device (like a line card) that will
>communicate with other machines via the particular high speed bus in the
>backplane (like Gigabit ethernet).  From what I have seen (like with
>Gigabit on the backplane), there is nothing to do in the kernel (beyond
>what the bus driver is already doing) to support the notion of hotswap.
One example is hotswap of disk blades while they are in use by the 
operating system.  There must be a mechanism to shut down currently used 
devices so that the operating system doesn't choke or try to recover 
errors on a device that has been removed.  This can be achieved by 
forced unmount and forcibly removing access to file descriptors and md 
devices so they are no longer accessed by the user.  This can also be 
done with RAID1 without changes to the kernel but this spawns error 
recovery.  But what if someone has one of the RAID member devices open?  
The OS would crash or become unstable if it were removed.  It is also 
useful to shut down RAID cleanly so as to not spawn error recovery in 
the RAID layer that could impact the performance of the storage system.  
Finally there must be a mechanism to remove the Fibre Channel device from 
the operating system SCSI data structures, and add it back again when it 
is inserted.
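
To make the sequence concrete, here is a dry-run sketch in Python that 
only builds the list of steps and executes nothing.  It assumes a kernel 
that exposes the sysfs SCSI "delete" and "scan" files and a system with 
the mdadm tool, which may not match the 2.5 tree under discussion; the 
function names and the step tuples are made up for illustration.

```python
def removal_plan(disk, md_array=None, member=None, mountpoint=None):
    """Ordered steps to detach a disk before its blade is pulled.

    Each step is ("run", argv) or ("write", path, value).
    mountpoint is for a filesystem mounted directly from the disk.
    """
    steps = []
    if mountpoint:
        steps.append(("run", ["umount", mountpoint]))
    if md_array and member:
        # Mark the member faulty, then drop it from the array.
        steps.append(("run", ["mdadm", md_array, "--fail", member]))
        steps.append(("run", ["mdadm", md_array, "--remove", member]))
    # Ask the SCSI layer to forget the device.
    steps.append(("write", f"/sys/block/{disk}/device/delete", "1"))
    return steps


def reinsert_plan(scsi_host):
    """Rescan a SCSI host so a newly inserted device is added back."""
    return [("write", f"/sys/class/scsi_host/{scsi_host}/scan", "- - -")]
```

For example, removal_plan("sdb", md_array="/dev/md0", 
member="/dev/sdb1") yields the fail, remove and delete steps in that 
order.  Dealing with a member device that someone still holds open is 
exactly the part this sketch does not solve.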

>Maybe you could give a specific example of a device for which hotswap is
>not supported on ATCA or CompactTCA?
>    --rustyl
