[cgl_discussion] RE: [Fwd: hotswap and CGL (fwd)]
sdake at mvista.com
Thu May 1 10:07:03 PDT 2003
Rusty Lynch wrote:
>On Thu, 2003-05-01 at 06:32, John J Grana wrote:
>>Not to put too fine a point on this, but some of the subtle problems with
>>the present implementation are starting to show themselves.
>>Last I read the code, the present CompactPCI hotswap still relied on a PCI
>>bus being present on the device being swapped. In the discussion below, it
>>mentions a daemon that retrieves hotswap events (good) but then talks about
>>a vendor/device id.
>>In a cPCI backplane with the actual PCIbus supported, this is not a
>>problem. Every board has its PCI bridge that will respond with a
>>Now, fast forward to what is happening today... PICMG 2.16 Packet Switched
>>Backplane introduced switched ethernet to the backplane, but "optionally"
>>allowed the PCIbus to stay in place. But the next generation of this
>>class of chassis (CompactTCA) has NO PCIbus at all. Ditto for AdvancedTCA.
>>I had tried to kick start a thread a while back, asking the question
>>"Assuming no PCIbus, what is the best standards method of identifying
>>devices in a backplane?". This includes not only hotswapping but boot time
>>discovery as well. There were 2 ways proposed. IPMI or Ethernet. There were
>>many pros and cons to each method.
IPMI is the best way, IMHO. With IPMI, a Get Device ID command can be
sent to each slot to identify what occupies the slot. Then the FRU data
can be read to determine the exact unique identifier of the device (such
as a Fibre Channel IEEE ID).
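
As a rough illustration of the first step, here is a minimal sketch of
decoding a Get Device ID response (NetFn App 0x06, command 0x01). It
assumes the response bytes start at the completion code and follow the
layout in the IPMI specification; the DeviceId class and the function
name are made up for this example, and actually sending the request to a
slot (over IPMB or a system interface) is not shown.

```python
from dataclasses import dataclass

IPMI_NETFN_APP = 0x06          # application network function
IPMI_CMD_GET_DEVICE_ID = 0x01  # Get Device ID command


@dataclass
class DeviceId:
    # Hypothetical container for the fields this sketch decodes.
    device_id: int
    device_revision: int
    firmware_revision: str
    ipmi_version: str
    manufacturer_id: int
    product_id: int


def parse_get_device_id(resp: bytes) -> DeviceId:
    """Decode a Get Device ID response that begins with the completion code."""
    if len(resp) < 12:
        raise ValueError("response too short for Get Device ID")
    if resp[0] != 0x00:
        raise ValueError(f"command failed, completion code 0x{resp[0]:02x}")

    fw_major = resp[3] & 0x7F            # bit 7 is the 'device available' flag
    fw_minor = f"{resp[4]:02x}"          # minor revision is BCD encoded
    ipmi_major = resp[5] & 0x0F          # low nibble holds the major digit
    ipmi_minor = resp[5] >> 4            # high nibble holds the minor digit
    # Manufacturer ID is 20 bits, least significant byte first.
    manufacturer = resp[7] | (resp[8] << 8) | ((resp[9] & 0x0F) << 16)
    product = resp[10] | (resp[11] << 8)  # least significant byte first

    return DeviceId(
        device_id=resp[1],
        device_revision=resp[2] & 0x0F,
        firmware_revision=f"{fw_major}.{fw_minor}",
        ipmi_version=f"{ipmi_major}.{ipmi_minor}",
        manufacturer_id=manufacturer,
        product_id=product,
    )
```

The manufacturer and product IDs only say what kind of board is in the
slot; the per-device unique identifier would still have to come from the
FRU data, as noted above.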
>>The present 2.5 cpci_hotplug code has come a long way, looks good. But, in
>>its present form will only support cPCI systems that have a PCIbus.
>>Assuming we (CGL) leave it this way, it is safe to say that both 2.6 Linux
>>and CGL do NOT support hotswap in "some" 2.16 systems and not at all on
>>CompactTCA and AdvancedTCA.
>>jjg at pt.com
>Exactly what kind of hotswap is not enabled on non-PCI CompactTCA and
>From what I have seen, these systems either have:
>1. a complete compute node plugged into a slot, so as far as the kernel
>running on this blade is concerned, nothing is different than if the
>kernel was running on a normal rack mounted server. What does hotswap
>mean in this case?
The operating system must accept the hotswap request (button press) by
lighting the blue LED and turning off power to the blade. This requires
additions to the kernel and can be solved by modifying the power
management functionality to access power management with IPMI. Then the
poweroff command will call the reboot() syscall (with the power-off
command), which will call the power management powerdown API.
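
The handshake can be sketched as a tiny state machine. This is only a
model under stated assumptions: the Blade class, its method names, and
the ordering (shut the OS down, light the blue LED, then cut power) are
hypothetical, and the hardware calls merely record what a real
implementation would ask the IPMI controller to do.

```python
from enum import Enum, auto


class BladeState(Enum):
    ACTIVE = auto()
    READY_TO_REMOVE = auto()


class Blade:
    """Hypothetical blade model; hardware calls only record the action."""

    def __init__(self):
        self.state = BladeState.ACTIVE
        self.actions = []

    def shutdown_os(self):
        # A real system would run the normal poweroff path here.
        self.actions.append("shutdown_os")

    def set_blue_led(self, on):
        # A real system would ask the IPMI controller to drive the LED.
        self.actions.append("blue_led_on" if on else "blue_led_off")

    def power_off(self):
        # A real system would remove payload power through IPMI.
        self.actions.append("power_off")


def on_hotswap_button(blade):
    """Accept a hotswap request; return False if the blade is not active."""
    if blade.state is not BladeState.ACTIVE:
        return False
    blade.shutdown_os()
    blade.set_blue_led(True)
    blade.power_off()
    blade.state = BladeState.READY_TO_REMOVE
    return True
```

A second button press on a blade that is already READY_TO_REMOVE is
ignored, so the shutdown path runs only once per extraction.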
>2. some kind of special purpose device (like a line card) that will
>communicate with other machines via the particular high speed bus in the
>backplane (like Gigabit ethernet). From what I have seen (like with
>Gigabit on the backplane), there is nothing to do in the kernel (beyond
>what the bus driver is already doing) to support the notion of hotswap.
One example is hotswap of disk blades while they are in use by the
operating system. There must be a mechanism to shut down currently used
devices so that the operating system doesn't choke or try to recover
errors on a device that has been removed. This can be achieved by
forced unmount and forcibly removing access to file descriptors and md
devices so they are no longer accessed by the user. This can also be
done with RAID1 without changes to the kernel but this spawns error
recovery. But what if someone has one of the RAID member devices open?
The OS would crash or become unstable if it were removed. It is also
useful to shut down RAID cleanly so as to not spawn error recovery in
the RAID layer that could impact the performance of the storage system.
Finally there must be a mechanism to remove the Fibre Channel device from
the operating system SCSI data structures, and add it back again when it
>Maybe you could give a specific example of a device that hotswap is not
>supported with an ATCA or CompactTCA?