[cgl_discussion] Re: device enumeration
mochel at osdl.org
Fri Feb 7 12:27:38 PST 2003
[ Note to non-kernel developers, this discussion is getting highly
technical, and does not conform to usual cgl_discussion fare.. :) ]
> >Who cares about performance, this is _not_ a performance critical
> Performance is critical; just ask any vendor. It may not be critical in
> PCI hotswap and it certainly isn't critical in USB hotswap, but in
> next-gen architectures, the difference between speed would allow an
> operator to remove a physical device before the OS is ready to have it
> removed (surprise removal). This works fine in pci (reads 0xff), but
> other architectures don't like it so much. I am working on surprise
> removal support for advanced tca, but it's much more complicated than an
> expected extraction and until I can say for sure it will work,
> performance does matter.
What you're talking about here, and later in your email, has no relevance
to /sbin/hotplug. The act of suddenly removing a device from the system
may or may not generate an interrupt. In the case that a device does not
generate an interrupt, an operator is required to notify the OS that the
device is going away, like how PCMCIA works today. It is up to the bus
driver to sever data structure links to the physical device as quickly as
possible, so further requests gracefully fail in software (as opposed to
e.g. reading 0xff from PCI space).
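
As a rough userspace illustration of "severing the links" (not actual kernel or bus-driver code): the struct hp_device type and the hp_sever() and hp_read_reg() functions below are hypothetical names invented for this sketch, and locking and reference counting, which a real bus driver would need, are deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software-side view of a hot-pluggable device. */
struct hp_device {
    volatile uint8_t *regs; /* mapped registers; NULL once severed */
    int present;            /* cleared by the bus driver on removal */
};

/*
 * Bus-driver side: called on the removal interrupt, or when the
 * operator tells the OS the device is going away. It only cuts the
 * links to the hardware; any slower cleanup can happen afterwards.
 */
static void hp_sever(struct hp_device *dev)
{
    assert(dev != NULL);
    dev->present = 0;
    dev->regs = NULL;
}

/*
 * Request path: once the device is severed, fail in software with
 * -ENODEV instead of touching hardware that may no longer be there.
 */
static int hp_read_reg(const struct hp_device *dev, size_t off, uint8_t *out)
{
    if (!dev->present || dev->regs == NULL)
        return -ENODEV;
    *out = dev->regs[off];
    return 0;
}
```

The point of the sketch is only the ordering: hp_sever() is cheap and runs first, so every later request takes the -ENODEV branch no matter how long the remaining cleanup takes.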
/sbin/hotplug runs after the device is removed, and is meant for
userspace housekeeping, like removing mountpoints, device nodes, etc. It
runs after the critical work of severing access to the device is done.
The housekeeping can happen at any later point, because access is, well,
severed. No userspace requests can get to the device.
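
A minimal sketch of what such a userspace agent might look like, assuming the usual hotplug convention as I understand it: the subsystem name arrives as the first argument and the event type in the ACTION environment variable. The DEVPATH variable and the hp_parse_action() and hp_describe() helpers are assumptions made for illustration; the sketch only formats a description of the housekeeping rather than performing it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum hp_action { HP_ADD, HP_REMOVE, HP_UNKNOWN };

/* Map the ACTION string onto an enum; anything else is unknown. */
static enum hp_action hp_parse_action(const char *action)
{
    if (action == NULL)
        return HP_UNKNOWN;
    if (strcmp(action, "add") == 0)
        return HP_ADD;
    if (strcmp(action, "remove") == 0)
        return HP_REMOVE;
    return HP_UNKNOWN;
}

/*
 * Describe the housekeeping for one event. A real agent would remove
 * device nodes, unmount filesystems, and so on at this point.
 * Returns the snprintf() result, or -1 for an unrecognized action.
 */
static int hp_describe(const char *subsystem, const char *action,
                       const char *devpath, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    switch (hp_parse_action(action)) {
    case HP_REMOVE:
        return snprintf(buf, len, "cleanup %s %s",
                        subsystem ? subsystem : "?",
                        devpath ? devpath : "?");
    case HP_ADD:
        return snprintf(buf, len, "setup %s %s",
                        subsystem ? subsystem : "?",
                        devpath ? devpath : "?");
    default:
        return -1;
    }
}
```

An agent built on this would call hp_describe(argv[1], getenv("ACTION"), getenv("DEVPATH"), buf, sizeof buf) from main(). Nothing here needs to be fast, because by the time it runs the kernel has already cut off access to the device.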
How fast that cleanup happens is dependent on the performance of
/sbin/hotplug. That is entirely true. But, as evidenced above, it's out of
the critical path of removing internal representations of the device,
which obviates any immediate need to optimize it.
> >Sorry, but that's where you differ from the other kernel developers.
> >/sbin/hotplug is conceptually much cleaner and nicer than having to
> >do select() or ioctls. Remember ioctls are basically deprecated, and
> >you should not add new ones.
> I don't necessarily agree that _all_ kernel developers believe ioctl's
> should be deprecated. Just look at all of the rich ioctls in the kernel
> currently. The major problem without using ioctls (by using a
> filesystem for accessing methods in the kernel) is that there is no way
> to retrieve a return code. Without a return code, how is the
> application supposed to know what the kernel did was successful, but by
> polling its state again? Then the application may understand the
> operation was faulted, but the exact failure reason is still up in the
> air. I suppose if the community makes the decision that living without
> return codes is acceptable, I could live with it.
That's crap. Last I checked, both read(2) and write(2) returned an
integer, -1 on an error, with errno set appropriately. Please check the
documentation on your system to verify you have a working version.
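
To make the point concrete, here is a small sketch of an application writing an ASCII value to a file-based interface and getting a precise failure reason back. The attr_write() helper and any path passed to it are hypothetical; open(2), write(2), close(2) and errno are the standard POSIX interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write an ASCII value to a file-based kernel interface.
 * Returns 0 on success or a negative errno value on failure, so the
 * caller learns both that the operation failed and why.
 */
static int attr_write(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    size_t len = strlen(value);
    ssize_t n = write(fd, value, len);
    int err = (n < 0) ? errno : 0; /* save errno before close() */
    close(fd);

    if (n < 0)
        return -err;
    return ((size_t)n == len) ? 0 : -EIO; /* treat a short write as an error */
}
```

A caller can then branch on the result, for example treating -ENOENT as "no such attribute" and -EACCES as a permission problem, which is the same information an ioctl(2) return code would carry.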
Concerning ioctl(2): don't use it for new interfaces. Please. For
justification, please see LKML archives for Linus's opinions about its
badness, and the issues with ioctl(2) and porting to 64-bit architectures.
ASCII-based filesystem interfaces are the direction that we want to head.
This was discussed heavily in Ottawa last year, and in various threads on
LKML. Even though you may disagree with that direction, many mainstream
kernel developers are investigating ways to move away from ioctl(2), and
do not welcome new interfaces that depend on it.
> Here is how it works. A telco has a fault in their system. They figure
> out what the exact failure is (bad switch, bad hub, bad disk, bad cpu
> blade, bad whatever), and they dispatch a 10$/hour worker to fix it.
> The worker presses the hotswap request button, and while the OS is
> busily executing /sbin/hotplug, the worker thinks it's ok to remove the
> device (when the OS still is using it). This may "work" in Linux for
> PCI, but it certainly isn't correct that a device driver should expect
> 0xff to be returned on pci operations (in the case of PCI). Other
> architectures don't return anything indicating any failure causing real
> Performance is _so_ critical here, because if the removal operation is
> fast enough, there is no physical way to remove the device from the
> slot/bus/whatever before the OS has removed the device from the
> operating system data structures.
That is a fault of the bus code that receives either the interrupt that
the device is going away, or a notification from userspace. It has nothing
to do with the reciprocal notification the kernel gives userspace to clean
up device nodes, etc.
> Yes I agree sysfs/taking advantage of the driver model is a superior
> choice to mvista's chassis manager, but hey, we had to work with what we
> had available at the time. If sysfs were in 2.4, we would have used
> that instead. In future revs, we may backport sysfs to provide this
> sort of functionality and ensure that HDI works for both 2.4 and 2.5
> easily without a bunch of changes to parsing driver model information.
I understand that you worked with what you had, to an extent. sysfs, and
previous incarnations of it, have been around since Aug 2001. Greg has
been using it since Sep 2001, when it was solely a 2.4 patch. It was
minimally advertised, but several developers took the initiative to find
and use it at that time. Note also that sysfs is a completely separate
feature from the driver model.
> The key difference between MontaVista's HDI and whatever anyone else is
> working on is the excellent mechanism by which insert and remove events
> are transmitted (via the event broker) without the need to execute any
> type of hotplug scripts. Perhaps both mechanisms could be used in
> MontaVista's implementation and let selection take its course. This
> would allow us to keep the userspace database/api/tools/etc without the
> need to reimplement what everyone already agrees is the correct solution.
We are at an impasse. You have a working solution, we don't. It's included
in a shipping product, and now has a legacy of users and developers on the
project. You've chosen to implement a new interface for communicating
device events to userspace, bypassing the de facto standard for doing so,
for whatever reasons.
Greg adamantly disapproves of the code, as do I, for what little I've seen
of it. I will not bless the current form, or a 2.5/2.6 version without a
serious architectural change. I will not speak for Greg, but I suspect he
may feel the same way.
We are working on code with similar intent, which uses the existing
infrastructure, and has been at least partially designed by some core
kernel developers. I'm not here to say "Nyah nyah nyah. Our code will get
into the kernel and yours won't." However, our goals and design decisions
are more closely aligned with other kernel developers, and given the
nature of the community, these types of projects have a better chance of
becoming mainstream code.
I would like us to leverage each other's work, because I'm sure we can
help each other out to some extent. But, given the amount of time we have
each spent going down our separate paths, that may not be immediately
possible.