[cgl_discussion] Re: device enumeration

Patrick Mochel mochel at osdl.org
Fri Feb 7 12:27:38 PST 2003

[ Note to non-kernel developers, this discussion is getting highly 
technical, and does not conform to usual cgl_discussion fare.. :) ]

> >Who cares about performance, this is _not_ a performance critical
> >situation.
> >  
> >
> Performance is critical; just ask any vendor.  It may not be critical in 
> PCI hotswap and it certainly isn't critical in USB hotswap, but in 
> next-gen architectures, the difference in speed could allow an 
> operator to remove a physical device before the OS is ready to have it 
> removed (surprise removal).  This works fine in pci (reads 0xff), but 
> other architectures don't like it so much.  I am working on surprise 
> removal support for advanced tca, but it's much more complicated than an 
> expected extraction and until I can say for sure it will work, 
> performance does matter.

What you're talking about here, and later in your email, has no relevance 
to /sbin/hotplug. The act of suddenly removing a device from the system 
may or may not generate an interrupt. In the case that a device does not 
generate an interrupt, an operator is required to notify the OS that the 
device is going away, like how PCMCIA works today. It is up to the bus 
driver to sever data structure links to the physical device as quickly as 
possible, so further requests gracefully fail in software (as opposed to 
e.g. reading 0xff from PCI space). 

/sbin/hotplug happens after the device is removed, and is meant for 
userspace housekeeping, like removing mountpoints, device nodes, etc. This 
is after the critical work of severing access to the device is done. 
This can happen at any later point, because access is, well, severed. No 
userspace requests can get to it. 

How fast that cleanup happens is dependent on the performance of 
/sbin/hotplug. That is entirely true. But, as evidenced above, it's out of 
the critical path of removing internal representations of the device, 
obviating any immediate need to optimize it. 
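
A toy sketch of that ordering, in C, may make the separation clearer. It
is not kernel code: struct toy_device and the toy_* functions are
hypothetical names invented for illustration. It only assumes what is
described above: the bus code severs access first, later requests fail in
software, and userspace housekeeping runs at some later point.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device as the bus code sees it. */
struct toy_device {
    bool attached;      /* false once the bus code has severed access */
    bool node_present;  /* stands in for userspace state, e.g. a device node */
};

/* Critical path: run by the bus code as soon as removal is detected. */
void toy_bus_sever(struct toy_device *dev)
{
    dev->attached = false;
}

/* Later requests fail in software instead of touching absent hardware. */
int toy_device_read(const struct toy_device *dev, unsigned char *out)
{
    if (!dev->attached)
        return -ENODEV;
    *out = 0x5a;        /* placeholder for real device data */
    return 0;
}

/* Deferred housekeeping: the role /sbin/hotplug plays in userspace. */
void toy_userspace_cleanup(struct toy_device *dev)
{
    dev->node_present = false;
}
```

Between toy_bus_sever() and toy_userspace_cleanup() the stale node still
exists, but every read already returns -ENODEV, so the length of that
window does not affect correctness.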

> >Sorry, but that's where you differ from the other kernel developers.
> >/sbin/hotplug is conceptually much cleaner and nicer than having to 
> >do select() or ioctls.  Remember ioctls are basically deprecated, and
> >you should not add new ones.
> >  
> >
> I don't necessarily agree that _all_ kernel developers believe ioctls 
> should be deprecated.  Just look at all of the rich ioctls in the kernel 
> currently.  The major problem without using ioctls (by using a 
> filesystem for accessing methods in the kernel) is that there is no way 
> to retrieve a return code.  Without a return code, how is the 
> application supposed to know what the kernel did was successful, except by 
> polling its state again?  Then the application may understand the 
> operation was faulted, but the exact failure reason is still up in the 
> air.  I suppose if the community makes the decision that living without 
> return codes is acceptable, I could live with it.

That's crap. Last I checked, both read(2) and write(2) returned an 
integer, -1 on an error, with errno set appropriately. Please check the 
documentation on your system to verify you have a working version. 
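
As a minimal sketch of that point, the C helper below writes a string to
a file-based interface and hands the failure reason back to the caller.
The name write_attr and the convention of returning a negative errno
value are assumptions made for illustration; only open(2), write(2),
close(2) and errno are standard.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a string to a file-based control interface.
 * Returns 0 on success or a negative errno value on failure.
 * The path comes from the caller; no particular kernel file is assumed.
 */
int write_attr(const char *path, const char *value)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;          /* e.g. -ENOENT, -EACCES */

    size_t len = strlen(value);
    ssize_t n = write(fd, value, len);
    int err = 0;

    if (n < 0)
        err = -errno;           /* failure reason reported by write(2) */
    else if ((size_t)n != len)
        err = -EIO;             /* short write: treated as an error here */

    if (close(fd) < 0 && err == 0)
        err = -errno;
    return err;
}
```

A caller can then tell a missing file (-ENOENT) from a permission problem
(-EACCES) or a rejected value, without polling state afterwards.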

Concerning ioctl(2): don't use it for new interfaces. Please. For 
justification, please see LKML archives for Linus's opinions about its 
badness, and the issues with ioctl(2) and porting to 64-bit architectures. 
ASCII-based filesystem interfaces are the direction that we want to head. 
This was discussed heavily in Ottawa last year, and in various threads on 
LKML. Even though you may disagree with it, many mainstream kernel 
developers are investigating ways to move away from it, and do not welcome 
new interfaces that depend on ioctl(2).

> Here is how it works.  A telco has a fault in their system.  They figure 
> out what the exact failure is (bad switch, bad hub, bad disk, bad cpu 
> blade, bad whatever), and they dispatch a $10/hour worker to fix it. 
>  The worker presses the hotswap request button, and while the OS is 
> busily executing /sbin/hotplug, the worker thinks it's OK to remove the 
> device (when the OS still is using it).  This may "work" in Linux for 
> PCI, but it certainly isn't correct that a device driver should expect 
> 0xff to be returned on pci operations (in the case of PCI).  Other 
> architectures don't return anything indicating a failure, causing real 
> confusion.
> Performance is _so_ critical here, because if the removal operation is 
> fast enough, there is no physical way to remove the device from the 
> slot/bus/whatever before the OS has removed the device from the 
> operating system data structures.

That is a fault of the bus code that receives either the interrupt that 
the device is going away, or a notification from userspace. It has nothing 
to do with the reciprocal notification the kernel gives userspace to clean 
up device nodes, etc.

> Yes I agree sysfs/taking advantage of the driver model is a superior 
> choice to mvista's chassis manager, but hey, we had to work with what we 
> had available at the time.  If sysfs were in 2.4, we would have used 
> that instead.  In future revs, we may backport sysfs to provide this 
> sort of functionality and ensure that HDI works for both 2.4 and 2.5 
> easily without a bunch of changes to parsing driver model information.

I understand that you worked with what you had, to an extent. sysfs, and
previous incarnations of it, have been around since Aug 2001. Greg has
been using it since Sep 2001, when it was solely a 2.4 patch. It was
minimally advertised, but several developers took the initiative to find
and use it at that time. Note also that sysfs is a completely separate
feature from the driver model. 

> The key difference between MontaVista's HDI and whatever anyone else is 
> working on is the excellent mechanism by which insert and remove events 
> are transmitted (via the event broker) without the need to execute any 
> type of hotplug scripts.  Perhaps both mechanisms could be used in 
> MontaVista's implementation and let selection take its course.  This 
> would allow us to keep the userspace database/api/tools/etc without the 
> need to reimplement what everyone already agrees is the correct solution.

We are at an impasse. You have a working solution, we don't. It's included 
in a shipping product, and now has a legacy of users and developers on the 
project. You've chosen to implement a new interface for communicating 
device events to userspace, bypassing the de facto standard for doing so, 
for whatever reasons. 

Greg adamantly disapproves of the code, as do I, for what little I've seen
of it. I will not bless the current form, or a 2.5/2.6 version without a 
serious architectural change. I will not speak for Greg, but I suspect he 
may feel the same way. 

We are working on code with similar intent, which uses the existing
infrastructure, and has been at least partially designed by some core
kernel developers. I'm not here to say "Nyah nyah nyah.  Our code will get
into the kernel and yours won't." However, our goals and design decisions 
are more closely aligned with other kernel developers, and given the 
nature of the community, these types of projects have a better chance of 
becoming mainstream code. 

I would like us to leverage each other's work, because I'm sure we can 
help each other out to some extent. But, given the amount of time we each 
have going down our separate paths, that may not be immediately possible. 

