[cgl_discussion] Re: [Hardeneddrivers-discuss] comments re: Device Driver Hardening Design Spec

Daniel E. F. Stekloff dsteklof at us.ibm.com
Tue Sep 24 10:03:04 PDT 2002


Please see my comments below:


On Monday 23 September 2002 03:26 pm, Randy.Dunlap wrote:
> Comments on
> "Device Driver Hardening Design Specification,"
> Draft Release 0.5h


<snip>


> Section 3.3.3: Diagnostics Interface for Hardened Drivers
> "test names and parameters must be known by the application
> invoking them."  How realistic is this?
> Won't it cause lots of updates to the userspace Diag manager?
>
> What I'd prefer to see is a way to invoke a fixed set of known
> tests to any driver.  Traditionally this could have been via an
> ioctl, but most developers seem to prefer in-ram filesystems
> nowadays, so a write to some file (e.g., "echo 4 > rundiag")
> would cause the driver to run diag. number 4.
> An example of a fixed set of known tests could be something like:
> 	0	basic self-test
> 	1	config register/interface test
> 	2	bus interface test
> 	3	device register/memory interface test
> 	4	traffic test
> 	5	typical test
> 	6	full/exhaustive test
> (or use a callback/entry point for each one, like Greg suggested)


I agree with you completely. For diagnostics, we are planning to work with 
driverfs in 2.5 and later kernels. We wish to create a standardized diagnostic 
interface like the one you've outlined, along with defined return values to 
add some structure. We are also looking at adding interfaces specific to 
device classes; device classes can be loaded on the fly and can be crafted 
for particular devices.
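
Just to sketch what I mean by defined returns - the names here are made up 
for illustration, not from the spec:

	enum diag_result {
		DIAG_PASS        = 0,	/* test completed, device healthy */
		DIAG_FAIL        = 1,	/* test completed, fault detected */
		DIAG_UNSUPPORTED = 2,	/* driver doesn't implement this test */
		DIAG_BUSY        = 3,	/* device in use, exclusive test refused */
	};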

The hardened driver diagnostics piece was intended to mirror driverfs; we were 
planning to use driverfs in later kernels. The mechanism for running a test 
- "run" blah blah blah - evolved from requirements the people at Intel had 
for running diagnostics in device drivers. Intel had originally planned to do 
an ioctl interface, which would have been more flexible than a filesystem 
interface. I believe the goal was to match that flexibility as best we could 
using the filesystem. That goal was flawed and unnecessary. We could, as you 
have stated, define a simple interface to run diagnostics in device drivers.
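
To make that concrete, here is a rough sketch of the driver side of an 
"echo 4 > rundiag" interface. The test functions and the callback signature 
are invented for illustration - the real driverfs entry points will differ:

	/* One handler per fixed test number, matching your table above. */
	static int (*diag_tests[])(struct device *dev) = {
		basic_self_test,	/* 0 */
		config_reg_test,	/* 1 */
		bus_if_test,		/* 2 */
		dev_reg_mem_test,	/* 3 */
		traffic_test,		/* 4 */
		typical_test,		/* 5 */
		exhaustive_test,	/* 6 */
	};

	/* Userspace writes a single test number; we dispatch and report
	 * pass/fail through the write's return value. */
	static ssize_t rundiag_store(struct device *dev, const char *buf,
				     size_t count)
	{
		unsigned long n = simple_strtoul(buf, NULL, 10);

		if (n >= sizeof(diag_tests) / sizeof(diag_tests[0]))
			return -EINVAL;
		return diag_tests[n](dev) ? -EIO : count;
	}

One file, one value, and the set of test numbers is fixed across drivers, 
which is exactly what you proposed.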

As a note - I wish to change the current implementation of the driver 
hardening spec that we did with our Common Resource Management System:

1) I want to simplify the interface for running diagnostics - one value/one 
file - and mirror what we can do with driverfs.

2) I also want to get rid of the clog of logging that's outlined in the 
hardened driver specification. With all the UUIDs that must be passed with 
every message, my log files fill up and I have trouble sifting through them 
for the data I'm looking for. I end up putting in printk's to get what I 
want and ignoring the spec's messaging. <grin>
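
When it matters, that boils down to something like:

	printk(KERN_ERR "eth0: diag 4 (traffic test) failed: CRC errors\n");

which tells me everything I need without a single UUID.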

As a side discussion - should we hammer out which interface would be good for 
diagnostics, ioctls or a filesystem? I have been pushing the filesystem angle 
because of the benefits driverfs gives us in 2.5. The ability to walk a tree 
of devices and run diagnostics is a very useful feature. I am open, however, 
to discussion; a filesystem isn't necessarily the optimal interface for 
diagnostics.
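
For what it's worth, the tree walk is trivial from userspace once every 
device exposes the same file. A quick sketch, assuming driverfs is mounted 
at /devices and the file is called "rundiag" (both are assumptions, not 
current kernel interfaces):

	#include <stdio.h>
	#include <string.h>
	#include <ftw.h>

	/* Run the basic self-test (number 0) on every device that
	 * exposes a "rundiag" file. */
	static int visit(const char *path, const struct stat *sb, int type)
	{
		const char *name = strrchr(path, '/');
		FILE *f;

		if (type != FTW_F || !name || strcmp(name + 1, "rundiag"))
			return 0;
		f = fopen(path, "w");
		if (!f)
			return 0;	/* skip devices we can't open */
		fprintf(f, "0\n");
		fclose(f);
		printf("ran diag 0 via %s\n", path);
		return 0;
	}

	int main(void)
	{
		return ftw("/devices", visit, 16);
	}

Doing the same thing over ioctls means opening every device node and knowing 
its ioctl numbers.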


> Section 3.3.3.2: Data Structures for Device Diagnostics:
> The "run" function seems too open-ended, like a possible security
> threat.  Doesn't it also require string parsing by the kernel?
> Also don't need/use typedef for <result_t>.
>
> Section 3.3.3.3: Driver callback functions for device diags:
> Q: Is a device taken out of normal service manually or
> automatically for diags?  Or does its driver just return IO
> errors during diags?


This is a very good question. There are different levels of diagnostics - 
shared and exclusive. Shared diagnostics can run while a device is operating 
normally; they don't impact performance or do anything destructive. Exclusive 
tests are much more thorough; they need complete control over a device to 
test it completely and to make sure they don't clobber any important data.
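
One way to capture that split in the interface - again just a sketch with 
invented names:

	enum diag_level {
		DIAG_SHARED,	/* safe to run alongside normal operation */
		DIAG_EXCLUSIVE,	/* needs sole control; may be destructive */
	};

	struct diag_test {
		const char	*name;
		enum diag_level	level;
		int		(*run)(struct device *dev);
	};

	static int diag_run(struct device *dev, struct diag_test *t,
			    int in_use)
	{
		if (t->level == DIAG_EXCLUSIVE && in_use)
			return -EBUSY;	/* "device is still in use" */
		return t->run(dev);
	}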

We believe we should start small by requiring a privileged user to remove the 
device from operation prior to running diagnostic tests. We can return a 
message saying the device is still in use. We can even allow users to hang 
themselves.

Eventually, we could add software states to devices. There could be an 
"available" state where the device is configured and ready for use, an 
"undefined" state for a device that is in the system but not configured, and 
a "diagnostic" or "service" state for exclusive, uninterruptible control. 
This could help automate diagnostics and make sure that a device is not 
being used. *BUT* this would require quite a bit of work in the kernel, and 
I'm not entirely sure it's really necessary. For now, we will rely on manual 
removal from service.
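
If we ever do go that way, the states themselves are simple enough - a 
sketch of the idea, not proposed kernel code:

	enum dev_state {
		DEV_UNDEFINED,	/* present in the system, not configured */
		DEV_AVAILABLE,	/* configured and ready for use */
		DEV_DIAGNOSTIC,	/* held exclusively for service/diags */
	};

	/* Exclusive tests would only run in DEV_DIAGNOSTIC, and the
	 * available -> diagnostic transition would fail while anyone
	 * still holds the device. */
	static int dev_set_diagnostic(enum dev_state *state, int refcount)
	{
		if (*state == DEV_AVAILABLE && refcount > 0)
			return -EBUSY;
		*state = DEV_DIAGNOSTIC;
		return 0;
	}

The hard part is the kernel plumbing to enforce those transitions 
everywhere, which is why I'm not sure it's worth it yet.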

Thoughts?




