[cgl_discussion] Re: [Hardeneddrivers-discuss] comments re: Device Driver Hardening Design Spec
Daniel E. F. Stekloff
dsteklof at us.ibm.com
Tue Sep 24 10:03:04 PDT 2002
Please see my comments below:
On Monday 23 September 2002 03:26 pm, Randy.Dunlap wrote:
> Comments on
> "Device Driver Hardening Design Specification,"
> Draft Release 0.5h
> Section 3.3.3: Diagnostics Interface for Hardened Drivers
> "test names and parameters must be known by the application
> invoking them." How realistic is this?
> Won't it cause lots of updates to the userspace Diag manager?
> What I'd prefer to see is a way to invoke a fixed set of known
> tests to any driver. Traditionally this could have been via an
> ioctl, but most developers seem to prefer in-ram filesystems
> nowadays, so a write to some file (e.g., "echo 4 > rundiag")
> would cause the driver to run diag. number 4.
> An example of a fixed set of known tests could be something like:
> 0 basic self-test
> 1 config register/interface test
> 2 bus interface test
> 3 device register/memory interface test
> 4 traffic test
> 5 typical test
> 6 full/exhaustive test
> (or use a callback/entry point for each one, like Greg suggested)
I agree with you completely. For diagnostics, we are planning to work with
driverfs in 2.5 and later kernels. We want a standardized diagnostic
interface like the one you've outlined, plus well-defined return values to add
some structure. We are also looking at adding interfaces specific to device
classes; device classes can be loaded on the fly and can be crafted for
particular devices.
The hardened driver diagnostics piece was intended to mirror driverfs; we were
planning to use driverfs in later kernels. The mechanism for running a test
- "run" blah blah blah - evolved from requirements the people at Intel had
for running diagnostics in device drivers. Intel had originally planned an
ioctl interface, which would have been more flexible than a filesystem
interface, and we tried to preserve that flexibility in the filesystem as best
we could. That goal was flawed and unnecessary. We could, as you have stated,
define a simple interface to run diagnostics in drivers.
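To illustrate, here is a rough user-space sketch of the fixed-test-number idea (all names here are mine, not from the spec): the driver keeps a numbered table of tests, and whatever number is written to the diag file simply indexes into it.

```c
#include <stddef.h>

/* Illustrative sketch only: a fixed, numbered table of diagnostic
 * tests along the lines of Randy's list.  Each test returns 0 on pass. */
typedef int (*diag_fn)(void);

static int diag_self_test(void)    { return 0; }  /* 0: basic self-test */
static int diag_config_test(void)  { return 0; }  /* 1: config register/interface */
static int diag_bus_test(void)     { return 0; }  /* 2: bus interface */
static int diag_mem_test(void)     { return 0; }  /* 3: device register/memory */
static int diag_traffic_test(void) { return 0; }  /* 4: traffic */

static diag_fn diag_table[] = {
    diag_self_test,
    diag_config_test,
    diag_bus_test,
    diag_mem_test,
    diag_traffic_test,
};

/* Dispatch the test whose number was written to the diag file;
 * -1 means "no such test". */
int run_diag(int n)
{
    if (n < 0 || n >= (int)(sizeof(diag_table) / sizeof(diag_table[0])))
        return -1;
    return diag_table[n]();
}
```

New tests (or device-specific ones) just extend the table; the userspace side never needs to learn new names.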
As a note - I wish to change the current implementation of the driver
hardening spec that we did with our Common Resource Management System:
1) I want to simplify the interface for running diagnostics - one value/one
file - and mirror what we can do with driverfs.
2) I also want to get rid of the clog of logging that's outlined in the
hardened driver specification. With all the UUIDs that must be passed with
every message, my log files fill up and I have trouble sifting through them
for the data I'm looking for. I end up putting in printk's to get what I
want and ignoring the spec's messaging. <grin>
As a side discussion - should we hammer out what interface would be good for
diagnostics? Ioctls or filesystem? I have been pushing the filesystem angle
because of the benefits driverfs gives us in 2.5. The ability to walk a tree
of devices and run diagnostics is a very useful feature. I am open, however,
to discussion. A filesystem isn't really an optimum interface for everything
- an ioctl is more flexible for passing structured arguments, as noted above.
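On the filesystem side, the write handler can stay trivial - parse one integer and nothing more. A hypothetical user-space model of such a handler (names are mine), which also addresses the string-parsing concern, since there is almost nothing to parse:

```c
#include <stdio.h>

/* Hypothetical model of a "one value / one file" write handler:
 * accept exactly one integer (e.g. from `echo 4 > rundiag`) and
 * hand it to the dispatcher.  Anything else is rejected. */
int diag_write(const char *buf, int (*dispatch)(int))
{
    int n;

    if (sscanf(buf, "%d", &n) != 1)
        return -1;              /* not a plain test number */
    return dispatch(n);
}

/* Stub dispatcher for demonstration: just echoes the parsed number. */
int echo_dispatch(int n)
{
    return n;
}
```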
> Section 3.3.3.1: Data Structures for Device Diagnostics:
> The "run" function seems too open-ended, like a possible security
> threat. Doesn't it also require string parsing by the kernel?
> Also don't need/use typedef for <result_t>.
> Section 3.3.3.2: Driver callback functions for device diags:
> Q: Is a device taken out of normal service manually or
> automatically for diags? Or does its driver just return IO
> errors during diags?
This is a very good question. There are different levels of diagnostics -
shared and exclusive. Shared diagnostics can run while a device is operating
normally; they don't impact performance or do anything destructive. Exclusive
tests are much more thorough and do need complete control over a device to
test it completely and to make sure they don't bop any important data.
We believe we should start small by requiring a privileged user to remove the
device from operation prior to running diagnostic tests. If the device is
still in use, we can return a message saying so. We could even allow a user
to force the device out of service if necessary.
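In code terms (a sketch, my names), the only policy needed at this stage is that exclusive tests refuse to run on a busy device:

```c
/* Sketch of the shared-vs-exclusive policy described above.
 * Names are illustrative, not from the spec. */
enum diag_level {
    DIAG_SHARED,     /* safe while the device is in normal operation */
    DIAG_EXCLUSIVE,  /* needs complete control of the device */
};

/* Returns 1 if the test may run now, 0 if the caller must first
 * take the device out of service. */
int diag_may_run(enum diag_level level, int device_in_use)
{
    if (level == DIAG_EXCLUSIVE && device_in_use)
        return 0;
    return 1;
}
```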
Eventually, we could add software states to devices: an "available" state
where the device is configured and ready for use, an "undefined" state for a
device that is in the system but not configured, and a "diagnostic" or
"service" state that gives exclusive, uninterruptible control of the device.
This could help automate diagnostics and make sure that a device is not being
used. *BUT*, this would require quite a bit of work in the kernel and I'm not
entirely sure it's really necessary. For now, we will rely on manual removal
from service.
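If we ever did add the states, the transition rule itself would be small - something like this sketch (again, illustrative names only):

```c
/* Sketch of the software states floated above.  A device may only
 * enter the diagnostic state from "available" and while idle. */
enum dev_state {
    DEV_UNDEFINED,   /* in the system but not configured */
    DEV_AVAILABLE,   /* configured and ready for use */
    DEV_DIAGNOSTIC,  /* exclusive, uninterruptible control */
};

/* 0 on success; -1 if the device is unconfigured or still in use. */
int enter_diagnostic(enum dev_state *state, int in_use)
{
    if (*state != DEV_AVAILABLE || in_use)
        return -1;
    *state = DEV_DIAGNOSTIC;
    return 0;
}
```

The real work, of course, is everywhere else in the kernel that would have to honor the state.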