[Hardeneddrivers-discuss] RE: [cgl_discussion] Some Initial
Comments on DDH-Spec-0.5h.pdf
mochel at osdl.org
Tue Sep 24 10:02:29 PDT 2002
Note: Could you please adjust your Outlook settings to wrap lines at 80
columns? Some of us will never use anything but text-based editors with
80-column limits. ;)
> > General comment: the specification, as written, does not address
> > a way to enforce compliance, largely because the Stability
> > and Reliability section is based upon the list of Good Coding
> > Practices.
> [DPH] Interesting point, Solaris has its specification and a test harness,
> as I recall, to qualify drivers. They cover a lot of ground in both
> to make sure that the advocated interfaces are documented, used, and
> used right. For example, there were several ways to do dma but the
> Solaris spec advocates a best API to use and the checker (I think)
> was able to identify if an unadvocated API is used. The driver
> Best Known Methods also is part of this, although I'm not sure how a
> harness could test if they are used.
> Are we considering any kind of test harness, kinda like we discussed
> in TLT 1.0 around fault injection testing for this? Possibly an
> open source project to provide this for validating drivers might be
> a part of this effort; it would need to be extensible for specific
> categories/environments. Is there anything like this now for Linux
> in an open source project?
There is nothing that I know of like this, but I think you just stumbled
upon a gold mine. "Driver hardening" is an ephemeral goal. The mechanism
that you(*) have proposed in the spec is heavy-handed and obviously not
very well liked in the community so far.
What you really want is a test harness, for both runtime and compile time,
to validate that drivers are doing the right things at the right time.
First, of course, you need to identify what those right things are.
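To make the runtime half concrete, here is a rough sketch of a
fault-injection harness. It is a simulation in Python, not kernel code:
FaultInjector, example_probe, and check_probe are hypothetical names made up
for illustration, and the only "right thing" it checks is that an init path
releases what it allocated when a later allocation fails.

```python
import errno


class FaultInjector:
    """Fails the Nth allocation and tracks what is still allocated."""

    def __init__(self, fail_on):
        self.fail_on = fail_on
        self.calls = 0
        self.live = set()

    def alloc(self, name):
        self.calls += 1
        if self.calls == self.fail_on:
            return None  # simulated allocation failure
        self.live.add(name)
        return name

    def free(self, name):
        self.live.discard(name)


def example_probe(res):
    """Stand-in for a driver init path; returns 0 or a negative errno."""
    ring = res.alloc("ring")
    if ring is None:
        return -errno.ENOMEM
    irq = res.alloc("irq")
    if irq is None:
        res.free(ring)  # undo the earlier allocation on the error path
        return -errno.ENOMEM
    return 0


def check_probe(probe, max_allocs):
    """Fail each allocation in turn; report failures that leave leaks."""
    leaks = []
    for n in range(1, max_allocs + 1):
        injector = FaultInjector(fail_on=n)
        status = probe(injector)
        if status != 0 and injector.live:
            leaks.append((n, sorted(injector.live)))
    return leaks
```

Calling check_probe(example_probe, 2) returns an empty list because both
error paths clean up. A real harness would have to hook the kernel's own
allocators and cover many more rules than this one.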
[ Some students at Stanford did something called the Stanford Checker, an
extension to gcc that they could teach to check for many things: large
on-stack variables, unfreed memory, etc etc. It's not open source, but
some people at Berkeley did something similar that was. That's a big
project in itself, but would be great for compile-time checking. ]
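As a toy illustration of the "large on-stack variables" check, and nothing
like the real checkers mentioned above, the sketch below scans C source text
with a regular expression. The 1024-byte limit, the type-size table, and the
function name find_large_stack_arrays are all assumptions; a real tool would
work on the compiler's parse tree, not on lines of text.

```python
import re

# Hypothetical limit; the right number is a policy choice.
STACK_LIMIT_BYTES = 1024

# Assumed element sizes for a few C types (illustrative only).
TYPE_SIZES = {"char": 1, "short": 2, "int": 4, "long": 8}

# Matches simple declarations such as "char buf[4096];" at the start of a line.
ARRAY_DECL = re.compile(
    r"^\s*(?:unsigned\s+)?(char|short|int|long)\s+(\w+)\[(\d+)\]\s*;"
)


def find_large_stack_arrays(source, limit=STACK_LIMIT_BYTES):
    """Return (line_number, name, size_in_bytes) for oversized local arrays."""
    findings = []
    depth = 0  # brace depth; > 0 is treated as "inside a function body"
    for lineno, line in enumerate(source.splitlines(), start=1):
        if depth > 0:
            match = ARRAY_DECL.match(line)
            if match:
                ctype, name, count = match.groups()
                size = TYPE_SIZES[ctype] * int(count)
                if size > limit:
                    findings.append((lineno, name, size))
        # Update depth after checking, so the line with "{" counts as outside.
        depth += line.count("{") - line.count("}")
    return findings
```

Known limits of the sketch: it also flags arrays inside struct definitions,
it ignores typedefs and macro-sized arrays, and it does not sum the stack
usage of a whole function.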
IMO, this would be a very valuable tool for many people, and get you some
brownie points in the community. If you made it easy enough to use and
understand, you might even be able to coerce other developers into doing
some of the inevitable fixing of drivers.
Also, I'm fond of the concept of having some way to validate results,
i.e. test, before you actually implement a piece of software. It doesn't
have to be perfect, or even quality code. But, I would expect you guys to
already have a test harness specifically for validating the hardness of
> << Stuff Deleted >>
> > Comment:
> > I consider a "good driver" to have the following attributes:
> > 1. does not cause, directly or indirectly, fatal exceptions.
> > 2. does not cause, directly or indirectly, the system to hang.
> > 3. satisfies the relevant functions as specified,
> > with "good performance" characteristics
> > 4. detects errors in configuration, operation, or other aspects
> > of the hardware (or software) functions that are managed by
> > the driver.
> > 5. is expressed in a maintainable form.
> [DPH] - Encapsulates errors, i.e. can shut down the subsystem cleanly on
> fatal errors without harming the system, i.e. returns EIO for
> attempted I/Os if a fatal error was detected and the subsystem is
> taken offline, but the system keeps running.
> - Restartable from shut down state, either by unloading and reloading
> the driver or by some form of reset, to take it from the EIO state
> back to active. Operator or fault manager initiated.
> This would replace panic() for safe cases where errors are
> clearly not
> affecting things beyond driver state (hardware error, some
> Neither one of these is easy and there is little of this
> in use today. It would take an infrastructure to do this cleanly, and
> that would need to be part of the specification.
No. This is way too much policy in drivers, let alone the kernel itself.
There are a million different choices you could make, and you don't want
to pollute the kernel trying to express all of them in code.
Please identify all the possible errors and actions that could take place,
along with their severity and the limits of response time. In many cases,
I would think/hope that they are not system-critical errors, and userspace
could still have a chance to execute (otherwise you're hosed anyway).
If so, if an error occurs that requires driver-level action, create
infrastructure to notify userspace (via /sbin/hotplug) and create a
userspace agent to parse the error and execute admin-specified policy.
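A minimal sketch of such an agent follows, assuming the error event reaches
userspace the way hotplug events do: an agent name as the first argument and
the details in environment variables. The variable names ERROR_SEVERITY and
DEVICE, the severity levels, and the action names are hypothetical; nothing
in the kernel defines them today.

```python
import os
import sys

# Admin-specified policy: severity -> action. A real agent would read this
# from a configuration file; these severities and actions are hypothetical.
DEFAULT_POLICY = {
    "corrected": "log",
    "recoverable": "reset-device",
    "fatal": "offline-device",
}


def choose_action(env, policy=DEFAULT_POLICY):
    """Map one error event (a dict of environment variables) to an action."""
    severity = env.get("ERROR_SEVERITY", "").lower()
    device = env.get("DEVICE", "unknown")
    # Unknown or missing severities fall back to the least intrusive action.
    action = policy.get(severity, "log")
    return device, action


def main(argv, env):
    """Entry point: argv[1] is the agent name, env carries the event."""
    agent = argv[1] if len(argv) > 1 else "unknown"
    device, action = choose_action(env)
    # A real agent would run an admin-supplied script here; this only reports.
    print(f"{agent}: device={device} action={action}")
    return 0
```

An installed agent would end with a call such as main(sys.argv, os.environ).
The point of the design is that the driver only reports what happened, and
the policy table, which the admin can edit, decides what to do about it.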
(*) - 'you' meaning the Intel team that produced the Driver Hardening
Spec. I assume that all y'all are part of the same team, but I don't