[cgl_discussion] Some Initial Comments on DDH-Spec-0.5h.pdf
Andy Pfiffer
andyp at osdl.org
Mon Sep 23 16:35:04 PDT 2002
[ these are my initial comments on the draft release spec -- Andy ]
Re: DDH-Spec-0.5h.pdf
Comment:
I'm not sure I fully understand the problem/feature that this
specification is attempting to address. I have a hunch,
but I don't see it expressed clearly and unambiguously
at the beginning of the document.
The specification implies that the major problems with "regular"
drivers are that they:
1. are not written with good programming practices.
2. do not report errors.
3. do not fail gracefully when hardware errors are
detected.
Is that correct? It would be most helpful if there were
real-world examples (or statistics) cited to indicate that
non-hardened drivers were the obstacle for carrier-grade
use of Linux, or references to existing drivers that could be
used as examples of items 1), 2), and 3). We all have differing
experiences with bad drivers, bad hardware, bad fans, bad power
supplies, and so on; it would be relevant to see any historical
data that confirms or refutes the specification's assumptions.
Generic question: why not just fix the "bad" drivers?
Generic question: why not focus the "hardening effort" on the
edges of the kernel interfaces, rather than on a driver-by-driver
basis? Specifically: why not put the "professional paranoia"
into all of the kernel code that calls into drivers, and all
of the routines commonly called by drivers? One could move
from a model of "this driver is hardened" to "all drivers
are suspect until proven otherwise." Wouldn't that address
90% of the perceived problem up front, rather than spending 100%
effort to "harden" one driver at a time?
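To make the "edges of the kernel interfaces" idea concrete, here is a minimal user-space sketch in C. It is an illustration only, not kernel code: struct drv_ops, checked_read(), good_read() and buggy_read() are hypothetical names invented for this example, and the -4095 bound is an assumed "sane errno range". The point is that the caller validates both its arguments and whatever the driver hands back, so a misbehaving driver is caught at the boundary.

```c
#include <stddef.h>
#include <errno.h>
#include <sys/types.h>

/* Hypothetical driver entry-point table; not a real kernel structure. */
struct drv_ops {
    ssize_t (*read)(void *dev, char *buf, size_t len);
};

/*
 * Checked call into a driver's read routine. The caller, not the
 * driver, validates the arguments and the result.
 */
ssize_t checked_read(const struct drv_ops *ops, void *dev,
                     char *buf, size_t len)
{
    ssize_t n;

    if (ops == NULL || ops->read == NULL)
        return -ENOSYS;         /* driver does not implement read */
    if (buf == NULL && len != 0)
        return -EINVAL;         /* caller error, never reaches the driver */

    n = ops->read(dev, buf, len);

    /* A driver must never claim more bytes than were requested. */
    if (n > 0 && (size_t)n > len)
        return -EIO;
    /* Assumed convention: error codes lie in [-4095, -1]. */
    if (n < -4095)
        return -EIO;
    return n;
}

/* Toy driver that behaves: returns one byte when there is room. */
ssize_t good_read(void *dev, char *buf, size_t len)
{
    (void)dev;
    if (len > 0)
        buf[0] = 'x';
    return len > 0 ? 1 : 0;
}

/* Toy driver with a bug: claims more data than the buffer can hold. */
ssize_t buggy_read(void *dev, char *buf, size_t len)
{
    (void)dev;
    (void)buf;
    return (ssize_t)len + 10;
}
```

With this in place, a call through checked_read() with buggy_read installed comes back as -EIO instead of handing a bogus length to the rest of the system, and the check is written once rather than once per driver.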
General comment: the specification, as written, does not address
a way to enforce compliance, largely because the Stability
and Reliability section is based upon the list of Good Coding
Practices.
Re: What is a Hardened Driver?
"fault handling"
"fault recovery"
"fault prediction"
"fault analysis"
I'd recommend moving this closer to the beginning of the
specification. My hunch is that "driver hardening" is
really about just these four items.
Quote:
"A typical device driver design focuses on the normal,
proper operation of the hardware; attention to driver
behavior in the event of hardware faults is often minimal."
Comment:
A broad generalization that isn't backed by an example.
I could also state with the same basis in fact: "attention
to correct driver behavior in a multiprocessor environment
is often minimal", or "attention to correct handling of
critical sections is often minimal."
Re: Driver Hardening Categories
"Stability and Reliability"
Comment:
I consider a "good driver" to have the following attributes:
1. does not cause, directly or indirectly, fatal exceptions.
2. does not cause, directly or indirectly, the system to hang.
3. satisfies the relevant functions as specified,
with "good performance" characteristics.
4. detects errors in configuration, operation, or other aspects
of the hardware (or software) functions that are managed by
the driver.
5. is expressed in a maintainable form.
If I map that to the categories listed in this section,
what I see is that a "hardened driver" has all of the attributes
of a "good driver" plus the following (verbatim):
Stability and reliability:
- "provide for fault injection testing"
Instrumentation:
N/A: any of these functions should be considered part of
the driver's requirements; if it doesn't meet the
requirements it's not a "good driver."
High Availability:
N/A: also part of the driver's specifications
My opinion after reading this section is that a "hardened driver"
is equivalent to a "good driver", and that "hardened with diagnostics"
is equivalent to a "good driver with standard diagnostics", and
"hardened with instrumentation" is a "good driver with standard
instrumentation."
The one item I couldn't bin: "fault injection testing."
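For reference, this is roughly what I understand "fault injection testing" to mean, as a small user-space sketch in C. Everything here is an assumption made for illustration: inject_fault_after(), reg_read() and drv_wait_ready() are hypothetical names, the "device" is simulated, and the specification may intend something quite different. The harness arms a one-shot fault on a hardware access, and the test checks that the driver reports it instead of hanging or using a stale value.

```c
#include <stdint.h>
#include <errno.h>

/* Hypothetical injector state: -1 means injection is disabled. */
static int inject_countdown = -1;

/* Arm the injector: the nth subsequent register read fails (n >= 1). */
void inject_fault_after(int n)
{
    inject_countdown = n;
}

/*
 * Simulated device register read. Returns 0 on success, or -EIO when
 * an injected fault fires. The injector disarms itself after firing.
 */
int reg_read(uint32_t *val)
{
    if (inject_countdown > 0 && --inject_countdown == 0) {
        inject_countdown = -1;
        return -EIO;
    }
    *val = 0x1u;                /* pretend the "ready" bit is set */
    return 0;
}

/*
 * Driver routine under test: polls the device up to 'tries' times.
 * It must propagate a read fault and must not spin forever.
 */
int drv_wait_ready(int tries)
{
    uint32_t status;
    int i, err;

    for (i = 0; i < tries; i++) {
        err = reg_read(&status);
        if (err)
            return err;         /* report the fault to the caller */
        if (status & 0x1u)
            return 0;
    }
    return -ETIMEDOUT;
}
```

A test would call inject_fault_after(1) and then expect drv_wait_ready() to return -EIO, followed by a normal return of 0 on the next call once the one-shot fault has fired.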