[cgl_discussion] Some Initial Comments on DDH-Spec-0.5h.pdf
Gustafson, Geoffrey R
geoffrey.r.gustafson at intel.com
Mon Sep 23 18:25:57 PDT 2002
You're basically right that a "hardened driver" just means a "good driver".
And Linux certainly has plenty of good drivers already. However, everyone
has a different idea of 'good'. This spec is an attempt to define what
'good' means for a driver, in the context of high-availability carrier-grade
systems.
Most of what makes a 'good' driver is common to all purposes: things you
mention, such as not hanging the system and not causing fatal exceptions. But
some things differ between a desktop, an embedded system, an enterprise
server, and a carrier server. For instance, when there is a tradeoff between
reliability and performance and reliability is king, it might be wise to do
an insane amount of parameter checking to guard against even the merest
chance of an undetected bug crashing the system.
Regarding the question: why not just fix the "bad" drivers? Drivers that are
actually bad are probably for obscure hardware that is not really of
concern. The purpose is to take good drivers and make sure they meet the
last few percent of the objective standard of 'good'.
You bring up a very good point about the possibility of killing N birds with
one stone by hardening the kernel itself. I don't know enough to address
that; perhaps device-independent validation is very limited?
Another good point was about enforcement. Just because something is hardened
at one point doesn't necessarily mean some of the rules won't be
accidentally violated by later patches. So hardening requires either periodic
re-evaluation or strong buy-in from the respective maintainers. At least part
of the beauty of open source is that it _can_ be evaluated by an objective
third party, if someone chose to do that.
You asked several times for objective data, and I agree that would be great.
However, drivers _are_ in the unique position of being both privileged code
and yet specific to certain hardware. Thus they are capable of more damage
than user-space code, but (on average) can't have been tested in as many
configurations as core kernel code. So, at least in the absence of data,
they are a very logical starting point.
My opinions are my own and not necessarily those of Intel Corporation.
From: Andy Pfiffer [mailto:andyp at osdl.org]
Sent: Monday, September 23, 2002 4:35 PM
To: cgl_discussion at osdl.org
Cc: rob.rhoads at intel.com; hardeneddrivers-discuss at lists.sourceforge.net
Subject: [cgl_discussion] Some Initial Comments on DDH-Spec-0.5h.pdf
[ these are my initial comments on the draft release spec -- Andy ]
I'm not sure I fully understand the problem or feature that this
specification is attempting to address. I have a hunch, but I don't
see it expressed clearly and unambiguously at the beginning of the
document.
The specification implies that the major problem with "regular"
drivers is that they:
1. are not written with good programming practices.
2. do not report errors.
3. do not fail gracefully when hardware errors are encountered.
Is that correct? It would be most helpful if there were
real-world examples (or statistics) cited to show that
non-hardened drivers are the obstacle to carrier-grade
use of Linux, or references to existing drivers that could be
used as examples of items 1), 2), and 3). We all have differing
experiences with bad drivers, bad hardware, bad fans, bad power
supplies, and so on; it would be relevant to see any historical
data that confirms or refutes the specification's assumptions.
Generic question: why not just fix the "bad" drivers?
Generic question: why not focus the "hardening effort" on the
edges of the kernel interfaces, rather than on a driver-by-driver
basis? Specifically: why not put the "professional paranoia"
into all of the kernel code that calls into drivers, and all
of the routines commonly called by drivers? One could move
from a model of "this driver is hardened" to "all drivers
are suspect until proven otherwise." Wouldn't that address
90% of the perceived problem up front, rather than spending 100%
effort to "harden" one driver at a time?
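To make the "edges of the kernel interfaces" idea concrete, here is a minimal C sketch under stated assumptions: `struct drv_ops`, `checked_read()`, and the two toy drivers are hypothetical names invented for illustration, not real kernel interfaces, and the code runs in user space. The caller-side wrapper validates its arguments before calling into a driver and sanity-checks the driver's return value afterward, so a single check covers every driver plugged in behind it.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical driver interface for illustration; not a real kernel struct. */
struct drv_ops {
    const char *name;
    /* Fills buf with up to len bytes; returns bytes read or a negative errno. */
    long (*read)(void *dev, char *buf, size_t len);
};

/*
 * Hypothetical caller-side wrapper at the kernel/driver "edge": every
 * driver is treated as suspect, so arguments are validated before the
 * call and the driver's return value is sanity-checked after it.
 */
static long checked_read(const struct drv_ops *ops, void *dev,
                         char *buf, size_t len)
{
    long ret;

    if (ops == NULL || ops->read == NULL || buf == NULL)
        return -EINVAL;         /* reject malformed calls up front */
    if (len > (size_t)LONG_MAX)
        return -EINVAL;         /* count would not fit the return type */

    ret = ops->read(dev, buf, len);

    /* A driver cannot legitimately report more bytes than it was given. */
    if (ret > (long)len)
        return -EIO;            /* contain the bug instead of trusting it */
    return ret;
}

/* Toy well-behaved driver: writes one byte if there is room. */
static long good_read(void *dev, char *buf, size_t len)
{
    (void)dev;
    if (len == 0)
        return 0;
    buf[0] = 'x';
    return 1;
}

/* Toy buggy driver: claims an absurd byte count it never produced. */
static long buggy_read(void *dev, char *buf, size_t len)
{
    (void)dev;
    (void)buf;
    (void)len;
    return LONG_MAX;
}
```

For example, with an 8-byte buffer, `checked_read()` passes through the good driver's result of 1 but turns the buggy driver's impossible byte count into `-EIO`. The limitation of this approach is that a wrapper can only check what is visible at the interface (arguments and return values); it cannot catch a driver that misbehaves internally, such as by corrupting memory it was never passed.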
General comment: the specification, as written, does not address
a way to enforce compliance, largely because the Stability
and Reliability section is based upon the list of Good Coding
Re: What is a Hardened Driver?
I'd recommend moving this closer to the beginning of the
specification. My hunch is that "driver hardening" is
really about just these four items.
"A typical device driver design focuses on the normal,
proper operation of the hardware; attention to driver
behavior in the event of hardware faults is often minimal."
A broad generalization that isn't backed by an example.
I could also state with the same basis in fact: "attention
to correct driver behavior in a multiprocessor environment
is often minimal", or "attention to correct handling of
critical sections is often minimal."
Re: Driver Hardening Categories
"Stability and Reliability"
I consider a "good driver" to have the following attributes:
1. does not cause, directly or indirectly, fatal exceptions.
2. does not cause, directly or indirectly, the system to hang.
3. satisfies the relevant functions as specified,
with "good performance" characteristics
4. detects errors in configuration, operation, or other aspects
of the hardware (or software) functions that are managed by the driver.
5. is expressed in a maintainable form.
If I map that to the categories listed in this section,
what I see is that a "hardened driver" has all of the attributes
of a "good driver" plus the following (verbatim):
Stability and reliability:
- "provide for fault injection testing"
N/A: any of these functions should be considered part of
the driver's requirements; if it doesn't meet the
requirements it's not a "good driver."
N/A: also part of the driver's specifications
My opinion after reading this section is that a "hardened driver"
is equivalent to a "good driver", that "hardened with diagnostics"
is equivalent to a "good driver with standard diagnostics", and that
"hardened with instrumentation" is a "good driver with standard
instrumentation".
The one item I couldn't bin: "fault injection testing."
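Since "fault injection testing" is the one item that doesn't map onto the "good driver" list, a minimal C sketch of what it can look like may help. Everything here is hypothetical and runs in user space: `hw_read_reg()` stands in for a hardware register access, and the global `fail_after` is an invented injection knob that makes register reads start failing after a chosen count. This lets a test drive `toy_probe()` down its error paths on demand, which is hard to arrange with real hardware.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical fault-injection knob: -1 disables injection; N >= 0 lets
 * N more register reads succeed and then fails every read after those.
 */
static int fail_after = -1;

/* Hypothetical stand-in for a hardware register read (no real MMIO here). */
static int hw_read_reg(unsigned int reg, unsigned int *val)
{
    if (fail_after == 0)
        return -EIO;                /* injected hardware fault */
    if (fail_after > 0)
        fail_after--;
    *val = 0x1000u + reg;           /* made-up register contents */
    return 0;
}

/* Hypothetical device state for the toy driver. */
struct toy_dev {
    bool ready;
    unsigned int id;
    unsigned int status;
};

/* Toy probe routine: on any error it must leave dev in a clean state. */
static int toy_probe(struct toy_dev *dev)
{
    int err;

    dev->ready = false;
    dev->id = 0;
    dev->status = 0;

    err = hw_read_reg(0, &dev->id);
    if (err)
        return err;

    err = hw_read_reg(1, &dev->status);
    if (err) {
        dev->id = 0;                /* undo the partial initialization */
        return err;
    }

    dev->ready = true;
    return 0;
}
```

Setting `fail_after = 1` lets the first register read succeed and fails the second, exercising the partial-initialization cleanup path in `toy_probe()`; setting it to 0 fails the very first access.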
cgl_discussion mailing list
cgl_discussion at lists.osdl.org