[cgl_discussion] Some Initial Comments on DDH-Spec-0.5h.pdf

Andy Pfiffer andyp at osdl.org
Mon Sep 23 16:35:04 PDT 2002

[ these are my initial comments on the draft release spec -- Andy ]

Re: DDH-Spec-0.5h.pdf

	I'm not sure I fully understand the problem/feature that this
	specification is attempting to address.  I have a hunch,
	but I don't see it expressed clearly and unambiguously
	at the beginning of the document.

	The specification implies that the major problem with "regular"
	drivers is that they:

		1. are not written with good programming practices.
		2. do not report errors.
		3. do not fail gracefully when hardware errors are

	Is that correct?  It would be most helpful if there were
	real-world examples (or statistics) cited to indicate that
	non-hardened drivers were the obstacle for carrier-grade
	use of Linux, or references to existing drivers that could be
	used as examples of items 1), 2), and 3).  We all have differing
	experiences with bad drivers, bad hardware, bad fans, bad power
	supplies, and so on; it would be relevant to see any historical
	data that confirms or refutes the specification's assumptions.

	Generic question: why not just fix the "bad" drivers?

	Generic question: why not focus the "hardening effort" on the
	edges of the kernel interfaces, rather than on a driver-by-driver
	basis?  Specifically: why not put the "professional paranoia"
	into all of the kernel code that calls into drivers, and all
	of the routines commonly called by drivers?  One could move
	from a model of "this driver is hardened" to "all drivers
	are suspect until proven otherwise."  Wouldn't that address
	90% of the perceived problem up front, rather than spending 100%
	effort to "harden" one driver at a time?
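
	To make the "harden the edges" idea concrete, here is a rough
	user-space sketch of a caller that trusts nothing a driver
	returns.  It is not kernel code and is not taken from the
	specification; every name in it (drv_ops, drv_handle,
	checked_read, MAX_PLAUSIBLE_ERRNO, the two sample drivers) is
	hypothetical, and the errno bound is an assumed value.

```c
#include <stddef.h>
#include <string.h>
#include <errno.h>

/* Assumed bound on how negative a sane error return may be. */
#define MAX_PLAUSIBLE_ERRNO 4095

/* Hypothetical driver entry points; not a real kernel interface. */
struct drv_ops {
    long (*read)(void *buf, size_t len);
};

struct drv_handle {
    const struct drv_ops *ops;
    int suspect;            /* set once the driver breaks its contract */
    unsigned long faults;   /* number of contract violations observed */
};

/* Call the driver's read routine and validate everything around the call. */
long checked_read(struct drv_handle *h, void *buf, size_t len)
{
    long ret;

    if (h == NULL || h->ops == NULL || h->ops->read == NULL)
        return -ENODEV;
    if (buf == NULL && len != 0)
        return -EINVAL;
    if (h->suspect)
        return -EIO;        /* refuse to call a driver that already misbehaved */

    ret = h->ops->read(buf, len);

    /* A read may not claim more bytes than were requested, and an
     * error return must fall inside the plausible errno range. */
    if ((ret > 0 && (size_t)ret > len) || ret < -MAX_PLAUSIBLE_ERRNO) {
        h->suspect = 1;
        h->faults++;
        return -EIO;
    }
    return ret;
}

/* Sample well-behaved driver: fills the buffer and reports len bytes. */
static long good_read(void *buf, size_t len)
{
    memset(buf, 0, len);
    return (long)len;
}

/* Sample broken driver: claims more bytes than it was asked for. */
static long bad_read(void *buf, size_t len)
{
    (void)buf;
    return (long)len + 100;
}

const struct drv_ops good_ops = { good_read };
const struct drv_ops bad_ops  = { bad_read };
```

	With this shape, the check is written once in the caller and
	covers every driver behind the interface; a driver that lies
	about its return value is marked suspect on the first offense
	instead of being trusted until someone audits it.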

	General comment: the specification, as written, does not address
	a way to enforce compliance, largely because the Stability
	and Reliability section is based upon the list of Good Coding

Re: What is a Hardened Driver?

  "fault handling"

  "fault recovery"

  "fault prediction"

  "fault analysis"

	I'd recommend moving this closer to the beginning of the
	specification.  My hunch is that "driver hardening" is
	really about just these four items.

	"A typical device driver design focuses on the normal,
	 proper operation of the hardware; attention to driver
	 behavior in the event of hardware faults is often minimal."

	A broad generalization that isn't backed by an example.

	I could also state with the same basis in fact: "attention
	to correct driver behavior in a multiprocessor environment
	is often minimal", or "attention to correct handling of
	critical sections is often minimal."

Re: Driver Hardening Categories

	"Stability and Reliability"

	I consider a "good driver" to have the following attributes:
	1. does not cause, directly or indirectly, fatal exceptions.
	2. does not cause, directly or indirectly, the system to hang.
	3. satisfies the relevant functions as specified,
	   with "good performance" characteristics
	4. detects errors in configuration, operation, or other aspects
	   of the hardware (or software) functions that are managed by
	   the driver.
	5. is expressed in a maintainable form.

	If I map that to the categories listed in this section,
	what I see is that a "hardened driver" has all of the attributes
	of a "good driver" plus the following (verbatim):

		Stability and reliability:
		- "provide for fault injection testing"

		N/A: any of these functions should be considered part of
		     the driver's requirements;  if it doesn't meet the
		     requirements it's not a "good driver."

		High Availability:
		N/A: also part of the driver's specifications

	My opinion after reading this section is that a "hardened driver"
	is equivalent to a "good driver", and that "hardened with diagnostics"
	is equivalent to a "good driver with standard diagnostics", and
	"hardened with instrumentation" is a "good driver with standard

	The one item I couldn't bin: "fault injection testing."
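
	For reference, this is roughly what I take "fault injection
	testing" to mean, as a user-space sketch rather than anything
	from the specification.  All names here (inject_dead_reads,
	reg_read, wait_ready, fake_status) are hypothetical, and the
	choice of an all-ones value to stand for a dead device is an
	assumption made for the example.

```c
#include <stdint.h>
#include <errno.h>

/* Assumed marker for a device that has stopped responding. */
#define DEAD_VALUE 0xFFFFFFFFu

/* Hypothetical injection state: number of upcoming reads to corrupt. */
static unsigned int inject_reads;

/* Stand-in for a device status register; bit 0 means "ready". */
static uint32_t fake_status = 0x1;

/* Test hook: make the next n register reads look like a dead device. */
void inject_dead_reads(unsigned int n)
{
    inject_reads = n;
}

/* Every register access goes through one accessor, so that faults
 * can be injected without touching the driver logic itself. */
static uint32_t reg_read(void)
{
    if (inject_reads > 0) {
        inject_reads--;
        return DEAD_VALUE;
    }
    return fake_status;
}

/* Poll for "ready" with a bounded number of attempts, so a dead
 * device yields an error instead of an endless loop. */
int wait_ready(unsigned int max_polls)
{
    unsigned int i;

    for (i = 0; i < max_polls; i++) {
        uint32_t v = reg_read();

        if (v == DEAD_VALUE)
            continue;       /* implausible value: count it as a failed poll */
        if (v & 0x1)
            return 0;
    }
    return -EIO;
}
```

	A test can then arm the hook and check that the error path is
	actually taken, e.g. inject more dead reads than wait_ready()
	is allowed to poll and confirm that it returns -EIO.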
