[cgl_discussion] Some Initial Comments on DDH-Spec-0.5h.pdf

Mon Sep 23 18:25:57 PDT 2002

You're basically right that a "hardened driver" just means a "good driver".
And Linux certainly has plenty of good drivers already. However, everyone
has a different idea of 'good'. This spec is an attempt to define what
'good' means for a driver, in the context of high availability carrier
platforms.

Most of what makes a 'good' driver is common for all purposes - things you
mention like don't make the system hang, don't cause fatal exceptions. But
there are some things that would be different between a desktop, embedded
system, enterprise server, or carrier server. For instance, there is often a
tradeoff between reliability and performance; when reliability is king, it
might be wise to do an insane amount of parameter checking to offset the
merest chance of an undetected bug crashing a system.
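
To make the parameter-checking point concrete, here is a minimal user-space
sketch of what that level of paranoia might look like in a write routine.
Everything in it (struct demo_dev, demo_write, DEMO_MAGIC, DEMO_BUF_SIZE) is
hypothetical and invented for illustration; none of it comes from the spec or
from any real driver.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BUF_SIZE 64            /* hypothetical device buffer size */
#define DEMO_MAGIC    0x44444821u   /* hypothetical "initialized" marker */

struct demo_dev {                   /* hypothetical device state */
    unsigned int  magic;            /* equals DEMO_MAGIC once initialized */
    unsigned char buf[DEMO_BUF_SIZE];
};

/*
 * Copy len bytes from src into the device buffer at offset off.
 * Returns 0 on success or a negative errno value. Every argument is
 * checked before use, even ones a well-behaved caller would never
 * get wrong.
 */
int demo_write(struct demo_dev *dev, size_t off, const void *src, size_t len)
{
    if (dev == NULL || src == NULL)
        return -EINVAL;
    if (dev->magic != DEMO_MAGIC)   /* uninitialized or corrupted state */
        return -ENODEV;
    /* Written so that off + len can never wrap around. */
    if (off > DEMO_BUF_SIZE || len > DEMO_BUF_SIZE - off)
        return -ERANGE;

    memcpy(dev->buf + off, src, len);
    return 0;
}
```

Each check costs a few cycles on every call, which is exactly the
reliability-versus-performance tradeoff in question.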

Regarding the question: why not just fix the "bad" drivers? Drivers that are
actually bad are probably for obscure hardware that is not really of
concern. The purpose is to take good drivers and make sure they meet the
last few percent of the objective standard of 'good'.

You bring up a very good point about the possibility of killing N birds with
one stone with hardening in the kernel itself. I don't know enough to
address that, and can only suppose that maybe device-independent validation
is very limited?

Another good point was about enforcement. Just because something is hardened
at one point doesn't necessarily mean some of the rules won't get
accidentally violated by patches later on. So it either requires periodic
reevaluation or strong buy-in from the respective maintainers. At least part
of the beauty of open source is it _can_ be evaluated by an objective third
party, if someone chose to do that.

You asked several times for objective data, and I agree that would be great.
However, drivers _are_ in the unique position of being both privileged code
and yet specific to certain hardware. Thus they are capable of more damage
than user-space code, but (on average) can't have been tested in as many
configurations as core kernel code. So even without data, they are a
very logical starting point.

Geoff Gustafson

My opinions are my own and not necessarily those of Intel Corporation.

-----Original Message-----
From: Andy Pfiffer [mailto:andyp at osdl.org]
Sent: Monday, September 23, 2002 4:35 PM
To: cgl_discussion at osdl.org
Cc: rob.rhoads at intel.com; hardeneddrivers-discuss at lists.sourceforge.net
Subject: [cgl_discussion] Some Initial Comments on DDH-Spec-0.5h.pdf

[ these are my initial comments on the draft release spec -- Andy ]

Re: DDH-Spec-0.5h.pdf

  Comment:
	I'm not sure I fully understand the problem/feature that this
	specification is attempting to address.  I have a hunch,
	but I don't see it expressed clearly and unambiguously
	at the beginning of the document.

	The specification implies that the major problem with "regular"
	drivers is that they:

		1. are not written with good programming practices.
		2. do not report errors.
		3. do not fail gracefully when hardware errors are
		   detected.

	Is that correct?  It would be most helpful if there were
	real-world examples (or statistics) cited to indicate that
	non-hardened drivers were the obstacle for carrier-grade
	use of Linux, or references to existing drivers that could be
	used as examples of items 1), 2), and 3).  We all have differing
	experiences with bad drivers, bad hardware, bad fans, bad power
	supplies, and so on; it would be relevant to see any historical
	data that confirms or refutes the specification's assumptions.

	Generic question: why not just fix the "bad" drivers?

	Generic question: why not focus the "hardening effort" on the
	edges of the kernel interfaces, rather than on a driver-by-driver
	basis?  Specifically: why not put the "professional paranoia"
	into all of the kernel code that calls into drivers, and all
	of the routines commonly called by drivers?  One could move
	from a model of "this driver is hardened" to "all drivers
	are suspect until proven otherwise."  Wouldn't that address
	90% of the perceived problem up front, rather than spending 100%
	effort to "harden" one driver at a time?
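
	A rough user-space sketch of that "paranoia at the edges" idea
	follows.  It assumes a made-up operations table (struct demo_ops)
	and a made-up core wrapper (core_call_read); neither is a real
	kernel interface, and the two toy drivers exist only to exercise
	the wrapper.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical driver operations table; not a real kernel structure. */
struct demo_ops {
    long (*read)(void *priv, char *buf, size_t len);
};

/*
 * Core-side wrapper: every read call into a driver goes through here,
 * so the checks are written once instead of once per driver.
 */
long core_call_read(const struct demo_ops *ops, void *priv,
                    char *buf, size_t len)
{
    long ret;

    if (ops == NULL || ops->read == NULL)
        return -ENOSYS;             /* driver does not implement read */
    if (buf == NULL && len != 0)
        return -EINVAL;             /* bad arguments from the caller */

    ret = ops->read(priv, buf, len);

    /* Distrust the result: a driver may not claim more than was asked. */
    if (ret > 0 && (size_t)ret > len)
        return -EIO;
    return ret;
}

/* Toy driver that behaves: stores one byte and reports one byte. */
long good_read(void *priv, char *buf, size_t len)
{
    (void)priv;
    if (len == 0)
        return 0;
    buf[0] = 'x';
    return 1;
}

/* Toy driver that misbehaves: reports more bytes than requested. */
long lying_read(void *priv, char *buf, size_t len)
{
    (void)priv;
    (void)buf;
    return (long)len + 1;
}
```

	With this shape, lying_read is caught by the wrapper without
	any change to the driver itself.  What a wrapper like this
	cannot see is anything device-specific, such as whether the
	returned bytes are actually valid.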

	General comment: the specification, as written, does not address
	a way to enforce compliance, largely because the Stability
	and Reliability section is based upon the list of Good Coding
	Practices.

Re: What is a Hardened Driver?

  "fault handling"

  "fault recovery"

  "fault prediction"

  "fault analysis"

	I'd recommend moving this closer to the beginning of the
	specification.  My hunch is that "driver hardening" is
	really about just these four items.

  Quote:
	"A typical device driver design focuses on the normal,
	 proper operation of the hardware; attention to driver
	 behavior in the event of hardware faults is often minimal."

  Comment:
	A broad generalization that isn't backed by an example.

	I could also state with the same basis in fact: "attention
	to correct driver behavior in a multiprocessor environment
	is often minimal", or "attention to correct handling of
	critical sections is often minimal."

Re: Driver Hardening Categories

	"Stability and Reliability"

	Comment:
	I consider a "good driver" to have the following attributes:
	1. does not cause, directly or indirectly, fatal exceptions.
	2. does not cause, directly or indirectly, the system to hang.
	3. satisfies the relevant functions as specified,
	   with "good performance" characteristics
	4. detects errors in configuration, operation, or other aspects
	   of the hardware (or software) functions that are managed by
	   the driver.
	5. is expressed in a maintainable form.

	If I map that to the categories listed in this section,
	what I see is that a "hardened driver" has all of the attributes
	of a "good driver" plus the following (verbatim):

		Stability and reliability:
		- "provide for fault injection testing"

		Instrumentation:
		N/A: any of these functions should be considered part of
		     the driver's requirements;  if it doesn't meet the
		     requirements it's not a "good driver."

		High Availability:
		N/A: also part of the driver's specifications

	My opinion after reading this section is that a "hardened driver"
	is equivalent to a "good driver", that "hardened with diagnostics"
	is equivalent to a "good driver with standard diagnostics", and
	that "hardened with instrumentation" is a "good driver with
	standard instrumentation."

	The one item I couldn't bin: "fault injection testing."
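
	For reference, one plausible reading of "provide for fault
	injection testing" is a hook that lets a test force a hardware
	access to fail, so the driver's error path can be exercised on
	demand.  The sketch below is user-space C built on that
	assumption; demo_inject_fail_after, demo_read_reg, and
	demo_wait_ready are invented names, not anything defined by the
	spec.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical injection control: 0 means off, N means fail the Nth read. */
static unsigned int inject_countdown;

void demo_inject_fail_after(unsigned int n)
{
    inject_countdown = n;
}

/*
 * Stand-in for a hardware register read. Stores the value in *out and
 * returns 0, or returns -EIO when a fault has been injected.
 */
int demo_read_reg(uint32_t *out)
{
    if (inject_countdown != 0 && --inject_countdown == 0)
        return -EIO;
    *out = 0x1u;                    /* pretend the "ready" bit is set */
    return 0;
}

/*
 * Driver path under test: poll the register up to `tries` times.
 * A read error is reported to the caller instead of being ignored.
 */
int demo_wait_ready(unsigned int tries)
{
    uint32_t v;

    while (tries-- > 0) {
        int err = demo_read_reg(&v);
        if (err)
            return err;
        if (v & 0x1u)
            return 0;
    }
    return -ETIMEDOUT;
}
```

	A test would call demo_inject_fail_after(1) and then check that
	demo_wait_ready() returns -EIO rather than hanging or reading
	an uninitialized value.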
