[cgl_discussion] comments re: Device Driver Hardening Design Spec

Randy.Dunlap rddunlap at osdl.org
Mon Sep 23 15:26:23 PDT 2002


Comments on
"Device Driver Hardening Design Specification,"
Draft Release 0.5h


(Sure, some of these are similar to comments from others.)

Sections 1 and 2.5 and 5:
Regarding paranoia and checking input parameters and data from
hardware:  please try to find some clear way to limit how much of
this is done.  It could bloat a driver severely and degrade the
entire system.
I see some parts of this as almost unlimited, which is not
good.

Section 1.2 and Table 1:
about making code clear and easy to read for accurate maintenance:
These are good, but most of these apply to any good Linux
device driver, not just hardened ones.  This should be in a
separate document ("Linux Device Drivers: Coding, Development,
and Kernel APIs" e.g.) and then this [DDH] document could refer
to that one.
This would also enable a more-focused spec for DDH, which would
be a good thing.

Section 1.2: Instrumentation:
Need to be able to do stats and diags without POSIX event logging.

Section 2: Stability and Reliability
Section 2.2: Software Coding Methodologies
Section 2.3: Separate Device Instances
Most of these bullets apply to any good Linux device driver.
I'd prefer to see these in a separate document.

How often does a driver need to check for OS and hardware
anomalies?  Should a driver check for memory corruption also?

Section 2.5.1: Resource Allocation:
kernel APIs like kmalloc() should return aligned memory.
Does this spec say that these should be double-checked?
or that the kernel API docs need more info in them?

Section 2.6: Proper Interrupt Sharing:
Section 2.7: Time Bounded Device Events:
Section 2.9: Time Bounded Status Wait Loops
Section 2.11: Spurious Events
Section 2.12: Panic Behavior (some people might disagree :)
Section 2.13: Enumerated Return Codes
Section 2.15.*: Standard Kernel Functions for Hardware Access
These apply to all good Linux device drivers, so they should be in
the separate Linux Device Drivers spec., not the DDH spec.

Section 3: on POSIX Event Logging:
what makes it persistent?
can syslog be persistent?
Can stats and diags be independent of (done without) PEL?

Section 3.2.4.2:  Gauge statistic type
Gauges (usually) should be done in userspace.

Section 3.3.3: Diagnostics Interface for Hardened Drivers
"test names and parameters must be known by the application
invoking them."  How realistic is this?
Won't it cause lots of updates to the userspace Diag manager?

What I'd prefer to see is a way to invoke a fixed set of known
tests to any driver.  Traditionally this could have been via an
ioctl, but most developers seem to prefer in-ram filesystems
nowadays, so a write to some file (e.g., "echo 4 > rundiag")
would cause the driver to run diag. number 4.
An example of a fixed set of known tests could be something like:
	0	basic self-test
	1	config register/interface test
	2	bus interface test
	3	device register/memory interface test
	4	traffic test
	5	typical test
	6	full/exhaustive test
(or use a callback/entry point for each one, like Greg suggested)

Section 3.3.3.2: Data Structures for Device Diagnostics:
The "run" function seems to open-ended, like a possible security
threat.  Doesn't it also require string parsing by the kernel?
Also don't need/use typedef for <result_t>.

Section 3.3.3.3: Driver callback functions for device diags:
Q: Is a device taken out of normal service manually or
automatically for diags?  Or does its driver just return IO
errors during diags?

Section 3.4.2: C Macro compatibility wrapper:
That portable log/print macro name is ugly.
Does this name come from the EVLOG project?
How about something like printk_default()?

int EVL_FACILITY_CODE = LOG_LOCAL5;
This is only required for CONFIG_EVLOG, isn't it?

Section 3.4.3.2, Table 3: Event source-related attributes:
Section 3.4.3.4, Table 6: Commonly shared attributes:
Section 3.4.3.6.2, Table 7: Resource Management/Status Attributes:
Section 3.4.3.7.2, Table 8: Diagnostic Attributes:
Section 3.4.3.8.3, Table 9: Statistics Attributes:
All _ID attributes except for Resource_ID are for UUIDs.
How about calling it Resource_Name instead, to be consistent?

Section 3.4.3.7.2, Table 8: Diagnostic Attributes:
Is there a fixed format for <datetime>, like ISO standard notation?

Section 3.4.3.8.3, Table 9: Statistics Attributes:
Can Diagnostic_Test_Name value be <empty_string>?


I do agree with other comments that the stats and diag parts
of the DDH spec look promising.  We need to make sure that it
is useful without adding improper kernel APIs (e.g., don't
use "void *" parameter types) and without adding code to
kernel space if it can be done reasonably in user space.


-- 
~Randy




More information about the cgl_discussion mailing list