[cgl_discussion] Project Review: Boot Cycle Detection
Gustafson, Geoffrey R
geoffrey.r.gustafson at intel.com
Wed Sep 11 12:00:58 PDT 2002
Project Review for the Hardened Drivers Project
1. Quote the requirements from the requirements doc
that your project is expected to meet.
Requirement: 2.3 Boot Cycle Detection
Version Assignment: Configurable 1.1
· The goal of OSDL CGL is to utilize a standard API such that
applications can be guaranteed to work across distributions. As no such
standard yet exists we don't yet consider this to be a core function.
Application Type: G, S, M
Description: OSDL CGL shall detect a frequent reboot cycle due to
recurring failures and will go offline if this occurs.
2. Explain how you think the project meets the requirements.
We clarified on the requirements mailing lists that we intend this to be a
solution for disk-based machines, with detection occurring after the system
comes up. For network boot configurations, this feature can be provided by
middleware on the boot server. A boot cycle detection feature provided at a
lower level, e.g. in the boot loader, would be i) more complex and ii) more
platform-specific, as it requires some type of nonvolatile storage. As no
member had an implementation of such a thing, or a burning desire to write
one, this was infeasible for v1.0.
This implementation allows you to configure the maximum number of
reboots considered acceptable, and the minimum uptime the system must
experience before resetting the count of reboots. If the maximum number of
reboots is exceeded, the system shuts down.
Thus the vague terms "frequent" and "recurring failures" are defined by the
end-user through configuration, as appropriate for the particular
set. The minimum uptime essentially defines "frequent", and the maximum
reboots defines the number of "recurring failures".
3. Explain the design of the project or point to
a document on the web that explains the design.
bootcycle is a simple implementation via bash shell scripts.
The feature is configured through the file /etc/bootcycle.conf.
It records a boot counter in /var/lib/misc/bootcycle.status.
It logs messages to /var/log/bootcycle.log (as it runs before either
or the POSIX event log are available, this is the best option).
A man page bootcycle(8) is provided with more information.
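
As an illustration of logging to a plain file when no system logging service
is available yet, a helper along the following lines could be used. This is
only a sketch: the function name log_msg, the LOG_FILE variable and the
timestamp format are assumptions, not taken from the actual bootcycle scripts.

```shell
#!/bin/bash
# Hypothetical helper: append a timestamped message to the bootcycle log.
# LOG_FILE can be overridden from the environment, e.g. for testing.
LOG_FILE="${LOG_FILE:-/var/log/bootcycle.log}"

log_msg() {
    # Assumed line format: "YYYY-MM-DD HH:MM:SS bootcycle: <message>"
    printf '%s bootcycle: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
}
```

A call such as log_msg "boot counter incremented" then appends one line to the
log file and needs nothing beyond a writable filesystem.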
The bootcycle utility should be executed after each boot before the init
scripts. If enabled, it checks the previous value of a boot counter. If
the counter exceeds the configured maximum number of consecutive reboots,
bootcycle resets the counter and shuts down the system. Otherwise, it
increments the counter and sleeps until the configured minimum uptime
has elapsed, then resets the counter to zero. Thus the counter only
remains incremented if the system reboots within that time.
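
The counter logic described above can be sketched in bash roughly as follows.
This is a minimal sketch, not the actual bootcycle script: the function names,
the variable names, the default values, the shutdown command and the exact
comparison (strictly greater than the maximum) are all assumptions made for
illustration.

```shell
#!/bin/bash
# Sketch only: names, defaults and the comparison are assumptions,
# not taken from the real bootcycle scripts.
STATUS_FILE="${STATUS_FILE:-/var/lib/misc/bootcycle.status}"
MAX_REBOOTS="${MAX_REBOOTS:-5}"                        # assumed default
MIN_UPTIME_MIN="${MIN_UPTIME_MIN:-10}"                 # assumed default, minutes
SHUTDOWN_CMD="${SHUTDOWN_CMD:-/sbin/shutdown -h now}"  # assumed command

# Print the stored boot counter; a missing or malformed file counts as 0.
read_counter() {
    local n=""
    if [ -r "$STATUS_FILE" ]; then
        read -r n < "$STATUS_FILE" || true
    fi
    case "$n" in
        ''|*[!0-9]*) n=0 ;;
    esac
    echo $((10#$n))   # force base 10 so a value like "08" is accepted
}

write_counter() {
    echo "$1" > "$STATUS_FILE"
}

# Returns 1 after resetting the counter and requesting a shutdown,
# 0 after incrementing the counter for this boot.
check_boot_cycle() {
    local count
    count=$(read_counter)
    if [ "$count" -gt "$MAX_REBOOTS" ]; then
        write_counter 0
        $SHUTDOWN_CMD
        return 1
    fi
    write_counter $((count + 1))
    return 0
}

# Sleep for the minimum uptime, then clear the counter. If the machine
# reboots before the sleep ends, the incremented value stays on disk.
reset_after_min_uptime() {
    sleep $((MIN_UPTIME_MIN * 60))
    write_counter 0
}
```

In this sketch a boot-time caller would run check_boot_cycle first and, if it
returns 0, start reset_after_min_uptime in the background so that the rest of
the boot sequence is not delayed.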
This program is primarily useful in a complex network environment with
servers being managed. It is easier to identify that a machine is in need of
service if it fails entirely. If it is stuck in a reboot cycle,
may assume that it is coming back online. When there is load sharing with
other servers, it is better for the failed machine to shut down rather than
begin handling traffic only to drop it repeatedly.
The bootcycle.conf file allows you to enable/disable the feature, set the
maximum reboots, and set the minimum uptime (in minutes).
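
To make the three settings concrete, the sketch below reads them from a simple
KEY=value file. The key names ENABLED, MAX_REBOOTS and MIN_UPTIME, the default
values and the function name load_bootcycle_conf are hypothetical; the real
syntax of bootcycle.conf is documented in bootcycle(8) and may differ.

```shell
#!/bin/bash
# Hypothetical defaults, used when a key is absent from the file.
ENABLED=no
MAX_REBOOTS=5
MIN_UPTIME=10     # minutes

# Parse KEY=value lines from the given file (default /etc/bootcycle.conf).
# Blank lines and lines starting with '#' are skipped; unknown keys are
# ignored. The file is parsed rather than sourced so that it cannot run
# arbitrary commands at boot time.
load_bootcycle_conf() {
    local file="${1:-/etc/bootcycle.conf}" key value
    [ -r "$file" ] || return 1
    while IFS='=' read -r key value || [ -n "$key" ]; do
        case "$key" in
            ''|\#*)      continue ;;
            ENABLED)     ENABLED="$value" ;;
            MAX_REBOOTS) MAX_REBOOTS="$value" ;;
            MIN_UPTIME)  MIN_UPTIME="$value" ;;
        esac
    done < "$file"
}
```

With a file containing the lines ENABLED=yes, MAX_REBOOTS=3 and MIN_UPTIME=15,
calling load_bootcycle_conf sets the three shell variables accordingly. The
parser assumes there is no whitespace around the key or the '=' sign.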
4. Pointer to the code/patch.