[cgl_discussion] Re: [lkcd-general] OSDL - CGL - Serviceability - Public Draft

Richard J Moore richardj_moore at uk.ibm.com
Mon May 10 09:09:16 PDT 2004


This is a very good start for any enterprise deployment of Linux.  Thinking
ahead to advanced problem determination exercises, I think you need to add
some specification for:

1) one-time tracing - with suitable specification for the means of
instrumentation. It is better to use a mix of dynamic and static tracing
than to rely solely on static, where objections will be raised if the level
of granularity becomes too great. (I distinguish static from dynamic as
follows: static requires some form of code modification to define the
tracepoint, and there will be some overhead from inactive tracepoints.
Dynamic doesn't require any source modification; the instrumentation is
applied to the binary when it is needed. In the examples I have worked
with, the instrumentation is applied to the in-core image of the code.)
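
As a rough illustration of the static/dynamic distinction in (1), here is a small sketch that uses Python as a stand-in for kernel code. All names in it (TRACE_ENABLED, trace, static_traced_op, plain_op, attach_probe) are hypothetical, and wrapping a function at run time is only an analogy for patching the in-memory image of a binary; it is not how any particular kernel facility works.

```python
import functools

TRACE_ENABLED = False   # static tracepoints exist in the code but are normally inactive
trace_log = []          # where trace events end up in this sketch


def trace(event, **data):
    """Record one trace event (hypothetical helper)."""
    trace_log.append((event, data))


def static_traced_op(x):
    # Static tracepoint: the source itself was modified to add this check,
    # so the test is paid on every call even while tracing is inactive.
    if TRACE_ENABLED:
        trace("static_traced_op", x=x)
    return x * 2


def plain_op(x):
    # No tracing code in the source at all.
    return x + 1


def attach_probe(obj, name):
    """Dynamic instrumentation: wrap an existing callable at run time.

    Returns a function that removes the probe again.
    """
    original = getattr(obj, name)

    @functools.wraps(original)
    def probed(*args, **kwargs):
        trace(name, args=args)
        return original(*args, **kwargs)

    setattr(obj, name, probed)

    def detach():
        setattr(obj, name, original)

    return detach
```

The point of the sketch is that plain_op carries no tracing cost until a probe is attached, and the probe can be removed once the problem has been captured, whereas static_traced_op always pays for its inactive check.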

2) flight-recorder tracing - this is where a minimal level of tracing is
usually present. It necessarily requires minimal overhead and therefore
granularity is limited. But it makes all the difference to a crash dump
when one needs to get a sense of whether things were moving or not.
Usually flight-recorder traces are maintained at a component level -
because the tracing requirements will be component specific. It's also
characterised by writing to a wrapping buffer, which is only extracted and
formatted when a problem occurs. Most often this is extracted from a crash
dump.
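
To make the wrapping-buffer idea in (2) concrete, here is a minimal sketch, again in Python rather than kernel code. The class name FlightRecorder, its methods, and the record layout (timestamp, code, value) are assumptions made for illustration only. The recording path stores raw values and does no formatting; formatting happens only when the buffer is extracted after a problem.

```python
import time
from collections import deque


class FlightRecorder:
    """Per-component trace that keeps only the most recent entries."""

    def __init__(self, component, capacity=256):
        self.component = component
        # A bounded deque silently discards the oldest entry when full,
        # which gives the wrapping behaviour.
        self._buf = deque(maxlen=capacity)

    def record(self, code, value=0):
        # Keep the hot path cheap: store raw values, format nothing here.
        self._buf.append((time.monotonic(), code, value))

    def dump(self):
        """Format the surviving entries, oldest first.

        Intended to be called only once a problem has occurred.
        """
        return [
            f"{self.component} t={t:.6f} code={code} value={value}"
            for t, code, value in self._buf
        ]
```

With a capacity of 3 and five recorded events, dump() returns only the last three, which is the "were things moving or not" view one wants from a crash dump.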

3) Trapping. Most CPU architectures provide some form of breakpoint and
debugging h/w. This is of course most often exploited by debuggers - but
those are of little relevance to the production environment. There is
another use of debugging h/w, which is relevant, and is present in a number
of high-end, high-performance operating systems. This is the ability to
implant a conditional trap which, when run-time conditions are satisfied,
will take an action such as forcing a crash dump.
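
The shape of such a trap can be sketched in software as follows. This is only an analogy: a real implementation would rely on the debugging h/w to raise the trap, whereas here the check is an explicit call. The names ConditionalTrap and capture_dump are hypothetical, and the "dump" is just a snapshot of a dictionary.

```python
class ConditionalTrap:
    """Run an action the first time a run-time condition is satisfied."""

    def __init__(self, condition, action):
        self.condition = condition   # callable(state) -> bool
        self.action = action         # callable(state), e.g. force a dump
        self.fired = False

    def check(self, state):
        # Called at the trap point; the action runs at most once.
        if not self.fired and self.condition(state):
            self.fired = True
            self.action(state)
        return self.fired


def capture_dump(sink):
    """Return an action that snapshots the state into `sink`.

    Stand-in for forcing a crash dump.
    """
    def action(state):
        sink.append(dict(state))
    return action
```

For example, a trap built with the condition `lambda s: s["refcount"] < 0` does nothing while the count is sane and captures the state at the first moment it goes negative, which is the moment one wants preserved for an obscure, hard-to-recreate problem.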

I stress that these capabilities apply to the problems that occur once a
platform is pretty stable and one only sees the sort of obscure problem
which, though very costly to experience, is very difficult to re-create
on demand. This applies to mature operating systems, and will apply to
Linux (sooner rather than later - we hope).


- -
Richard J Moore
IBM Advanced Linux Response Team - Linux Technology Centre
Tel: (+44) (0)1962-817072

Joel Krauska <jkrauska at cisco.com>
Sent by: lkcd-general-admin at lists.sourceforge.net
05/05/2004 21:29
To: lkcd-general at lists.sourceforge.net
Subject: [lkcd-general] OSDL - CGL - Serviceability - Public Draft


I wanted to let this mailing list know about the availability of an
early public draft of the Carrier Grade Linux v3.0 Serviceability
specification, available at


The requirements in this document are aimed at supporting serviceability
of a network element in a carrier-grade environment.

I acknowledge that the requirements in this draft are being implemented
in a variety of ways and many of the requirements in this document exist
in current implementations.  I am contacting this mailing list because I
believe your projects and expertise may address some of the requirements
and we'd like to solicit feedback.

Again, this is an early draft document of the v3.0 serviceability
requirements spec.  Past OSDL Carrier Grade Linux technical documents
have contained all requirements in a single document.  For OSDL CGL
v3.0 draft releases, we are releasing them as more granular sections,
roughly split on functional boundaries.  These boundaries are
Standards, Availability, Clustering, Hardware,
Performance, Security, and Serviceability (this document).

More information on Carrier Grade Linux and the Carrier Grade Linux
Working Group can be found at


Feel free to direct any comments on the spec to me directly at
jkrauska at cisco.com or to our mailing list at cgl_discussion at osdl.org.


