[cgl_discussion] POC 6/5 Meeting Minutes
mika at osdl.org
Thu Jun 5 14:26:02 PDT 2003
On Thu, 2003-06-05 at 10:02, Steven Dake wrote:
> 2.7 wishlist ARs
Here is my first iteration of my AR's. Let me know if something is
> AR Mika to provide kexec text.
From Eric Biederman's README file:
"kexec is a set of systems call that allows you to load another
kernel from the currently executing Linux kernel."
In a kernel fault situation, having the replacement kernel already
loaded in memory and booting it with kexec achieves a significant
savings in node downtime. For more information, see Andy's OLS paper:
kexec was in -mm for a while, but was dropped in 2.5.70. While Andy
will keep pushing it, it is listed as lowest priority in Andrew
Morton's "must-fix" list, so it is likely that it will not make it into
2.6. Also, Eric Biederman seems to be engaged with other projects and his
current code does not apply cleanly on 2.5.70, while the "branch" Andy
> AR Mika to bring up network dump to specs group.
For many reasons it is very important to know why a CGL node failed
(as a kind of "black box" feature). In a cluster environment it is
preferable to have that "black box" (i.e. the dump) sent to a
centralized place, which means dumping over LAN. For more details see:
The LKCD project seems to be in hibernation; at least the latest updates
on SourceForge are from October last year, when there was a lot of
discussion about LKCD between Linus and the LKCD developers. See for
example this thread:
RedHat has netdump in their distribution, and source is available,
but there is no Open Source project behind it.
> AR Mika to provide text for Application restart text.
> AR Mika to clarify difference between rapid announcement of process
> death and application restart.
Well, it seems that "application restart" has more or less disappeared
from our v2.0 spec. From the kernel point of view, what is relevant
seems to be the prochadd functionality (the "real" requirement behind
this "rapid announcement of process death"):
in-kernel process monitoring:
There is a need for a supervising process to notice the death (etc.)
of a process very quickly (in the range of tens of milliseconds).
Traditionally this would be done by having the supervising process
spawn the application processes, but this requires modification of
application code, which in the majority of cases is not an option.
Another way would be to "reparent" application processes to this
supervisor, see this thread on LKML:
A third way is to monitor procfs, but this seems to have several
performance issues. A fourth is to use ptrace ("man 2 ptrace"):
but that probably has some "issues" (not tested).
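
For reference, the traditional approach (the supervisor spawns the
application itself and waits on it) looks roughly like the C sketch
below. The helper name supervise_once() is made up for illustration, and
it only covers direct children of the supervisor, which is exactly the
limitation described above.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: spawn argv[0] as a child and block until it dies.
 * Returns the exit code (0-255), 128 + signal number if the child was
 * killed by a signal, or -1 if fork() or waitpid() failed. */
int supervise_once(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: become the application. */
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }

    /* Parent: the kernel wakes us as soon as the child terminates. */
    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);

    if (r < 0)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

The notification itself is immediate here because waitpid() is woken by
the kernel; what is missing is a way to get the same behaviour for
processes the supervisor did not start.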
> AR Mika to bring up application preloading kernel changes to specs
> AR Mika to bring up page flushing to specs subgroup.
These two are also closely related. The preloading is basically a glibc
patch (which, while important, is not a kernel issue) and a way to lock
(or "pin") _all_ memory pages of the loaded application into memory
(what's the point of loading it all if the pages get swapped out?).
Page flushing complements that by adding a way for a (possibly non-root,
although that may not be realistic) user/application to flush some of
those locked pages out.
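
The pinning half can be approximated from inside the application with
mlockall(2). A minimal sketch follows, assuming the process has enough
privilege or a large enough RLIMIT_MEMLOCK; the helper names
pin_all_pages() and unpin_all_pages() are made up for illustration and
are not part of any preloading patch.

```c
#include <errno.h>
#include <sys/mman.h>

/* Hypothetical helper: pin every page the process has mapped now
 * (MCL_CURRENT) and every page it maps later (MCL_FUTURE).
 * Returns 0 on success, or the errno value on failure (typically
 * EPERM or ENOMEM when the memlock limit is too low). */
int pin_all_pages(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return errno;
    return 0;
}

/* Hypothetical helper: undo pin_all_pages() for the whole process. */
int unpin_all_pages(void)
{
    if (munlockall() != 0)
        return errno;
    return 0;
}
```

Note that this has to be called by the application itself (or by code
injected into it, e.g. via a preloaded library), which is why the
preloading and pinning pieces go together.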
User space memory page handling:
In a CGL system there is a strong need to avoid run-time latencies
related to loading code pages from disk (or over the network) by loading
them all at node startup. On the other hand, there needs to be a way
for a supervising process to force such an application to unload some
of those locked pages (for example to allow loading a new, more
While there are some existing mechanisms in the kernel, nothing
currently exists that fully implements the requirement. The reason
we feel this is 2.7 material is that it should be reasonably simple
to implement and has a potentially large user base.