[cgl_discussion] (no subject)

Eric.Chacron at alcatel.fr
Thu Feb 5 09:18:40 PST 2004

Hi Jorg,

I think it's theoretically possible, and the only way to achieve software
fault tolerance, but the software development cost would be two or three
times higher, since the same software black boxes would need to be
developed independently by different parties.

I think this approach has already been used in some NASA projects
with five voting computers.
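The N-version idea above can be sketched in a few lines: run the same
computation through independently developed versions and accept the
majority result. This is a minimal illustration, not taken from any
project mentioned here; all function names are hypothetical.

```python
from collections import Counter

def majority_vote(results):
    """Return the value produced by a majority of the versions,
    or raise if no strict majority exists."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 > len(results):
        return value
    raise RuntimeError("no majority among versions")

# Three hypothetical, independently developed versions of one computation.
def version_a(x): return x * x
def version_b(x): return x ** 2
def version_c(x): return x * x + 1   # this version carries a fault

result = majority_vote([v(4) for v in (version_a, version_b, version_c)])
print(result)   # → 16: the faulty version is outvoted
```

The point of the cost remark above is visible here: three code bodies
for one function, each written separately so they do not share bugs.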

A simple hot-standby approach is different: when a software fault
produces a failure on one system, we assume that the same event will not
recur for a long time, and we recover from the failure by switching the
application over to another node. The application restarts from a stable
state (a checkpoint) that was normally taken while the system was
healthy, which gives it a chance not to reproduce the same error case.
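The checkpoint-and-restart mechanism described above can be sketched as
follows. This is a minimal illustration assuming a checkpoint file
reachable by both nodes; the path and function names are hypothetical,
not part of any system discussed in this thread.

```python
import pickle, tempfile, os

def save_checkpoint(state, path):
    # Taken periodically on the active node while it is healthy.
    with open(path, "wb") as f:
        pickle.dump(state, f)

def recover_on_standby(path):
    # On failover, the standby restarts from the last stable state,
    # not from the point of failure, so a timing-dependent fault
    # has a chance not to recur.
    with open(path, "rb") as f:
        return pickle.load(f)

ckpt = os.path.join(tempfile.gettempdir(), "app.ckpt")
state = {"transactions": 42, "phase": "steady"}
save_checkpoint(state, ckpt)

# ... active node fails here; standby takes over ...
restored = recover_on_standby(ckpt)
print(restored == state)   # → True
```

Note that this recovers from a *healthy* snapshot rather than replaying
the failing sequence of events, which is exactly why it can mask faults
triggered by rare concurrent conditions.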

In general, on-site software failures are rare and are caused by
concurrent events that cannot easily be tested for.


Jörg Hartmann <jhartmann at aquilacoop.de>@lists.osdl.org on 02/05/2004
11:24:26 AM

Sent by:    cgl_discussion-bounces at lists.osdl.org

To:    <cgl_discussion at osdl.org>
Subject:    [cgl_discussion] (no subject)

Hi there,

you're defining carrier grade and data center Linuxes, and supposedly this
includes fail-over clusters. But I could not find anything that goes
beyond hardware and data-path redundancy, although software problems are
the most common source of trouble (and so software redundancy is the most
important solution).

How about two (or three) API-compatible systems based on different
source codes, which could be integrated into a truly software-redundant
failover cluster? This would be a real benefit.

Joerg Hartmann,
AQUILA Co-op, Potsdam/Germany

P.S.: Of course, complete software redundancy would also mean two different
kernels. But since there's (Free)BSD that shouldn't be a problem.
