[cgl_discussion] RE: [cgl_tech_board] Use case - TCP session takeover

Cress, Andrew R andrew.r.cress at intel.com
Fri Mar 18 05:32:51 PST 2005


I understand the difference between deliberate takeover and unexpected
failover, but shouldn't there be some overlap & synergy between the two?
Since the deliberate one is easier, it makes sense to do that first.
However, if this is to be adopted into the kernel, there would need to
be reuse of key components for the unexpected failover (like session
state functions maybe).   Perhaps that explanation isn't part of the use
case, but I'd like to see something defining what can be reused, or a
pointer to other docs that describe it.  


-----Original Message-----
From: Takashi Ikebe [mailto:ikebe.takashi at lab.ntt.co.jp] 
Sent: Thursday, March 17, 2005 9:52 PM
To: cgl_tech_board at groups.osdl.org; cgl_specs at groups.osdl.org;
cgl_discussion at lists.osdl.org
Subject: [cgl_tech_board] Use case - TCP session takeover

The following is a use case for TCP session takeover.  It addresses
CAF.2.3 and CAF.2.4 (TCP session takeover) in the CGL Specification.
Comments and suggestions are welcome.

OSDL CGL specifies a mechanism to synchronize TCP sockets, buffer
structures, and sequence numbers so that redundant nodes may take over
TCP sessions originated on other nodes. A deliberate TCP session
takeover assumes that TCP session(s) are transferred deliberately and
not as the result of unexpected node failure(s).
In addition to that, when a critical resource fails, such as a CPU,
memory, or kernel, a redundant node may take over TCP sessions
originated on the failed node. Note that when the TCP session(s) are
assumed by a redundant node, the sessions will resume from the last
checkpoint. TCP traffic should continue even if there is a conflict
between the last TCP state of the failed node and the checkpointed TCP
state on the redundant node.

Desired Outcome
Mainline kernel acceptance or distro acceptance.

Application administrators and developers use the requirement directly,
and as a result end users also use the function as part of the service.

Basically, the requirement provides a new system call and user-land
library based APIs, so it is set up at kernel installation and also
used during the application development phase.

Implementation Notes
These Implementation Notes apply to all scenarios.  They are just
guidelines, not cast in concrete.
The requirement needs to provide APIs that enable the following
functions:
1. Stop the indicated session (usually indicated by file descriptor).
2. Get the information of the indicated session.
   The information includes session status (such as address, port, and
   sequence number) and the receive/send data held in the socket
   buffer.  The data in the socket buffer cannot be retransmitted,
   because received data in the socket buffer has already completed
   TCP transmission (it has been ACKed), so it has to be carried over
   with the session.
3. Set the information on the indicated session.
4. Restart the indicated session.
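
To make the four functions above concrete, here is a minimal sketch in
Python.  It is only a toy in-memory model, not the proposed kernel
interface: the names Session, SessionInfo, stop, get_info, set_info and
restart, and all field names, are hypothetical and do not come from the
CGL specification or from tcpcp.  The one rule it tries to capture is
that session information is only read or written while the session is
stopped.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    RUNNING = "running"
    STOPPED = "stopped"


@dataclass(frozen=True)
class SessionInfo:
    """Snapshot of one TCP session (all field names are hypothetical)."""
    local_addr: tuple      # (ip, port)
    remote_addr: tuple     # (ip, port)
    snd_nxt: int           # next sequence number to send
    rcv_nxt: int           # next sequence number expected from the peer
    send_buf: bytes = b""  # outbound data still queued in the socket buffer
    recv_buf: bytes = b""  # data already ACKed to the peer, not yet read


class Session:
    """Toy stand-in for a kernel socket, identified by a file descriptor."""

    def __init__(self, fd, info=None, state=State.STOPPED):
        self.fd = fd
        self.info = info
        self.state = state

    def stop(self):
        # Function 1: freeze the session so its state no longer changes.
        self.state = State.STOPPED

    def get_info(self):
        # Function 2: read the state, including buffered data.
        if self.state is not State.STOPPED:
            raise RuntimeError("session must be stopped before get_info")
        return self.info

    def set_info(self, info):
        # Function 3: install state taken from another session.
        if self.state is not State.STOPPED:
            raise RuntimeError("session must be stopped before set_info")
        self.info = info

    def restart(self):
        # Function 4: resume traffic from the installed state.
        if self.info is None:
            raise RuntimeError("no session information has been set")
        self.state = State.RUNNING
```

In this model a takeover is stop() and get_info() on the old session,
followed by set_info() and restart() on a new, stopped session on the
other node.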

The assumed switch-over sequence for an HA (ACT-SBY) cluster is as
follows:
1. The service process on the active node uses the API to stop each
   session it wants to keep and to get its session information.
2. The service process synchronizes the session information with the
   service process on the standby node.
3. Change the cluster state (ACT to STOP).
4. The service process on the standby node sets the session
   information.
5. Change the cluster state (SBY to ACT).
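
The five steps above can be sketched as follows.  Again this is a toy
model under stated assumptions, not a real cluster manager: Node,
switch_over and the dictionary keys are hypothetical names, JSON stands
in for whatever synchronization channel the service processes actually
use, and the same file descriptor number is reused on both nodes purely
for simplicity.  Restarting the sessions after the standby becomes
active is also an assumption; the steps above do not say when the
restart happens.

```python
import json


class Node:
    """Toy cluster node holding session information keyed by fd."""

    def __init__(self, name, role):
        self.name = name
        self.role = role      # "ACT", "SBY", or "STOP"
        self.sessions = {}    # fd -> dict of session information
        self.stopped = set()  # fds whose sessions are currently stopped

    def stop_and_get(self, fd):
        # Stop the session, then return a copy of its information.
        self.stopped.add(fd)
        return dict(self.sessions[fd])

    def set_info(self, fd, info):
        # Install session information on a (stopped) session.
        self.stopped.add(fd)
        self.sessions[fd] = dict(info)

    def restart(self, fd):
        self.stopped.discard(fd)


def switch_over(active, standby, fds):
    # Step 1: stop each session to keep and get its information.
    snapshots = {fd: active.stop_and_get(fd) for fd in fds}

    # Step 2: synchronize with the standby's service process.
    # JSON turns the integer fd keys into strings on the wire.
    wire = json.dumps(snapshots)
    received = json.loads(wire)

    # Step 3: change the cluster state of the old active node.
    active.role = "STOP"

    # Step 4: the standby sets the session information.
    for fd, info in received.items():
        standby.set_info(int(fd), info)

    # Step 5: the standby becomes active, then resumes the sessions
    # (the restart point is an assumption of this sketch).
    standby.role = "ACT"
    for fd in received:
        standby.restart(int(fd))
```

A deliberate takeover of one session would then be
switch_over(act, sby, [3]), after which the standby holds a copy of the
session information and the old active node is in the STOP state.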

TCP connection passing: http://tcpcp.sourceforge.net/
"TCP Connection Passing", Ottawa Linux Symposium, July 2004

Takashi Ikebe
NTT Network Service Systems Laboratories
9-11, Midori-Cho 3-Chome Musashino-Shi,
Tokyo 180-8585 Japan
Tel : +81 422 59 4246, Fax : +81 422 60 4012
e-mail : ikebe.takashi at lab.ntt.co.jp
