[cgl_discussion] Re: [dcl_discussion] ANNOUNCE: OSDL Clusters (foundational components)

La Monte H.P. Yarroll piggy at timesys.com
Wed Dec 3 10:52:25 PST 2003

Steven Dake wrote:

>While I would agree with Lars that userland is the best place to put the
>clustering infrastructure (and this is indeed the method MontaVista is
>taking when implementing the SA Forum APIs), I'd like to see what the
>team at OSDL comes up with, even if it is in the kernel.  As a
>community, we can only learn from whatever results from that effort...
>The one place the kernel can really help, and should not be dismissed
>lightly, is reliable totally ordered messaging.  This is a clear
>requirement of any clustering infrastructure (including cluster
>membership) and is best implemented by interrupt-driven timer sources. 
>Without totally ordered messaging, properly implementing distributed
>application failover for 100% of failure cases is (*not impossible, but
>close*).  Totally ordered reliable messaging that doesn't violate
>causality can be implemented in user space, but then poll must be used
>to simulate timers, which really doesn't work that well.

You can cover the vast majority of clustering applications without a 
multicast totally ordered transport if you have an ordered unicast 
transport which can return (probably) undelivered messages back to the 
sender.  There is such a transport in the 2.6 kernel now--SCTP.
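
To make the failover argument concrete, here is a toy sketch in Python. It 
models no real SCTP API; the names Sender, ack and peer_failed are 
hypothetical and exist only for illustration. The idea is that the sender 
keeps every unacknowledged message, and when the peer is declared dead it 
gets those messages back and can replay them to a backup node.

```python
from collections import deque


class Sender:
    """Toy model of one ordered unicast association (not a real SCTP API)."""

    def __init__(self):
        self.unacked = deque()  # (seq, payload) sent but not yet acknowledged
        self.next_seq = 0

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked.append((seq, payload))
        return seq

    def ack(self, seq):
        # Cumulative ack: everything up to and including seq was delivered.
        while self.unacked and self.unacked[0][0] <= seq:
            self.unacked.popleft()

    def peer_failed(self):
        # Hand back every message that was never acknowledged. These are only
        # *probably* undelivered: the peer may have received a message whose
        # acknowledgement was lost on the way back.
        returned = [payload for _, payload in self.unacked]
        self.unacked.clear()
        return returned


primary = Sender()
for msg in ["a", "b", "c", "d"]:
    primary.send(msg)
primary.ack(1)                     # "a" and "b" are confirmed delivered
to_replay = primary.peer_failed()  # the rest comes back to the application

backup = Sender()                  # association to a (hypothetical) backup node
for msg in to_replay:
    backup.send(msg)
```

Because the returned messages may in fact have been delivered, the 
application on the backup node has to tolerate duplicates, for example by 
making its operations idempotent.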

As bonuses this transport already includes direct support for 
multihoming, implements congestion control (which helps with scaling), 
and permits partial ordering when needed (to avoid head-of-line blocking).
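
The partial ordering point can be sketched the same way. This is a toy 
model of per-stream ordering, again with hypothetical names (StreamReceiver, 
receive) and no real SCTP calls: messages are ordered within a stream, so a 
gap in one stream does not hold up delivery on another.

```python
from collections import defaultdict


class StreamReceiver:
    """Toy receiver that orders messages per stream, not globally."""

    def __init__(self):
        self.expected = defaultdict(int)  # next sequence number per stream
        self.pending = defaultdict(dict)  # out-of-order messages per stream
        self.delivered = []               # (stream, payload) in delivery order

    def receive(self, stream, seq, payload):
        self.pending[stream][seq] = payload
        # Deliver as many in-order messages as this stream now allows.
        while self.expected[stream] in self.pending[stream]:
            nxt = self.expected[stream]
            self.delivered.append((stream, self.pending[stream].pop(nxt)))
            self.expected[stream] = nxt + 1


rx = StreamReceiver()
rx.receive(0, 1, "s0-second")  # held: stream 0 is still waiting for seq 0
rx.receive(1, 0, "s1-first")   # delivered at once: stream 1 is unaffected
rx.receive(0, 0, "s0-first")   # fills the gap; both stream 0 messages go out
```

With a single totally ordered stream, "s1-first" would have had to wait 
behind the missing stream 0 message.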

This protocol is already on the standards track at the IETF, and has 
been shipping in several products for a couple of years.

Multicast congestion control remains a tough research problem.
