[cgl_discussion] Re: [dcl_discussion] ANNOUNCE: OSDL Clusters
La Monte H.P. Yarroll
piggy at timesys.com
Wed Dec 3 10:52:25 PST 2003
Steven Dake wrote:
>While I would agree with Lars that userland is the best place to put the
>clustering infrastructure (and this is indeed the method MontaVista is
>taking when implementing the SA Forum APIs), I'd like to see what the
>team at OSDL comes up with, even if it is in the kernel. As a
>community, we can only learn from whatever results from that effort...
>The one place the kernel can really help, and should not be dismissed
>lightly, is reliable totally ordered messaging. This is a clear
>requirement of any clustering infrastructure (including cluster
>membership) and is best implemented by interrupt-driven timer sources.
>Without totally ordered messaging, properly implementing distributed
>application failover for 100% of failure cases is (*not impossible, but
>close*). Totally ordered reliable messaging that doesn't violate
>causality can be implemented in user space, but then poll must be used
>to simulate timers, which really doesn't work that well.
You can cover the vast majority of clustering applications without a
multicast totally ordered transport if you have an ordered unicast
transport which can return (probably) undelivered messages back to the
sender. There is such a transport in the 2.6 kernel now--SCTP.
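
To make the failover idea concrete, here is a minimal sketch in Python. It is a toy in-memory model, not real SCTP: the class OrderedUnicastLink, its fail_after knob, and send_with_failover are all hypothetical names invented for illustration. It only shows the pattern of taking back whatever the transport reports as undelivered and replaying it, in order, to a backup peer.

```python
from collections import deque


class OrderedUnicastLink:
    """Toy model of an ordered unicast link to one peer (not real SCTP)."""

    def __init__(self, name, fail_after=None):
        self.name = name
        self.fail_after = fail_after  # hypothetical knob: peer dies after N deliveries
        self.delivered = []           # what the peer received, in order
        self.pending = deque()        # queued but not yet delivered

    def send(self, msg):
        self.pending.append(msg)

    def flush(self):
        """Deliver queued messages in order; return those left undelivered."""
        while self.pending:
            if self.fail_after is not None and len(self.delivered) >= self.fail_after:
                # The link has failed: hand the remaining messages back to the sender.
                undelivered = list(self.pending)
                self.pending.clear()
                return undelivered
            self.delivered.append(self.pending.popleft())
        return []


def send_with_failover(messages, primary, backup):
    """Send in order to primary; replay anything handed back to backup."""
    for msg in messages:
        primary.send(msg)
    returned = primary.flush()
    for msg in returned:
        backup.send(msg)
    leftover = backup.flush()
    return returned, leftover


primary = OrderedUnicastLink("primary", fail_after=2)
backup = OrderedUnicastLink("backup")
returned, leftover = send_with_failover(["m1", "m2", "m3", "m4"], primary, backup)
print(primary.delivered, backup.delivered, returned, leftover)
```

Note the "(probably)" above: with a real transport, a returned message may in fact have reached the peer before the failure, so the backup has to tolerate duplicates. The toy model does not simulate that case.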
As a bonus, this transport already includes direct support for
multihoming, implements congestion control (which helps with scaling),
and permits partial ordering when needed (to avoid head-of-line blocking).
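
The partial ordering point can be sketched the same way. The following is a toy Python receiver, again not real SCTP, with a hypothetical StreamReassembler class: messages are delivered in order within each stream, but a gap on one stream does not hold up delivery on another, which is how head-of-line blocking is avoided.

```python
class StreamReassembler:
    """Toy receiver: in-order delivery per stream, no ordering across streams."""

    def __init__(self):
        self.next_seq = {}   # stream id -> next sequence number expected
        self.held = {}       # stream id -> {seq: payload} waiting on a gap
        self.delivered = []  # (stream, seq, payload) in delivery order

    def receive(self, stream, seq, payload):
        expected = self.next_seq.get(stream, 0)
        if seq < expected:
            return  # already delivered: drop the duplicate
        held = self.held.setdefault(stream, {})
        held[seq] = payload
        # Release every consecutive message now available on this stream only.
        while expected in held:
            self.delivered.append((stream, expected, held.pop(expected)))
            expected += 1
        self.next_seq[stream] = expected


rx = StreamReassembler()
rx.receive(0, 1, "a1")   # stream 0 has a gap (seq 0 missing), so a1 waits
rx.receive(1, 0, "b0")   # stream 1 is unaffected and delivers immediately
rx.receive(0, 0, "a0")   # gap filled: a0 and then a1 are released in order
print(rx.delivered)
```

With a single totally ordered channel, b0 would have been stuck behind the missing a0.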
This protocol is already in the standards track at the IETF, and has
been shipping in several products for a couple of years.
Multicast congestion control remains a tough research problem.