[Linux-cluster] Re: [cgl_discussion] Re: [dcl_discussion] Cluster summit materials

Steven Dake sdake at mvista.com
Thu Aug 12 16:08:08 PDT 2004


comments below

On Thu, 2004-08-12 at 15:47, Daniel Phillips wrote:
> On Wednesday 11 August 2004 17:24, Daniel McNeil wrote:
> > > IMHO, for the time being only failure detection and failover really
> > > has to be unified, and that is CMAN, including interaction with
> > > other bits and pieces, i.e., Magma and fencing, and hopefully other
> > > systems like Lars' SCRAT.  As far as CMAN goes, Lars and Alan seem
> > > to be the main parties outside Red Hat.  Lon and Patrick are most
> > > active inside Red Hat.  I think we'd advance fastest if they start
> > > hacking each other's code (anybody I just overlooked, please
> > > bellow).
> >
> > I'm not sure what you mean by "failure detection and failover".
> > Do you mean node failure detection and consensus membership change?
> I mean anything in the cluster that can fail and be reinstantiated.  
> This would include server processes for cluster block devices such as 
> the ones I've designed, as well as whole nodes.  It would also include 
> communication paths, such as socket connections.  But by now you may 
> have detected a bias against trying to deal with the latter in a 
> one-size-fits-all automagic, never-stop-never-give-up cluster 
> communications thingamajig layer.  What we really need is just a 
> framework for failure detection, including methods supplied by various 
> cluster components, and methods for re-instantiating failed components.

There really is no reason to reinvent the wheel here.  An API has
already been developed in the SA Forum Availability Management
Framework, and an implementation already exists
(http://developer.osdl.org/dev/openais).  I suspect there is some work
that linux-ha has done on this topic as well.
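
To make the shape of such a framework concrete, here is a minimal
sketch in Python.  It is NOT the SA Forum AMF API and NOT the openais
implementation; every name in it (Component, FailureMonitor, is_alive,
reinstantiate, poll_once) is hypothetical.  It only illustrates the
idea from the quoted text: each cluster component supplies its own
failure-detection method and its own re-instantiation method, and the
framework merely calls them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Component:
    """A cluster component ("task"), not necessarily a whole node."""
    name: str
    is_alive: Callable[[], bool]        # detection method supplied by the component
    reinstantiate: Callable[[], None]   # recovery method supplied by the component


class FailureMonitor:
    """Hypothetical framework: knows nothing about what a component does."""

    def __init__(self) -> None:
        self._components: Dict[str, Component] = {}

    def register(self, component: Component) -> None:
        self._components[component.name] = component

    def poll_once(self) -> List[str]:
        """Probe every component once; restart the failed ones and
        return their names in registration order."""
        recovered = []
        for comp in self._components.values():
            if not comp.is_alive():
                comp.reinstantiate()
                recovered.append(comp.name)
        return recovered
```

A block device server, for example, would register a probe that checks
its own socket and a callback that starts a replacement server.  A real
framework would also need timeouts, fencing and ordering between
dependent components, none of which is shown here.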

> Note note note: while a "cluster component" could conceivably be a whole 
> node, that's a special case and we really need to cater to the case 
> that will eventually be much more common, where cluster nodes may be 
> doing all kinds of other things besides just participating in clusters.  
> So by "cluster component" I really mean something closer to "task".
> > I thought Magma is just redhat's backward compatibility layer.
> > What "interaction" are you worried about?
> You might want to ask Lon about that...
> > How fencing integrates and when it occurs might be issues we
> > will need to think about more.
> Understatement of the day.
> > How can the DLM go to Andrew without a membership layer to
> > provide membership?
> By having a simple registration api that allows one to register a 
> membership layer, in place of what is there now, i.e., function links 
> between modules.

I think what you are missing is that membership and messaging are
strongly related to one another.  When a message is sent, it is sent
under a certain membership view.  When it is received, it should also be
received under that same membership view.  Otherwise, the view of the
membership cannot be used to make decisions along with the message
contents.  If the distributed system must make decisions about a message
based upon the view of the membership (which obviously DLM must do to be
reliable) then integrating these two features is the only approach that
works.  For this reason, membership and messaging are tightly
integrated, at least if a reliable distributed system is desired.
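
A small sketch of what "sent and received under the same membership
view" means, in Python.  The names (View, Message, Node, install_view)
are hypothetical and do not come from CMAN, openais or any other
implementation.  The sketch simply refuses a message whose view does
not match the receiver's current view; real protocols typically flush
or redeliver outstanding messages before installing a new view rather
than dropping them, which is not modelled here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class View:
    view_id: int
    members: FrozenSet[str]


@dataclass(frozen=True)
class Message:
    sender: str
    view_id: int      # the membership view the sender held when sending
    payload: str


class Node:
    def __init__(self, name: str, view: View) -> None:
        self.name = name
        self.view = view
        self.delivered: List[Message] = []

    def send(self, payload: str) -> Message:
        # Every outgoing message is stamped with the sender's current view.
        return Message(self.name, self.view.view_id, payload)

    def install_view(self, view: View) -> None:
        self.view = view

    def receive(self, msg: Message) -> bool:
        """Deliver only if the message was sent under the view this node
        currently holds and the sender is a member of that view."""
        if msg.view_id != self.view.view_id or msg.sender not in self.view.members:
            return False
        self.delivered.append(msg)
        return True
```

Because a delivered message is guaranteed to carry the receiver's own
view, code such as a lock manager can combine the message contents with
the member list without first asking whether the two are consistent.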

> > > > So you can call the core service "membership", but what we really
> > > > need is membership/communication, which is what cman provides. 
> > > > Do you have another suggestion for this?  TIPC + membership?
> > >
> > > I think you really mean "connection manager", not "communication
> > > service".  I'll step back from this now and watch you guys sort it
> > > out :-)
> >
> > I think John really does mean communication.  For high availability,
> > the cluster should have no single point of failure.  This usually
> > means multiple ethernet links.
> But it's not the business of the cluster framework to operate the links, 
> only to know when they have failed and to be able to arrange for new 
> connections.  So John really does mean "connection" and not 
> "communication", I hope.
> > (I assume CMAN supports multiple 
> > links).  To determine membership there needs to be a way of sending
> > messages between the nodes to determine membership.  Ideally, losing
> > one ethernet link could/would be handled without causing any
> > membership change.
> "Ideally" is not a strong enough word, imho.
> > This kind of intra-cluster communication would be valuable for
> > other cluster components as well.  Example: a cluster snapshot :)
> > or cluster mirror device should be able to send messages to
> > other nodes in the cluster without having to worry about which
> > specific link to use and what to do if a link fails.  This would
> > also be valuable for the DLM.
> OK, we've seen lots of warnings about not getting derailed by trying to 
> invent the perfect cluster communication system, we should heed those 
> warnings.  Instead, let's get down to precise specification of the 
> methods we need to have, and compare it to what already exists, for 
> establishing and re-establishing connections.

The perfect cluster communication model has already been invented:  it's
called virtual synchrony and is backed up by 20 years of research.  There
are several protocols that implement this model.  If there is no need
for agreed ordering or group communication in dlm, then maybe an
argument could be made that virtual synchrony is not appropriate for
dlm.  But DLM benefits strongly from the semantics of virtual synchrony,
which make implementing a distributed lock service trivial.
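
To illustrate why agreed ordering simplifies a lock service, here is a
minimal sketch in Python.  It assumes that some group communication
layer already delivers the same requests in the same order to every
node; that layer is not shown, and the names (LockTable, apply, replay)
are hypothetical rather than taken from any existing DLM.  Given that
assumption, each node can run the same deterministic table locally and
all nodes agree on every lock owner without further negotiation.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# (operation, node, resource), e.g. ("lock", "A", "r1")
Request = Tuple[str, str, str]


class LockTable:
    """Deterministic state machine driven by agreed-order requests."""

    def __init__(self) -> None:
        self.owner: Dict[str, str] = {}
        self.waiters: Dict[str, Deque[str]] = {}

    def apply(self, request: Request) -> None:
        op, node, resource = request
        if op == "lock":
            if resource not in self.owner:
                self.owner[resource] = node
            else:
                # Resource is held: queue the requester in arrival order.
                self.waiters.setdefault(resource, deque()).append(node)
        elif op == "unlock" and self.owner.get(resource) == node:
            queue = self.waiters.get(resource)
            if queue:
                # Hand the lock to the longest-waiting requester.
                self.owner[resource] = queue.popleft()
            else:
                del self.owner[resource]


def replay(requests: List[Request]) -> Dict[str, str]:
    """Apply requests in the given order and return the owner table."""
    table = LockTable()
    for request in requests:
        table.apply(request)
    return dict(table.owner)
```

Two nodes that replay the same ordered list end with identical owner
tables.  If two nodes saw the same requests in different orders, they
could disagree about who holds a lock, which is the case that agreed
ordering rules out.  Lock modes, recovery after a membership change and
duplicate requests are deliberately left out.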

Thanks for listening

> > Does CMAN provide this kind of functionality?  If so, then it
> > really is a communication service.
> http://people.redhat.com/~teigland/sca.pdf
> Regards,
> Daniel
> _______________________________________________
> cgl_discussion mailing list
> cgl_discussion at lists.osdl.org
> http://lists.osdl.org/mailman/listinfo/cgl_discussion
