[Openais] Re: Configuration change question
Mark Haverkamp
markh at osdl.org
Thu Oct 14 14:37:18 PDT 2004
On Thu, 2004-10-14 at 13:37 -0700, Steven Dake wrote:
> On Thu, 2004-10-14 at 13:26, Daniel McNeil wrote:
> > On Thu, 2004-10-14 at 13:04, Steven Dake wrote:
> > > On Thu, 2004-10-14 at 12:26, Mark Haverkamp wrote:
> > > > Steve,
> > > >
> > > > If I remember correctly, the code to deliver messages from the previous
> > > > configuration that happens in the transitional configuration isn't there
> > > > yet. This may explain what I am seeing during the event service
> > > > recovery. I now track open channels on all nodes and keep track by gmi
> > > > messages for opens and closes. At reconfig time, each node sends its
> > > > open count for each channel via gmi to update any nodes that may be new.
> > > > What I am seeing is that sometimes the open count that a node receives
> > > > is different than its notion of opens for that node. I think that maybe
> > > > an open or close was partially distributed then the config change
> > > > happened and some nodes didn't get the open/close. Is it possible for
> > >
> > > No this is not possible even with the current code (unless there is a
> > > bug). All messages will be recovered from the old configuration before
> > > any configuration change is delivered. If not all messages are
> > > recovered, you will see repeating EVS %d %d %d lines, as I'm sure you
> > > have seen in the past.
> > >
> > > If a message is sent after a configuration change, it will not be
> > > delivered until the new configuration is formed.
> > >
> > > The idea of VS is that we can ensure that the messages and configuration
> > > changes occur in the same order on every processor that is a member of
> > > the old and new configuration. This probably solves the problem you're
> > > having (if it works right..).
> >
> > Steve,
> >
> > Can you clarify what you mean by "probably solves the problem
> > you're having"?
>
> sure.. I mean to say that the code should always ensure that messages
> arrive in the same order.
>
> >
> > Is the current code recovering and delivering all old
> > configuration messages before the regular configuration change
> > function gets called?
> >
>
> it doesn't recover and deliver all old "configuration messages" but it
> does recover and deliver all regular messages... (I think this is what
> you meant).
>
> > What messages are sent in the transitional configuration?
> >
>
> None are sent yet. This remains unimplemented. If there were a hole
> at the end of the configuration, then a transitional configuration
> should be delivered, followed by any messages that come after the
> hole. This is to indicate to the services that "hey
> you may be missing an important message relating to your operation, so
> count all further messages as suspect". The service may then ignore
> them, or try to do some recovery in the next configuration..
>
> > In Mark's code he is assuming that all outstanding messages
> > have been delivered from previous configuration, then he
> > sends to all nodes the current 'open count' using messages
> > with recovery priority, then unplugs and continues.
> >
> This seems correct, and given the way the gmi code works, it should work
> perfectly (unless there is a hole, in which case you would know you
> had that problem because openais would continually print out "EVS state"
> with a bunch of numbers over and over).
>
> > So the current code should be delivering all messages to
> > all nodes in the same order even through configuration
> > changes, right?
> >
>
> You got it. That's how it's supposed to work. I really believe it works
> correctly now, except for the hole case which is related to transitional
> configurations. If you can show it not working, then we have a pretty
> serious bug.
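
For reference, the hole case Steven describes could be sketched roughly as
below. This is only an illustration of the intended delivery order, not the
gmi code: the function name, the event tuples, sequence numbers starting at
1, and taking the transitional membership as the nodes present in both the
old and new configuration are all assumptions made for the example.

```python
def plan_delivery(recovered, high_seq, old_members, new_members):
    """Return the ordered events a node would hand to its services.

    recovered   -- dict: sequence number -> message recovered from the
                   old configuration (a missing number is a hole)
    high_seq    -- highest sequence number known in the old configuration
    old_members, new_members -- sets of node ids (hypothetical)
    """
    events = []
    seq = 1  # assumption: sequence numbers start at 1

    # Deliver the gap-free prefix as ordinary old-configuration messages.
    while seq <= high_seq and seq in recovered:
        events.append(("msg", seq, recovered[seq]))
        seq += 1

    if seq <= high_seq:
        # A hole was found: deliver the transitional configuration first,
        # then whatever was recovered after the hole, marked as suspect.
        events.append(("transitional_config", sorted(old_members & new_members)))
        for s in range(seq + 1, high_seq + 1):
            if s in recovered:
                events.append(("suspect_msg", s, recovered[s]))

    # The regular configuration change always comes last.
    events.append(("regular_config", sorted(new_members)))
    return events
```

With no hole, the result is just the old messages followed by the regular
configuration change, which matches the behaviour described for the current
code.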
Ok, I think that you are right. I went back and stuck in a bunch more
debug prints and I think that I found the problem in my code that
processes lost nodes. I'll send a patch for the fix soon.
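
A minimal sketch of the open-count bookkeeping discussed in this thread, for
anyone following along. It is not the event service code: the class, its
method names, and the string node/channel ids are hypothetical, and the
lost-node step only shows the kind of cleanup involved, not the actual fix.

```python
from collections import defaultdict


class OpenCountTable:
    """Per-node, per-channel open counts as seen by one node (hypothetical)."""

    def __init__(self):
        # counts[node][channel] -> number of opens that node currently holds
        self.counts = defaultdict(lambda: defaultdict(int))

    def on_open(self, node, channel):
        # Called when an open message from `node` is delivered.
        self.counts[node][channel] += 1

    def on_close(self, node, channel):
        # Called when a close message from `node` is delivered.
        self.counts[node][channel] -= 1

    def drop_lost_nodes(self, members):
        # Forget every node that is not in the new configuration.
        for node in list(self.counts):
            if node not in members:
                del self.counts[node]

    def on_recovery_count(self, node, channel, reported):
        # Apply the count `node` announced at reconfig time.
        # Returns True if it matched what this node already believed.
        known = self.counts[node][channel]
        self.counts[node][channel] = reported
        return known == reported
```

A mismatch reported by on_recovery_count is expected on a node that has just
joined. On a node that was in the old configuration it points at a
bookkeeping bug, since all opens and closes should already have been
delivered in order.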
Thanks,
Mark.
--
Mark Haverkamp <markh at osdl.org>