[Openais] Re: recent segfault
Daniel McNeil
daniel at osdl.org
Tue Feb 1 16:28:24 PST 2005
On Tue, 2005-02-01 at 13:51, Steven Dake wrote:
> On Tue, 2005-02-01 at 14:27, Mark Haverkamp wrote:
> > On Tue, 2005-02-01 at 14:19 -0700, Steven Dake wrote:
> > > On Tue, 2005-02-01 at 14:03, Mark Haverkamp wrote:
> > > > On Tue, 2005-02-01 at 13:48 -0700, Steven Dake wrote:
> > > > > I was thinking another possibility is that after a processor joins a
> > > > > configuration, it takes the end of previous fragment from another
> > > > > processor into its assembly area. Instead it should start on the next
> > > > > fragment start and discard any previous fragmented data from new
> > > > > processors.
> > > >
> > > > I think that I see. What you are saying is that a partial message was
> > > > sent before the processor joined and once it joined it received the last
> > > > piece.
> > > > >
> > > > > I think what we need is some kind of value in each message (short int)
> > > > > which specifies the index in msg_lens[x] where the first fragment starts
> > > > > for this packet, or 0xffff if this fragment contains no starting
> > > > > fragment.
> > > >
> > > > Maybe, along with the fragmented bit (last message is fragment) add a
> > > > continuation bit (first part of buffer is continuation of a previous
> > > > message). The receiving processor would throw away continuations if its
> > > > assembly area didn't already have something in it.
> > > >
> > > This is good. I want to be sure we can handle large MTUs for messages.
> > > This means we need a range of about 0-3000 to specify the start index (2
> > > bytes, plus 1 byte per message with MTU of 9000). I'll start working on
> > > a patch integrating the fragment bit and continuation bit into the start
> > > index to compact some space.
> >
> > I'm not following the need for extra bytes. Wouldn't we only need a
> > single bit in the mcast structure like the fragmented bit? The only
> > message in the incoming buffer that can be a continuation is the first
> > one. If the assembly index is zero and the continuation bit is set on
> > the incoming message, we just throw away the first message in the
> > incoming buffer and the next one (if any) is the start of a new one.
> >
>
> good idea Mark. The patch should be pretty easy to develop. I'm
> looking at the sort queue in use bug now. If you want to work up a
> patch for the continuation bit idea that would be cool.
>
> It looks like if a message is lost in recovery,
> memb_state_operational_enter may be called under certain
> conditions after about 1-2 hours of running with RANDOM_DROP enabled.
> This would definitely result in a crash because there would be missing
> messages in the message stream, which a) doesn't follow vs semantics and
> b) would break the assembler.
>
Steve,
The handling of packed and fragmented messages makes me think of a
potential problem:
If a config change happens in the middle of a large message that has
been fragmented, I'm wondering if the ordering of messages might
be messed up:
Starting with a 2 node cluster (A and B)
A sends out A1
B sends out B1frag1
C joins cluster and sends out C1
A sends out A2
B sends out B1frag2 and B1frag3
I think the above describes what you and Mark are talking about
where C can see the B1frag2 and B1frag3 and not know how to process
it. Am I understanding this right?
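To make sure I have the continuation bit idea right, here is a minimal sketch of how I read Mark's rule. Everything in it is hypothetical: the struct layouts, the names (packet, assembler, receive) and the one-assembly-area-per-sender assumption are mine for illustration and are not the real mcast structure or the openais code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_MSGS 4

/* Hypothetical, simplified packet; NOT the real openais mcast structure. */
struct packet {
    int continuation;           /* first message continues an earlier packet */
    int fragmented;             /* last message continues in a later packet */
    int msg_count;
    const char *msgs[MAX_MSGS];
};

/* Assumed: one assembly area per sending processor. */
struct assembler {
    char buf[256];
    size_t len;                 /* 0 means no partial message is held */
    int delivered;              /* count of complete messages delivered */
};

static void reset(struct assembler *a)
{
    a->len = 0;
    a->buf[0] = '\0';
}

static void append(struct assembler *a, const char *piece)
{
    size_t n = strlen(piece);

    if (a->len + n >= sizeof(a->buf))
        return;                 /* sketch only: oversized data is dropped */
    memcpy(a->buf + a->len, piece, n);
    a->len += n;
    a->buf[a->len] = '\0';
}

static void receive(struct assembler *a, const struct packet *p)
{
    assert(p->msg_count >= 0 && p->msg_count <= MAX_MSGS);

    for (int i = 0; i < p->msg_count; i++) {
        int is_cont = (i == 0) && p->continuation;
        int is_frag = (i == p->msg_count - 1) && p->fragmented;

        /* Mark's rule: a continuation with an empty assembly area means
         * we never saw the start of the message, so throw the piece away. */
        if (is_cont && a->len == 0)
            continue;
        if (!is_cont)
            reset(a);           /* a new message starts here */

        append(a, p->msgs[i]);
        if (!is_frag) {
            printf("delivered: %s\n", a->buf);
            a->delivered++;
            reset(a);
        }
    }
}
```

With this rule, a processor that joins after B1frag1 drops B1frag2 and B1frag3 and resumes with B's next complete message, while A and B assemble and deliver B1 as usual.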
Now the problem: what is the actual message delivery order?
A sees A1,C1,A2,B1
B sees A1,C1,A2,B1
C sees C1,A2 (with Mark's fix to drop partial fragments).
So I see 2 problems with this:
1. B1 was started in the old config (A,B) but delivered in the new
config (A,B,C)
2. C does not see B1 at all, since he only received partial fragments.
Am I misunderstanding the way it works? If B does not deliver the
entire message B1 before C joins, then we can get the above problems.
Does the protocol give the surviving nodes a chance to send out their
last message in its entirety before allowing a new node to join?
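One way to at least detect problem 1 would be to remember which configuration a message started in and compare it when the last fragment arrives. This is only a sketch of the check I have in mind; the struct, the function name and the idea of a numeric configuration id are my assumptions, not anything in the current code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-sender record of a message being assembled. */
struct partial {
    int active;                 /* nonzero while fragments are outstanding */
    unsigned start_config;      /* assumed numeric id of the configuration
                                 * in which the first fragment arrived */
};

/*
 * Returns -1 while the message is still being assembled,
 *          1 if it completed in the configuration it started in,
 *          0 if a configuration change happened in the middle of it.
 */
static int on_fragment(struct partial *p, unsigned current_config, int is_last)
{
    if (!p->active) {
        p->active = 1;
        p->start_config = current_config;
    }
    if (!is_last)
        return -1;

    p->active = 0;
    if (p->start_config != current_config) {
        printf("message started in config %u, completed in config %u\n",
               p->start_config, current_config);
        return 0;
    }
    return 1;
}
```

In the example above, B1frag1 arrives in configuration (A,B) and B1frag3 in (A,B,C), so the check would return 0 for B1 on both A and B.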
Thanks,
Daniel