[Openais] Re: recent segfault

Mark Haverkamp markh at osdl.org
Tue Feb 1 13:03:39 PST 2005


On Tue, 2005-02-01 at 13:48 -0700, Steven Dake wrote:
> I was thinking another possibility is that after a processor joins a
> configuration, it takes the end of a previous fragment from another
> processor into its assembly area.  Instead, it should start at the next
> fragment start and discard any earlier fragmented data from new
> processors.

I think I see.  What you are saying is that a partial message was
sent before the processor joined, and once it joined it received only
the last piece.
> 
> I think what we need is some kind of value in each message (short int)
> which specifies the index in msg_lens[x] where the first fragment starts
> for this packet, or 0xffff if this fragment contains no starting
> fragment.
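A minimal C sketch of the per-packet index proposed above, assuming a header that carries the `msg_lens[]` array plus one extra 16-bit field. The struct layout and the names `packet_header`, `first_frag_index`, `NO_FIRST_FRAGMENT`, `MAX_MSGS` and `first_usable_index` are hypothetical illustrations, not openais code. Only `msg_lens` and the 0xffff sentinel come from the thread.

```c
#include <stdint.h>

/* Hypothetical sentinel from the proposal: no message starts in this packet. */
#define NO_FIRST_FRAGMENT 0xffff
/* Hypothetical upper bound on messages packed into one packet. */
#define MAX_MSGS 16

/* Hypothetical packet header; field names are illustrative only. */
struct packet_header {
    uint16_t msg_count;        /* number of valid entries in msg_lens[] */
    uint16_t first_frag_index; /* index in msg_lens[] where the first new
                                  message starts, or NO_FIRST_FRAGMENT */
    uint16_t msg_lens[MAX_MSGS];
};

/*
 * Return the index of the first msg_lens[] entry this receiver may assemble.
 * A receiver already holding a partial message starts at entry 0, since the
 * leading piece completes its assembly.  A receiver with an empty assembly
 * area (for example, one that just joined) skips everything before the first
 * new message.  Returning msg_count means "nothing usable in this packet".
 */
unsigned int first_usable_index(const struct packet_header *hdr,
                                int assembly_in_progress)
{
    if (assembly_in_progress)
        return 0;
    if (hdr->first_frag_index == NO_FIRST_FRAGMENT ||
        hdr->first_frag_index > hdr->msg_count)
        return hdr->msg_count; /* whole packet is continuation data: discard */
    return hdr->first_frag_index;
}
```

A packet that is entirely the middle of a large message carries the sentinel, so a newly joined processor drops it whole. A packet whose entry 0 is the tail of an old message and whose entry 1 starts a new one carries index 1.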

Maybe, along with the fragmented bit (the last message is a fragment),
add a continuation bit (the first part of the buffer is a continuation
of a previous message).  The receiving processor would throw away
continuations if its assembly area didn't already have something in it.
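A minimal C sketch of the continuation-bit idea, assuming each packet carries a flags byte, a `msg_lens[]` array and the packed payload. The flag names, `struct assembly`, `ASSEMBLY_MAX` and `process_packet` are hypothetical and are not the actual openais structures. The receiver discards a leading continuation when its assembly area is empty, and keeps the last piece pending when the fragmented bit is set.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits; names are illustrative only. */
#define PKT_FLAG_FRAGMENTED   0x01 /* last piece continues in the next packet */
#define PKT_FLAG_CONTINUATION 0x02 /* first piece continues an earlier packet */

#define ASSEMBLY_MAX 65536 /* hypothetical maximum reassembled message size */

/* Hypothetical per-sender assembly area. */
struct assembly {
    size_t len;                        /* bytes of a partial message held */
    unsigned char data[ASSEMBLY_MAX];
};

/*
 * Process one packet of msg_count pieces, msg_lens[i] bytes each, packed
 * back to back in payload.  Returns how many complete messages would be
 * delivered; a real implementation would hand area->data to the application
 * where the count is incremented.
 */
int process_packet(struct assembly *area, uint8_t flags,
                   const uint16_t *msg_lens, unsigned int msg_count,
                   const unsigned char *payload)
{
    unsigned int i = 0;
    size_t off = 0;
    int delivered = 0;

    if (msg_count == 0)
        return 0;

    if (flags & PKT_FLAG_CONTINUATION) {
        if (area->len == 0) {
            /* We never saw this message's start (e.g. we just joined). */
            off += msg_lens[0];
            i = 1;
        }
    } else if (area->len != 0) {
        /* A partial message with no continuation is stale: drop it. */
        area->len = 0;
    }

    for (; i < msg_count; i++) {
        size_t n = msg_lens[i];

        if (area->len + n > ASSEMBLY_MAX) {
            /* Would overflow the assembly area: discard this message. */
            area->len = 0;
            off += n;
            continue;
        }
        memcpy(area->data + area->len, payload + off, n);
        area->len += n;
        off += n;

        /* Keep the last piece pending if the packet says it is fragmented. */
        if (i == msg_count - 1 && (flags & PKT_FLAG_FRAGMENTED))
            break;

        delivered++;
        area->len = 0;
    }
    return delivered;
}
```

The extra `else` branch is a design choice beyond what the post proposes: if a packet arrives without the continuation bit while a partial message is still held, the stale partial is dropped rather than glued onto an unrelated message.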

> 
> Does this scenario match the configuration change you saw?  I think for
> this kind of crash to happen, you would have to see a crash on the
> joining processor.
> 

Things had been running just fine for about 3 hours, then there was a
token timeout.  In the end, the configuration didn't change: all four
processors were still in the configuration.  However, the processor
that first detected the token timeout was also the one that got the
segfault.

 
-- 
Mark Haverkamp <markh at osdl.org>
