[Openais] Re: defect 169 fixed up (and 172)

Steven Dake sdake at mvista.com
Wed Oct 27 14:26:39 PDT 2004


Mark

I think your test case is ok unless there was something missing in your
description of what you tried.  If you do an ifdown, you will lose the
configuration (which your log shows). Then later if you do an ifup, the
configuration should recover into a full configuration.  Could you try
out the ifup step and see if that brings back your configuration?  If it
does, then I'd say it works for you.
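
A rough sketch of the sequence I'd expect, going by the patch description
quoted below: ifdown drops the processor into a singleton configuration,
ifup lets it rejoin, and the per-configuration variables get reset in both
cases (the defect 172 part).  Every name here (struct memb, memb_install,
token_seq, pending_msgs, and so on) is made up for illustration and is not
taken from the openais source; the real state machine has more states and
more variables than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names come from the openais source. */
enum memb_state { MEMB_OPERATIONAL, MEMB_GATHER };

struct memb {
    enum memb_state state;
    bool            if_up;        /* is the network interface up? */
    size_t          member_count; /* processors in the current configuration */
    unsigned        token_seq;    /* per-configuration variable (illustrative) */
    unsigned        pending_msgs; /* per-configuration variable (illustrative) */
};

/* Reset everything that belongs to the old configuration.  The point of
 * the defect 172 description is that this must also run for a singleton. */
static void memb_reset_config_vars(struct memb *m)
{
    m->token_seq = 0;
    m->pending_msgs = 0;
}

/* Install a new configuration of n members (n == 1 is a singleton). */
static void memb_install(struct memb *m, size_t n)
{
    assert(n >= 1);               /* a configuration always includes us */
    memb_reset_config_vars(m);
    m->member_count = n;
    m->state = MEMB_OPERATIONAL;
}

/* ifdown: token is lost, we gather, find nobody, and become a singleton. */
static void memb_ifdown(struct memb *m)
{
    m->if_up = false;
    m->state = MEMB_GATHER;
    memb_install(m, 1);
}

/* ifup: gather again and join whichever other processors were found. */
static void memb_ifup(struct memb *m, size_t others_found)
{
    m->if_up = true;
    m->state = MEMB_GATHER;
    memb_install(m, 1 + others_found);
}
```

With four nodes as in your log, memb_ifdown() should leave member_count at
1 and memb_ifup(&m, 3) should bring it back to 4.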

Thanks
-steve


On Wed, 2004-10-27 at 07:37, Mark Haverkamp wrote:
> On Tue, 2004-10-26 at 23:58 -0700, Steven Dake wrote:
> > Mark,
> > 
> > I have a patch for defect 169 (assert on ifdown).  If the interface is
> > downed during operation, the processor will enter a singleton
> > configuration and continue to operate.  If the interface is then brought up
> > later, the processor will attempt to join any other configurations it
> > can locate on the multicast address.  In the process I found a pretty
> > nasty bug (defect 172) which causes two singleton configurations not to
> > be able to form a configuration because the local variables that are
> > normally reset by the evs algorithm are not changed in the singleton
> > configuration mode.
> > 
> > If you could give it a spin and let me know how it works for you, that'd
> > be cool.
> 
> OK, I tried it.  The other nodes recovered OK.  The node that I did the
> ifdown on didn't though.  I had all the nodes sending and receiving
> events at the time.
> 
> Here is where I did the ifdown:
> 
> Oct 27  7:27:39 [WARNING ] [GMI  ] Token loss in OPERATIONAL.
> log_syslog: lost message: Resource temporarily unavailable
> Oct 27  7:27:39 [NOTICE  ] [GMI  ] entering GATHER state.
> Oct 27  7:27:39 [NOTICE  ] [GMI  ] SENDING attempt join because this node is ring rep.
> log_syslog: lost message: Resource temporarily unavailable
> Oct 27  7:27:39 [NOTICE  ] [GMI  ] I am the only member.
> log_syslog: lost message: Resource temporarily unavailable
> Oct 27  7:27:39 [NOTICE  ] [CLM  ] CLM CONFIGURATION CHANGE
> Oct 27  7:27:39 [NOTICE  ] [CLM  ] New Configuration:
> Oct 27  7:27:39 [NOTICE  ] [CLM  ]      192.168.1.18
> log_syslog: lost message: Resource temporarily unavailable
> Oct 27  7:27:39 [NOTICE  ] [CLM  ] Members Left:
> Oct 27  7:27:39 [NOTICE  ] [CLM  ]      192.168.1.8
> Oct 27  7:27:39 [NOTICE  ] [CLM  ]      192.168.1.17
> log_syslog: lost message: Resource temporarily unavailable
> Oct 27  7:27:39 [NOTICE  ] [CLM  ]      192.168.1.19
> Oct 27  7:27:39 [NOTICE  ] [CLM  ] Members Joined:
> Oct 27  7:27:39 [NOTICE  ] [CLM  ] CLM CONFIGURATION CHANGE
> Oct 27  7:27:39 [NOTICE  ] [CLM  ] New Configuration:
> Oct 27  7:27:39 [NOTICE  ] [CLM  ]      192.168.1.18
> Oct 27  7:27:39 [NOTICE  ] [CLM  ] Members Left:
> Oct 27  7:27:39 [NOTICE  ] [CLM  ] Members Joined:
> 
> That's all. I interrupted the program and did a stack dump.  It was in
> poll_run.
> 
> Program received signal SIGINT, Interrupt.
> 0x420d224b in poll () from /lib/i686/libc.so.6
> (gdb) bt
> #0  0x420d224b in poll () from /lib/i686/libc.so.6
> #1  0x08057fc0 in poll_run (handle=0) at aispoll.c:371
> #2  0x0804a736 in main (argc=1, argv=0xbffffa24) at main.c:989
> #3  0x420158d4 in __libc_start_main () from /lib/i686/libc.so.6
> (gdb) c
> Continuing.
> 
> 
> (the "log_syslog: lost message: Resource temporarily unavailable"
> messages are some debug I added to see why I was dropping messages from
> syslog).
> 
> 
> > 
> > There may be some kind of bug with this because ckptbench freezes
> > (meaning it lost some messages) sometimes during operation.
> > 
> > Thanks
> > -steve
> > 



