[Openais] Re: defect 169 fixed up (and 172)

Steven Dake sdake at mvista.com
Fri Oct 29 13:20:59 PDT 2004


On Fri, 2004-10-29 at 12:53, Mark Haverkamp wrote:
> On Fri, 2004-10-29 at 11:26 -0700, Steven Dake wrote:
> > On Fri, 2004-10-29 at 10:15, Mark Haverkamp wrote:
> > > On Fri, 2004-10-29 at 10:05 -0700, Mark Haverkamp wrote:
> > > 
> > > > 
> > > > I'm guessing that the mcast isn't happening from the send side.  I'll
> > > > add a results check to each of the sendmsg calls in gmi.c and see where
> > > > things are going wrong.
> > > > 
> > > > Mark.
> > > > 
> > > 
> > > OK, here are the results of printing out res:
> > > 
> > > 
> > > 
> > > 
> > > Oct 29 10:08:13 [WARNING ] [GMI  ] Token being retransmitted.
> > > sendmsg failed errno == 22
> > > Oct 29 10:08:13 [WARNING ] [GMI  ] The network interface is down.
> > > Oct 29 10:08:13 [WARNING ] [GMI  ] Token loss in OPERATIONAL.
> > > Oct 29 10:08:13 [NOTICE  ] [GMI  ] entering GATHER state.
> > > Oct 29 10:08:13 [NOTICE  ] [GMI  ] SENDING attempt join because this node is ring rep.
> > > memb_state_gather_enter: res = -1 errno = 22
> > > mjsend: res = -1, errno = 22
> > > Oct 29 10:08:14 [NOTICE  ] [GMI  ] I am the only member.
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ] CLM CONFIGURATION CHANGE
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ] New Configuration:
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ]      192.168.1.18
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ] Members Left:
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ]      192.168.1.8
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ]      192.168.1.17
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ]      192.168.1.19
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ] Members Joined:
> > > Oct 29 10:08:14 [NOTICE  ] [EVT  ] cluster node at 192.168.1.8 down
> > > Oct 29 10:08:14 [NOTICE  ] [EVT  ] cluster node at 192.168.1.17 down
> > > Oct 29 10:08:14 [NOTICE  ] [EVT  ] cluster node at 192.168.1.19 down
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ] CLM CONFIGURATION CHANGE
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ] New Configuration:
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ]      192.168.1.18
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ] Members Left:
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ] Members Joined:
> > > Oct 29 10:08:14 [NOTICE  ] [EVT  ] No channels to send
> > > otmcast: res = -1, errno = 22
> > > 
> > > 
> > > 
> > > 
> > > Oct 29 10:08:26 [WARNING ] [GMI  ] The network interface is now up.
> > > Oct 29 10:08:26 [NOTICE  ] [GMI  ] entering GATHER state.
> > > Oct 29 10:08:26 [NOTICE  ] [GMI  ] SENDING attempt join because this node is ring rep.
> > > memb_state_gather_enter: res = 44 errno = 22
> > > Oct 29 10:08:26 [NOTICE  ] [GMI  ] I am the only member.
> > > Oct 29 10:08:26 [NOTICE  ] [CLM  ] CLM CONFIGURATION CHANGE
> > > Oct 29 10:08:26 [NOTICE  ] [CLM  ] New Configuration:
> > > Oct 29 10:08:26 [NOTICE  ] [CLM  ]      192.168.1.18
> > > Oct 29 10:08:26 [NOTICE  ] [CLM  ] Members Left:
> > > Oct 29 10:08:26 [NOTICE  ] [CLM  ] Members Joined:
> > > Oct 29 10:08:26 [NOTICE  ] [CLM  ] CLM CONFIGURATION CHANGE
> > > Oct 29 10:08:26 [NOTICE  ] [CLM  ] New Configuration:
> > > Oct 29 10:08:26 [NOTICE  ] [CLM  ]      192.168.1.18
> > > Oct 29 10:08:26 [NOTICE  ] [CLM  ] Members Left:
> > > Oct 29 10:08:26 [NOTICE  ] [CLM  ] Members Joined:
> > > Oct 29 10:08:26 [NOTICE  ] [EVT  ] Already in config change, Starting over, m 1, c 0
> > > Oct 29 10:08:26 [NOTICE  ] [EVT  ] No channels to send
> > > otmcast: res = 364, errno = 11
> > > 
> > > 
> > > It looks like something was sent successfully, but we're not receiving
> > > it. I'm not sure how the multicasting works, but does the application
> > > need to register for receiving mcasts?  If so, could we have lost the
> > > registration when the interface went down?
> > > 
> > 
> > The multicast code does a variety of things which could cause it to
> > fail if the interface goes down.  This is a behavior change from 2.4,
> > which doesn't seem to have any negative effects when the interface goes
> > down and then comes back up.  One thing to note: in my testing I used
> > "ifconfig eth1 down", waited 5 seconds, then "ifconfig eth1 up", not
> > ifdown and ifup.  Would you try ifconfig to see if it does anything
> > differently?
> > 
> 
> OK, I did.  Something peculiar happened.  It got token loss, but never
> noticed that the interface went away.
> 
> Oct 29 12:47:19 [WARNING ] [GMI  ] Token being retransmitted.
> Oct 29 12:47:20 [WARNING ] [GMI  ] Token loss in OPERATIONAL.
> Oct 29 12:47:20 [NOTICE  ] [GMI  ] entering GATHER state.
> Oct 29 12:47:20 [NOTICE  ] [GMI  ] SENDING attempt join because this node is ring rep.
> memb_state_gather_enter: res = 44 errno = 11
> Oct 29 12:47:20 [NOTICE  ] [GMI  ] I am the only member.
> Oct 29 12:47:20 [NOTICE  ] [CLM  ] CLM CONFIGURATION CHANGE
> Oct 29 12:47:20 [NOTICE  ] [CLM  ] New Configuration:
> Oct 29 12:47:20 [NOTICE  ] [CLM  ]      192.168.1.18
> Oct 29 12:47:20 [NOTICE  ] [CLM  ] Members Left:
> Oct 29 12:47:20 [NOTICE  ] [CLM  ]      192.168.1.8
> Oct 29 12:47:20 [NOTICE  ] [CLM  ]      192.168.1.17
> Oct 29 12:47:20 [NOTICE  ] [CLM  ]      192.168.1.19
> Oct 29 12:47:20 [NOTICE  ] [CLM  ] Members Joined:
> Oct 29 12:47:20 [NOTICE  ] [EVT  ] cluster node at 192.168.1.8 down
> Oct 29 12:47:20 [NOTICE  ] [EVT  ] cluster node at 192.168.1.17 down
> Oct 29 12:47:20 [NOTICE  ] [EVT  ] cluster node at 192.168.1.19 down
> Oct 29 12:47:20 [NOTICE  ] [CLM  ] CLM CONFIGURATION CHANGE
> Oct 29 12:47:20 [NOTICE  ] [CLM  ] New Configuration:
> Oct 29 12:47:20 [NOTICE  ] [CLM  ]      192.168.1.18
> Oct 29 12:47:20 [NOTICE  ] [CLM  ] Members Left:
> Oct 29 12:47:20 [NOTICE  ] [CLM  ] Members Joined:
> Oct 29 12:47:20 [NOTICE  ] [EVT  ] No channels to send
> 
> 
> ifconfig eth1 up is not noticed at all.
> 
> Mark.

OK, I'll have to try 2.6 and see if I can debug the issue.
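
A note on reading the debug output quoted above: errno is only meaningful
when sendmsg returns -1, so lines such as "res = 44 errno = 22" most likely
show a stale errno left over from an earlier failed call rather than a new
error. On Linux, errno 22 is EINVAL and errno 11 is EAGAIN. Below is a
minimal sketch of a result check that reports errno only on failure. The
names describe_send_result and checked_sendmsg are hypothetical and are not
functions in gmi.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: format the outcome of a send call into buf.
 * The errno value (passed in as err) is reported only when res is -1,
 * because errno is not meaningful after a successful call. */
int describe_send_result(char *buf, size_t len, const char *tag,
                         ssize_t res, int err)
{
    assert(buf != NULL && tag != NULL);
    if (res == -1) {
        return snprintf(buf, len, "%s: failed, errno = %d (%s)",
                        tag, err, strerror(err));
    }
    return snprintf(buf, len, "%s: sent %ld bytes", tag, (long)res);
}

/* Hypothetical wrapper around sendmsg(2) that logs every result. */
ssize_t checked_sendmsg(const char *tag, int fd,
                        const struct msghdr *msg, int flags)
{
    char line[160];
    ssize_t res = sendmsg(fd, msg, flags);
    int saved_errno = errno; /* capture before any other libc call */

    describe_send_result(line, sizeof(line), tag, res, saved_errno);
    fprintf(stderr, "%s\n", line);
    errno = saved_errno;     /* leave errno as sendmsg set it */
    return res;
}
```

Calling checked_sendmsg in place of each bare sendmsg would make a failed
send stand out from a successful one that merely prints an old errno.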

Thanks for the report.
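
On the earlier question about registering to receive multicasts: with BSD
sockets, a receiver joins a group on a given interface through setsockopt
with IP_ADD_MEMBERSHIP. Whether that membership survives an interface
down/up cycle is exactly what is still open in this thread, so the
following is only a sketch of what a re-join could look like once the
interface is reported up. The function names are hypothetical and not
taken from gmi.c.

```c
#define _DEFAULT_SOURCE /* for struct ip_mreq under strict -std modes */
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: build an ip_mreq from dotted-quad strings.
 * Returns 0 on success, -1 if an address does not parse or the
 * group is not an IPv4 multicast address. */
int mcast_fill_mreq(struct ip_mreq *mreq, const char *group,
                    const char *ifaddr)
{
    assert(mreq != NULL);
    memset(mreq, 0, sizeof(*mreq));
    if (inet_pton(AF_INET, group, &mreq->imr_multiaddr) != 1)
        return -1;
    if (inet_pton(AF_INET, ifaddr, &mreq->imr_interface) != 1)
        return -1;
    if (!IN_MULTICAST(ntohl(mreq->imr_multiaddr.s_addr)))
        return -1;
    return 0;
}

/* Hypothetical re-join: drop any stale membership, then join again.
 * Returns the result of the IP_ADD_MEMBERSHIP call (0 or -1). */
int mcast_rejoin(int fd, const char *group, const char *ifaddr)
{
    struct ip_mreq mreq;

    if (mcast_fill_mreq(&mreq, group, ifaddr) != 0)
        return -1;
    /* Result ignored on purpose: the membership may already be gone. */
    (void)setsockopt(fd, IPPROTO_IP, IP_DROP_MEMBERSHIP,
                     &mreq, sizeof(mreq));
    return setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                      &mreq, sizeof(mreq));
}
```

A call such as mcast_rejoin(fd, "239.192.0.1", "192.168.1.18") would go in
the code path that logs "The network interface is now up."; the group
address here is made up for illustration.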





