[Openais] Re: defect 169 fixed up (and 172)

Mark Haverkamp markh at osdl.org
Fri Oct 29 12:53:37 PDT 2004


On Fri, 2004-10-29 at 11:26 -0700, Steven Dake wrote:
> On Fri, 2004-10-29 at 10:15, Mark Haverkamp wrote:
> > On Fri, 2004-10-29 at 10:05 -0700, Mark Haverkamp wrote:
> > 
> > > 
> > > I'm guessing that the mcast isn't happening from the send side.  I'll
> > > add a results check to each of the sendmsg calls in gmi.c and see where
> > > things are going wrong.
> > > 
> > > Mark.
> > > 
> > 
> > OK, here are the results of printing out res:
> > 
> > 
> > 
> > 
> > Oct 29 10:08:13 [WARNING ] [GMI  ] Token being retransmitted.
> > sendmsg failed errno == 22
> > Oct 29 10:08:13 [WARNING ] [GMI  ] The network interface is down.
> > Oct 29 10:08:13 [WARNING ] [GMI  ] Token loss in OPERATIONAL.
> > Oct 29 10:08:13 [NOTICE  ] [GMI  ] entering GATHER state.
> > Oct 29 10:08:13 [NOTICE  ] [GMI  ] SENDING attempt join because this node is ring rep.
> > memb_state_gather_enter: res = -1 errno = 22
> > mjsend: res = -1, errno = 22
> > Oct 29 10:08:14 [NOTICE  ] [GMI  ] I am the only member.
> > Oct 29 10:08:14 [NOTICE  ] [CLM  ] CLM CONFIGURATION CHANGE
> > Oct 29 10:08:14 [NOTICE  ] [CLM  ] New Configuration:
> > Oct 29 10:08:14 [NOTICE  ] [CLM  ]      192.168.1.18
> > Oct 29 10:08:14 [NOTICE  ] [CLM  ] Members Left:
> > Oct 29 10:08:14 [NOTICE  ] [CLM  ]      192.168.1.8
> > Oct 29 10:08:14 [NOTICE  ] [CLM  ]      192.168.1.17
> > Oct 29 10:08:14 [NOTICE  ] [CLM  ]      192.168.1.19
> > Oct 29 10:08:14 [NOTICE  ] [CLM  ] Members Joined:
> > Oct 29 10:08:14 [NOTICE  ] [EVT  ] cluster node at 192.168.1.8 down
> > Oct 29 10:08:14 [NOTICE  ] [EVT  ] cluster node at 192.168.1.17 down
> > Oct 29 10:08:14 [NOTICE  ] [EVT  ] cluster node at 192.168.1.19 down
> > Oct 29 10:08:14 [NOTICE  ] [CLM  ] CLM CONFIGURATION CHANGE
> > Oct 29 10:08:14 [NOTICE  ] [CLM  ] New Configuration:
> > Oct 29 10:08:14 [NOTICE  ] [CLM  ]      192.168.1.18
> > Oct 29 10:08:14 [NOTICE  ] [CLM  ] Members Left:
> > Oct 29 10:08:14 [NOTICE  ] [CLM  ] Members Joined:
> > Oct 29 10:08:14 [NOTICE  ] [EVT  ] No channels to send
> > otmcast: res = -1, errno = 22
> > 
> > 
> > 
> > 
> > Oct 29 10:08:26 [WARNING ] [GMI  ] The network interface is now up.
> > Oct 29 10:08:26 [NOTICE  ] [GMI  ] entering GATHER state.
> > Oct 29 10:08:26 [NOTICE  ] [GMI  ] SENDING attempt join because this node is ring rep.
> > memb_state_gather_enter: res = 44 errno = 22
> > Oct 29 10:08:26 [NOTICE  ] [GMI  ] I am the only member.
> > Oct 29 10:08:26 [NOTICE  ] [CLM  ] CLM CONFIGURATION CHANGE
> > Oct 29 10:08:26 [NOTICE  ] [CLM  ] New Configuration:
> > Oct 29 10:08:26 [NOTICE  ] [CLM  ]      192.168.1.18
> > Oct 29 10:08:26 [NOTICE  ] [CLM  ] Members Left:
> > Oct 29 10:08:26 [NOTICE  ] [CLM  ] Members Joined:
> > Oct 29 10:08:26 [NOTICE  ] [CLM  ] CLM CONFIGURATION CHANGE
> > Oct 29 10:08:26 [NOTICE  ] [CLM  ] New Configuration:
> > Oct 29 10:08:26 [NOTICE  ] [CLM  ]      192.168.1.18
> > Oct 29 10:08:26 [NOTICE  ] [CLM  ] Members Left:
> > Oct 29 10:08:26 [NOTICE  ] [CLM  ] Members Joined:
> > Oct 29 10:08:26 [NOTICE  ] [EVT  ] Already in config change, Starting over, m 1, c 0
> > Oct 29 10:08:26 [NOTICE  ] [EVT  ] No channels to send
> > otmcast: res = 364, errno = 11
> > 
> > 
> > It looks like something was sent successfully, but we're not receiving
> > it. I'm not sure how the multicasting works, but does the application
> > need to register to receive mcasts?  If so, could we have lost the
> > registration when the interface went down?
> > 
> 
> The multicast does a variety of things which could cause it to fail if
> the interface goes down.  This is a behavior change from 2.4, which
> doesn't seem to have any negative effects on interface down then up.
> One thing to note: in my testing I used "ifconfig eth1 down", waited
> 5 seconds, then "ifconfig eth1 up", not ifdown and ifup.  Would you try
> ifconfig to see if it does anything differently?
> 

OK, I did.  Something peculiar happened.  It got token loss, but never
noticed that the interface went away.

Oct 29 12:47:19 [WARNING ] [GMI  ] Token being retransmitted.
Oct 29 12:47:20 [WARNING ] [GMI  ] Token loss in OPERATIONAL.
Oct 29 12:47:20 [NOTICE  ] [GMI  ] entering GATHER state.
Oct 29 12:47:20 [NOTICE  ] [GMI  ] SENDING attempt join because this node is ring rep.
memb_state_gather_enter: res = 44 errno = 11
Oct 29 12:47:20 [NOTICE  ] [GMI  ] I am the only member.
Oct 29 12:47:20 [NOTICE  ] [CLM  ] CLM CONFIGURATION CHANGE
Oct 29 12:47:20 [NOTICE  ] [CLM  ] New Configuration:
Oct 29 12:47:20 [NOTICE  ] [CLM  ]      192.168.1.18
Oct 29 12:47:20 [NOTICE  ] [CLM  ] Members Left:
Oct 29 12:47:20 [NOTICE  ] [CLM  ]      192.168.1.8
Oct 29 12:47:20 [NOTICE  ] [CLM  ]      192.168.1.17
Oct 29 12:47:20 [NOTICE  ] [CLM  ]      192.168.1.19
Oct 29 12:47:20 [NOTICE  ] [CLM  ] Members Joined:
Oct 29 12:47:20 [NOTICE  ] [EVT  ] cluster node at 192.168.1.8 down
Oct 29 12:47:20 [NOTICE  ] [EVT  ] cluster node at 192.168.1.17 down
Oct 29 12:47:20 [NOTICE  ] [EVT  ] cluster node at 192.168.1.19 down
Oct 29 12:47:20 [NOTICE  ] [CLM  ] CLM CONFIGURATION CHANGE
Oct 29 12:47:20 [NOTICE  ] [CLM  ] New Configuration:
Oct 29 12:47:20 [NOTICE  ] [CLM  ]      192.168.1.18
Oct 29 12:47:20 [NOTICE  ] [CLM  ] Members Left:
Oct 29 12:47:20 [NOTICE  ] [CLM  ] Members Joined:
Oct 29 12:47:20 [NOTICE  ] [EVT  ] No channels to send
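
A note on reading the res/errno pairs in these traces: errno is only
meaningful when sendmsg returns -1, so a line such as
"res = 44 errno = 11" is a successful 44-byte send plus a stale errno
left over from some earlier call.  On Linux, 22 is EINVAL and 11 is
EAGAIN.  A minimal sketch of a check that prints errno only on failure
follows; report_send is a hypothetical helper for illustration, not
something that exists in gmi.c.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not part of gmi.c): report the result of a
 * sendmsg() call.  Call it immediately after sendmsg() so that errno
 * has not been overwritten by another library call.
 *
 * errno is looked at only when res == -1; on success its value is
 * whatever an earlier call left behind and is ignored.
 *
 * Returns 0 on success, or the saved errno value on failure.
 */
static int report_send(const char *where, ssize_t res)
{
	int saved = errno;	/* capture before fprintf can change it */

	if (res == -1) {
		fprintf(stderr, "%s: sendmsg failed: %s (errno %d)\n",
			where, strerror(saved), saved);
		return saved;
	}
	fprintf(stderr, "%s: sent %zd bytes\n", where, res);
	return 0;
}
```

Usage would look like
report_send("memb_state_gather_enter", sendmsg(fd, &msg, flags)),
where fd, msg and flags stand for whatever the call site already has.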


ifconfig eth1 up is not noticed at all.

Mark.

-- 
Mark Haverkamp <markh at osdl.org>



