[Openais] Re: defect 169 fixed up (and 172)

Steven Dake sdake at mvista.com
Fri Oct 29 17:44:34 PDT 2004


Mark

This code works for me on 2.6.7.  On 2.6.8, NFS crashes in root.  2.6.9
crashes after running for about 5-10 seconds.  I'll give the 2.6.10
release candidates a try tomorrow and see if things are fixed up.

In the meantime, could you send me your ifup and ifdown programs and
the various scripts you use to bring the interfaces up and down?  There
must be something different between your versions and mine (or the
network behavior changed after 2.6.7).

Thanks
-steve

On Fri, 2004-10-29 at 12:53, Mark Haverkamp wrote:
> On Fri, 2004-10-29 at 11:26 -0700, Steven Dake wrote:
> > On Fri, 2004-10-29 at 10:15, Mark Haverkamp wrote:
> > > On Fri, 2004-10-29 at 10:05 -0700, Mark Haverkamp wrote:
> > > 
> > > > 
> > > > I'm guessing that the mcast isn't happening from the send side.  I'll
> > > > add a results check to each of the sendmsg calls in gmi.c and see where
> > > > things are going wrong.
> > > > 
> > > > Mark.
> > > > 
> > > 
> > > OK, here are the results of printing out res:
> > > 
> > > 
> > > 
> > > 
> > > Oct 29 10:08:13 [WARNING ] [GMI  ] Token being retransmitted.
> > > sendmsg failed errno == 22
> > > Oct 29 10:08:13 [WARNING ] [GMI  ] The network interface is down.
> > > Oct 29 10:08:13 [WARNING ] [GMI  ] Token loss in OPERATIONAL.
> > > Oct 29 10:08:13 [NOTICE  ] [GMI  ] entering GATHER state.
> > > Oct 29 10:08:13 [NOTICE  ] [GMI  ] SENDING attempt join because this node is ring rep.
> > > memb_state_gather_enter: res = -1 errno = 22
> > > mjsend: res = -1, errno = 22
> > > Oct 29 10:08:14 [NOTICE  ] [GMI  ] I am the only member.
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ] CLM CONFIGURATION CHANGE
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ] New Configuration:
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ]      192.168.1.18
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ] Members Left:
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ]      192.168.1.8
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ]      192.168.1.17
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ]      192.168.1.19
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ] Members Joined:
> > > Oct 29 10:08:14 [NOTICE  ] [EVT  ] cluster node at 192.168.1.8 down
> > > Oct 29 10:08:14 [NOTICE  ] [EVT  ] cluster node at 192.168.1.17 down
> > > Oct 29 10:08:14 [NOTICE  ] [EVT  ] cluster node at 192.168.1.19 down
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ] CLM CONFIGURATION CHANGE
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ] New Configuration:
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ]      192.168.1.18
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ] Members Left:
> > > Oct 29 10:08:14 [NOTICE  ] [CLM  ] Members Joined:
> > > Oct 29 10:08:14 [NOTICE  ] [EVT  ] No channels to send
> > > otmcast: res = -1, errno = 22
> > > 
> > > 
> > > 
> > > 
> > > Oct 29 10:08:26 [WARNING ] [GMI  ] The network interface is now up.
> > > Oct 29 10:08:26 [NOTICE  ] [GMI  ] entering GATHER state.
> > > Oct 29 10:08:26 [NOTICE  ] [GMI  ] SENDING attempt join because this node is ring rep.
> > > memb_state_gather_enter: res = 44 errno = 22
> > > Oct 29 10:08:26 [NOTICE  ] [GMI  ] I am the only member.
> > > Oct 29 10:08:26 [NOTICE  ] [CLM  ] CLM CONFIGURATION CHANGE
> > > Oct 29 10:08:26 [NOTICE  ] [CLM  ] New Configuration:
> > > Oct 29 10:08:26 [NOTICE  ] [CLM  ]      192.168.1.18
> > > Oct 29 10:08:26 [NOTICE  ] [CLM  ] Members Left:
> > > Oct 29 10:08:26 [NOTICE  ] [CLM  ] Members Joined:
> > > Oct 29 10:08:26 [NOTICE  ] [CLM  ] CLM CONFIGURATION CHANGE
> > > Oct 29 10:08:26 [NOTICE  ] [CLM  ] New Configuration:
> > > Oct 29 10:08:26 [NOTICE  ] [CLM  ]      192.168.1.18
> > > Oct 29 10:08:26 [NOTICE  ] [CLM  ] Members Left:
> > > Oct 29 10:08:26 [NOTICE  ] [CLM  ] Members Joined:
> > > Oct 29 10:08:26 [NOTICE  ] [EVT  ] Already in config change, Starting over, m 1, c 0
> > > Oct 29 10:08:26 [NOTICE  ] [EVT  ] No channels to send
> > > otmcast: res = 364, errno = 11
> > > 
> > > 
> > > It looks like something was sent successfully, but we're not
> > > receiving it.  I'm not sure how the multicasting works, but does the
> > > application need to register for receiving mcasts?  If so, could we
> > > have lost the registration when the interface went down?
> > > 
> > 
> > The multicast code does a variety of things that could fail if the
> > interface goes down.  This is a behavior change from 2.4, which
> > doesn't seem to show any negative effects when the interface goes
> > down and then up.  One thing to note: in my testing I used
> > "ifconfig eth1 down", waited 5 seconds, then ran "ifconfig eth1 up",
> > not ifdown and ifup.  Would you try ifconfig to see if it behaves
> > any differently?
> > 
> 
> OK, I did.  Something peculiar happened.  It got token loss, but never
> noticed that the interface went away.
> 
> Oct 29 12:47:19 [WARNING ] [GMI  ] Token being retransmitted.
> Oct 29 12:47:20 [WARNING ] [GMI  ] Token loss in OPERATIONAL.
> Oct 29 12:47:20 [NOTICE  ] [GMI  ] entering GATHER state.
> Oct 29 12:47:20 [NOTICE  ] [GMI  ] SENDING attempt join because this node is ring rep.
> memb_state_gather_enter: res = 44 errno = 11
> Oct 29 12:47:20 [NOTICE  ] [GMI  ] I am the only member.
> Oct 29 12:47:20 [NOTICE  ] [CLM  ] CLM CONFIGURATION CHANGE
> Oct 29 12:47:20 [NOTICE  ] [CLM  ] New Configuration:
> Oct 29 12:47:20 [NOTICE  ] [CLM  ]      192.168.1.18
> Oct 29 12:47:20 [NOTICE  ] [CLM  ] Members Left:
> Oct 29 12:47:20 [NOTICE  ] [CLM  ]      192.168.1.8
> Oct 29 12:47:20 [NOTICE  ] [CLM  ]      192.168.1.17
> Oct 29 12:47:20 [NOTICE  ] [CLM  ]      192.168.1.19
> Oct 29 12:47:20 [NOTICE  ] [CLM  ] Members Joined:
> Oct 29 12:47:20 [NOTICE  ] [EVT  ] cluster node at 192.168.1.8 down
> Oct 29 12:47:20 [NOTICE  ] [EVT  ] cluster node at 192.168.1.17 down
> Oct 29 12:47:20 [NOTICE  ] [EVT  ] cluster node at 192.168.1.19 down
> Oct 29 12:47:20 [NOTICE  ] [CLM  ] CLM CONFIGURATION CHANGE
> Oct 29 12:47:20 [NOTICE  ] [CLM  ] New Configuration:
> Oct 29 12:47:20 [NOTICE  ] [CLM  ]      192.168.1.18
> Oct 29 12:47:20 [NOTICE  ] [CLM  ] Members Left:
> Oct 29 12:47:20 [NOTICE  ] [CLM  ] Members Joined:
> Oct 29 12:47:20 [NOTICE  ] [EVT  ] No channels to send
> 
> 
> ifconfig eth1 up is not noticed at all.
> 
> Mark.
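
A note on reading the res/errno lines in the logs above: on Linux,
errno 22 is EINVAL and errno 11 is EAGAIN, and errno is only meaningful
when sendmsg() has just returned -1.  Lines such as "res = 44 errno =
22" therefore show a successful send together with a stale errno left
over from an earlier call.  A minimal sketch of a results check that
avoids this confusion follows; checked_sendmsg() is a hypothetical
helper for illustration, not code from gmi.c.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not from gmi.c): call sendmsg() and report
 * errno only when the call actually failed.  Returns 0 on success
 * and the saved errno value on failure.
 */
static int checked_sendmsg(int fd, const struct msghdr *msg, int flags,
                           const char *where)
{
    ssize_t res = sendmsg(fd, msg, flags);

    if (res == -1) {
        int saved = errno;  /* save before another call can change it */
        fprintf(stderr, "%s: sendmsg failed: %s (errno = %d)\n",
                where, strerror(saved), saved);
        return saved;
    }

    /* On success errno is left over from an earlier call; skip it. */
    fprintf(stderr, "%s: sent %zd bytes\n", where, res);
    return 0;
}
```

With a check like this, only the failures during the interface-down
window would report an errno, and the successful sends after the
interface comes back would report just a byte count.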



