[Openais] Re: defect 169 fixed up (and 172)
Steven Dake
sdake at mvista.com
Fri Oct 29 13:20:59 PDT 2004
On Fri, 2004-10-29 at 12:53, Mark Haverkamp wrote:
> On Fri, 2004-10-29 at 11:26 -0700, Steven Dake wrote:
> > On Fri, 2004-10-29 at 10:15, Mark Haverkamp wrote:
> > > On Fri, 2004-10-29 at 10:05 -0700, Mark Haverkamp wrote:
> > >
> > > >
> > > > I'm guessing that the mcast isn't happening from the send side. I'll
> > > > add a results check to each of the sendmsg calls in gmi.c and see where
> > > > things are going wrong.
> > > >
> > > > Mark.
> > > >
> > >
> > > OK, here are the results of printing out res:
> > >
> > >
> > >
> > >
> > > Oct 29 10:08:13 [WARNING ] [GMI ] Token being retransmitted.
> > > sendmsg failed errno == 22
> > > Oct 29 10:08:13 [WARNING ] [GMI ] The network interface is down.
> > > Oct 29 10:08:13 [WARNING ] [GMI ] Token loss in OPERATIONAL.
> > > Oct 29 10:08:13 [NOTICE ] [GMI ] entering GATHER state.
> > > Oct 29 10:08:13 [NOTICE ] [GMI ] SENDING attempt join because this node is ring rep.
> > > memb_state_gather_enter: res = -1 errno = 22
> > > mjsend: res = -1, errno = 22
> > > Oct 29 10:08:14 [NOTICE ] [GMI ] I am the only member.
> > > Oct 29 10:08:14 [NOTICE ] [CLM ] CLM CONFIGURATION CHANGE
> > > Oct 29 10:08:14 [NOTICE ] [CLM ] New Configuration:
> > > Oct 29 10:08:14 [NOTICE ] [CLM ] 192.168.1.18
> > > Oct 29 10:08:14 [NOTICE ] [CLM ] Members Left:
> > > Oct 29 10:08:14 [NOTICE ] [CLM ] 192.168.1.8
> > > Oct 29 10:08:14 [NOTICE ] [CLM ] 192.168.1.17
> > > Oct 29 10:08:14 [NOTICE ] [CLM ] 192.168.1.19
> > > Oct 29 10:08:14 [NOTICE ] [CLM ] Members Joined:
> > > Oct 29 10:08:14 [NOTICE ] [EVT ] cluster node at 192.168.1.8 down
> > > Oct 29 10:08:14 [NOTICE ] [EVT ] cluster node at 192.168.1.17 down
> > > Oct 29 10:08:14 [NOTICE ] [EVT ] cluster node at 192.168.1.19 down
> > > Oct 29 10:08:14 [NOTICE ] [CLM ] CLM CONFIGURATION CHANGE
> > > Oct 29 10:08:14 [NOTICE ] [CLM ] New Configuration:
> > > Oct 29 10:08:14 [NOTICE ] [CLM ] 192.168.1.18
> > > Oct 29 10:08:14 [NOTICE ] [CLM ] Members Left:
> > > Oct 29 10:08:14 [NOTICE ] [CLM ] Members Joined:
> > > Oct 29 10:08:14 [NOTICE ] [EVT ] No channels to send
> > > otmcast: res = -1, errno = 22
> > >
> > >
> > >
> > >
> > > Oct 29 10:08:26 [WARNING ] [GMI ] The network interface is now up.
> > > Oct 29 10:08:26 [NOTICE ] [GMI ] entering GATHER state.
> > > Oct 29 10:08:26 [NOTICE ] [GMI ] SENDING attempt join because this node is ring rep.
> > > memb_state_gather_enter: res = 44 errno = 22
> > > Oct 29 10:08:26 [NOTICE ] [GMI ] I am the only member.
> > > Oct 29 10:08:26 [NOTICE ] [CLM ] CLM CONFIGURATION CHANGE
> > > Oct 29 10:08:26 [NOTICE ] [CLM ] New Configuration:
> > > Oct 29 10:08:26 [NOTICE ] [CLM ] 192.168.1.18
> > > Oct 29 10:08:26 [NOTICE ] [CLM ] Members Left:
> > > Oct 29 10:08:26 [NOTICE ] [CLM ] Members Joined:
> > > Oct 29 10:08:26 [NOTICE ] [CLM ] CLM CONFIGURATION CHANGE
> > > Oct 29 10:08:26 [NOTICE ] [CLM ] New Configuration:
> > > Oct 29 10:08:26 [NOTICE ] [CLM ] 192.168.1.18
> > > Oct 29 10:08:26 [NOTICE ] [CLM ] Members Left:
> > > Oct 29 10:08:26 [NOTICE ] [CLM ] Members Joined:
> > > Oct 29 10:08:26 [NOTICE ] [EVT ] Already in config change, Starting over, m 1, c 0
> > > Oct 29 10:08:26 [NOTICE ] [EVT ] No channels to send
> > > otmcast: res = 364, errno = 11
> > >
> > >
> > > It looks like something was sent successfully, but we're not receiving
> > > it. I'm not sure how the multicasting works, but does the application
> > > need to register to receive mcasts? If so, could we have lost the
> > > registration when the interface went down?
> > >
> >
> > The multicast does a variety of things which could cause it to fail if
> > the interface goes down. This is a behavior change from 2.4, which
> > doesn't seem to have any negative effects on interface down then up.
> > One thing to note: in my testing I used "ifconfig eth1 down", waited 5
> > seconds, then "ifconfig eth1 up", not ifdown and ifup. Would you try
> > ifconfig to see if it behaves any differently?
> >
>
> OK, I did. Something peculiar happened. It got token loss, but never
> noticed that the interface went away.
>
> Oct 29 12:47:19 [WARNING ] [GMI ] Token being retransmitted.
> Oct 29 12:47:20 [WARNING ] [GMI ] Token loss in OPERATIONAL.
> Oct 29 12:47:20 [NOTICE ] [GMI ] entering GATHER state.
> Oct 29 12:47:20 [NOTICE ] [GMI ] SENDING attempt join because this node is ring rep.
> memb_state_gather_enter: res = 44 errno = 11
> Oct 29 12:47:20 [NOTICE ] [GMI ] I am the only member.
> Oct 29 12:47:20 [NOTICE ] [CLM ] CLM CONFIGURATION CHANGE
> Oct 29 12:47:20 [NOTICE ] [CLM ] New Configuration:
> Oct 29 12:47:20 [NOTICE ] [CLM ] 192.168.1.18
> Oct 29 12:47:20 [NOTICE ] [CLM ] Members Left:
> Oct 29 12:47:20 [NOTICE ] [CLM ] 192.168.1.8
> Oct 29 12:47:20 [NOTICE ] [CLM ] 192.168.1.17
> Oct 29 12:47:20 [NOTICE ] [CLM ] 192.168.1.19
> Oct 29 12:47:20 [NOTICE ] [CLM ] Members Joined:
> Oct 29 12:47:20 [NOTICE ] [EVT ] cluster node at 192.168.1.8 down
> Oct 29 12:47:20 [NOTICE ] [EVT ] cluster node at 192.168.1.17 down
> Oct 29 12:47:20 [NOTICE ] [EVT ] cluster node at 192.168.1.19 down
> Oct 29 12:47:20 [NOTICE ] [CLM ] CLM CONFIGURATION CHANGE
> Oct 29 12:47:20 [NOTICE ] [CLM ] New Configuration:
> Oct 29 12:47:20 [NOTICE ] [CLM ] 192.168.1.18
> Oct 29 12:47:20 [NOTICE ] [CLM ] Members Left:
> Oct 29 12:47:20 [NOTICE ] [CLM ] Members Joined:
> Oct 29 12:47:20 [NOTICE ] [EVT ] No channels to send
>
>
> ifconfig eth1 up is not noticed at all.
>
> Mark.
OK, I'll have to try 2.6 and see if I can debug the issue.
Thanks for the report.
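
A note on reading the res/errno lines in the logs above: on Linux, errno 22 is EINVAL and errno 11 is EAGAIN, and errno is only meaningful when the call returned -1. Lines such as "res = 44 errno = 22" are therefore consistent with a successful send that printed a stale errno left over from an earlier failure. Below is a minimal sketch of a result check that reports errno only on failure. The helper name format_send_result is hypothetical and is not the check that was added to gmi.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from gmi.c): format the outcome of one
 * sendmsg() call. errno is reported only when res == -1; on success it
 * may still hold a value from some earlier failed call.
 *
 * Intended use, with saved_errno captured right after the call:
 *     res = sendmsg(fd, &msg, flags);
 *     saved_errno = errno;
 *     format_send_result("otmcast", res, saved_errno, line, sizeof(line));
 */
static int format_send_result(const char *where, ssize_t res,
                              int saved_errno, char *buf, size_t len)
{
    assert(where != NULL && buf != NULL && len > 0);

    if (res == -1) {
        return snprintf(buf, len, "%s: failed, errno = %d (%s)",
                        where, saved_errno, strerror(saved_errno));
    }
    return snprintf(buf, len, "%s: sent %ld bytes", where, (long)res);
}
```

Saving errno immediately after sendmsg() matters because any later library call, including the logging itself, is allowed to overwrite it.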