[Openais] Re: defect 169 fixed up (and 172)
Mark Haverkamp
markh at osdl.org
Fri Oct 29 12:53:37 PDT 2004
On Fri, 2004-10-29 at 11:26 -0700, Steven Dake wrote:
> On Fri, 2004-10-29 at 10:15, Mark Haverkamp wrote:
> > On Fri, 2004-10-29 at 10:05 -0700, Mark Haverkamp wrote:
> >
> > >
> > > I'm guessing that the mcast isn't happening from the send side. I'll
> > > add a results check to each of the sendmsg calls in gmi.c and see where
> > > things are going wrong.
> > >
> > > Mark.
> > >
> >
> > OK, here are the results of printing out res:
> >
> >
> >
> >
> > Oct 29 10:08:13 [WARNING ] [GMI ] Token being retransmitted.
> > sendmsg failed errno == 22
> > Oct 29 10:08:13 [WARNING ] [GMI ] The network interface is down.
> > Oct 29 10:08:13 [WARNING ] [GMI ] Token loss in OPERATIONAL.
> > Oct 29 10:08:13 [NOTICE ] [GMI ] entering GATHER state.
> > Oct 29 10:08:13 [NOTICE ] [GMI ] SENDING attempt join because this node is ring rep.
> > memb_state_gather_enter: res = -1 errno = 22
> > mjsend: res = -1, errno = 22
> > Oct 29 10:08:14 [NOTICE ] [GMI ] I am the only member.
> > Oct 29 10:08:14 [NOTICE ] [CLM ] CLM CONFIGURATION CHANGE
> > Oct 29 10:08:14 [NOTICE ] [CLM ] New Configuration:
> > Oct 29 10:08:14 [NOTICE ] [CLM ] 192.168.1.18
> > Oct 29 10:08:14 [NOTICE ] [CLM ] Members Left:
> > Oct 29 10:08:14 [NOTICE ] [CLM ] 192.168.1.8
> > Oct 29 10:08:14 [NOTICE ] [CLM ] 192.168.1.17
> > Oct 29 10:08:14 [NOTICE ] [CLM ] 192.168.1.19
> > Oct 29 10:08:14 [NOTICE ] [CLM ] Members Joined:
> > Oct 29 10:08:14 [NOTICE ] [EVT ] cluster node at 192.168.1.8 down
> > Oct 29 10:08:14 [NOTICE ] [EVT ] cluster node at 192.168.1.17 down
> > Oct 29 10:08:14 [NOTICE ] [EVT ] cluster node at 192.168.1.19 down
> > Oct 29 10:08:14 [NOTICE ] [CLM ] CLM CONFIGURATION CHANGE
> > Oct 29 10:08:14 [NOTICE ] [CLM ] New Configuration:
> > Oct 29 10:08:14 [NOTICE ] [CLM ] 192.168.1.18
> > Oct 29 10:08:14 [NOTICE ] [CLM ] Members Left:
> > Oct 29 10:08:14 [NOTICE ] [CLM ] Members Joined:
> > Oct 29 10:08:14 [NOTICE ] [EVT ] No channels to send
> > otmcast: res = -1, errno = 22
> >
> >
> >
> >
> > Oct 29 10:08:26 [WARNING ] [GMI ] The network interface is now up.
> > Oct 29 10:08:26 [NOTICE ] [GMI ] entering GATHER state.
> > Oct 29 10:08:26 [NOTICE ] [GMI ] SENDING attempt join because this node is ring rep.
> > memb_state_gather_enter: res = 44 errno = 22
> > Oct 29 10:08:26 [NOTICE ] [GMI ] I am the only member.
> > Oct 29 10:08:26 [NOTICE ] [CLM ] CLM CONFIGURATION CHANGE
> > Oct 29 10:08:26 [NOTICE ] [CLM ] New Configuration:
> > Oct 29 10:08:26 [NOTICE ] [CLM ] 192.168.1.18
> > Oct 29 10:08:26 [NOTICE ] [CLM ] Members Left:
> > Oct 29 10:08:26 [NOTICE ] [CLM ] Members Joined:
> > Oct 29 10:08:26 [NOTICE ] [CLM ] CLM CONFIGURATION CHANGE
> > Oct 29 10:08:26 [NOTICE ] [CLM ] New Configuration:
> > Oct 29 10:08:26 [NOTICE ] [CLM ] 192.168.1.18
> > Oct 29 10:08:26 [NOTICE ] [CLM ] Members Left:
> > Oct 29 10:08:26 [NOTICE ] [CLM ] Members Joined:
> > Oct 29 10:08:26 [NOTICE ] [EVT ] Already in config change, Starting over, m 1, c 0
> > Oct 29 10:08:26 [NOTICE ] [EVT ] No channels to send
> > otmcast: res = 364, errno = 11
> >
> >
> > It looks like something was sent successfully, but we're not receiving
> > it. I'm not sure how the multicasting works, but does the application
> > need to register for receiving mcasts? If so, could we have lost the
> > registration when the interface went down?
> >
>
> The multicast does a variety of things which could cause it to fail if
> the interface goes down. This is a behavior change from 2.4, which
> doesn't seem to suffer any negative effects when the interface goes
> down and then up. One thing to note: in my testing I used
> "ifconfig eth1 down", waited 5 seconds, then ran "ifconfig eth1 up",
> not ifdown and ifup. Would you try ifconfig to see if it behaves
> differently?
>
OK, I did. Something peculiar happened. It got token loss, but never
noticed that the interface went away.
Oct 29 12:47:19 [WARNING ] [GMI ] Token being retransmitted.
Oct 29 12:47:20 [WARNING ] [GMI ] Token loss in OPERATIONAL.
Oct 29 12:47:20 [NOTICE ] [GMI ] entering GATHER state.
Oct 29 12:47:20 [NOTICE ] [GMI ] SENDING attempt join because this node is ring rep.
memb_state_gather_enter: res = 44 errno = 11
Oct 29 12:47:20 [NOTICE ] [GMI ] I am the only member.
Oct 29 12:47:20 [NOTICE ] [CLM ] CLM CONFIGURATION CHANGE
Oct 29 12:47:20 [NOTICE ] [CLM ] New Configuration:
Oct 29 12:47:20 [NOTICE ] [CLM ] 192.168.1.18
Oct 29 12:47:20 [NOTICE ] [CLM ] Members Left:
Oct 29 12:47:20 [NOTICE ] [CLM ] 192.168.1.8
Oct 29 12:47:20 [NOTICE ] [CLM ] 192.168.1.17
Oct 29 12:47:20 [NOTICE ] [CLM ] 192.168.1.19
Oct 29 12:47:20 [NOTICE ] [CLM ] Members Joined:
Oct 29 12:47:20 [NOTICE ] [EVT ] cluster node at 192.168.1.8 down
Oct 29 12:47:20 [NOTICE ] [EVT ] cluster node at 192.168.1.17 down
Oct 29 12:47:20 [NOTICE ] [EVT ] cluster node at 192.168.1.19 down
Oct 29 12:47:20 [NOTICE ] [CLM ] CLM CONFIGURATION CHANGE
Oct 29 12:47:20 [NOTICE ] [CLM ] New Configuration:
Oct 29 12:47:20 [NOTICE ] [CLM ] 192.168.1.18
Oct 29 12:47:20 [NOTICE ] [CLM ] Members Left:
Oct 29 12:47:20 [NOTICE ] [CLM ] Members Joined:
Oct 29 12:47:20 [NOTICE ] [EVT ] No channels to send
ifconfig eth1 up is not noticed at all.
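
A note on reading the res/errno lines above: errno is only meaningful
when sendmsg returns -1. On Linux, errno 22 is EINVAL and 11 is EAGAIN,
but in lines like "res = 44 errno = 22" the send succeeded and the errno
value is just left over from some earlier call. Below is a minimal sketch
of the kind of check I mean. The name checked_sendmsg and the "where"
label are made up for illustration; this is not the code that is
currently in gmi.c.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical wrapper (not part of gmi.c): call sendmsg and log the
 * outcome. errno is read only when the call fails, because a successful
 * call leaves whatever value an earlier call stored there.
 */
static ssize_t checked_sendmsg(int fd, const struct msghdr *msg, int flags,
                               const char *where)
{
    ssize_t res = sendmsg(fd, msg, flags);

    if (res == -1) {
        int saved = errno; /* save before fprintf can overwrite it */
        fprintf(stderr, "%s: sendmsg failed: errno = %d (%s)\n",
                where, saved, strerror(saved));
        errno = saved;     /* hand the original error back to the caller */
    } else {
        fprintf(stderr, "%s: sent %zd bytes\n", where, res);
    }
    return res;
}
```

With something like this at each call site, a failed send is reported
with its own error and a successful one no longer shows a stale errno.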
Mark.
--
Mark Haverkamp <markh at osdl.org>