[Openais] totempg reentrancy
Steven Dake
sdake at mvista.com
Fri Jan 20 16:18:53 PST 2006
On Fri, 2006-01-20 at 15:47 -0800, Mark Haverkamp wrote:
> On Fri, 2006-01-20 at 15:40 -0800, Mark Haverkamp wrote:
> > On Fri, 2006-01-20 at 16:28 -0700, Steven Dake wrote:
> > > On Fri, 2006-01-20 at 15:18 -0800, Mark Haverkamp wrote:
> > > > On Fri, 2006-01-20 at 15:21 -0700, Steven Dake wrote:
> > > > > I found during debugging AMF some strange behavior in the totempg
> > > > > layer. I tracked it down to the fact that totempg_mcast (or msg_mcast)
> > > > > is not reentrant, meaning it is not possible to call a mcast from a
> > > > > message handler that was delivered a message.
> > > > >
> > > > > This happens within the AMF quite often, and may also happen within the
> > > > > CKPT and EVT resynchronization. Muni, do you know for sure it happens in
> > > > > ckpt resync?
> > > > >
> > > > > I think this is something we will have to fix before we finally release
> > > > > 0.70.1.
> > > > >
> > > > > I have attached a patch which fixes the problem for trunk. Could we get
> > > > > some review then I'll work up something for picacho?
> > > > >
> > > > > I have thought through this patch and it appears to solve multiple
> > > > > levels of reentrancy as well, but I could use more eyes and brains to
> > > > > think about the problem.
> > > >
> > > > How can the code get here and this be true?
> > > >
> > > >
> > > > if (reentrant_call == 1) {
> > > > 	goto start_over_reentrant;
> > > > }
> > > >
> > > > It looks like if reentrant_call is 1 on entry, it goes to
> > > > reentrant_mcast: and reentrant_call is set to zero.
> > > > Otherwise, if reentrant_call is set to one before totemmrp_mcast, it is
> > > > set back to zero just after the call.
> > > >
> > > >
> > > Put a printf in it and see if it's executed :)
> > >
> > > Yes, it took me 3 days to figure out exactly what was happening; it's
> > > pretty complicated.
> > >
> > > Basically the way it happens is this:
> > >
> > > msg_mcast is called by one of the service handlers for some request,
> > > maybe from a library. That service handler then queues a message. The
> > > message is then delivered. When that message is delivered, the delivery
> > > handler requests a message to be mcast while the msg_mcast is still
> > > processing a previous request.
> > >
> > > The problem is, we are already within the mcast routine (which is then
> > > in the msg handler, which then calls the mcast routine), which screws up
> > > all of the fragmentation buffer and other static data that is necessary
> > > to track the state of the totempg.
> > >
> > > So this patch first "finishes the job" on that last message and then
> > > starts over on the new message requested. It also seems to now pass
> > > testing.
> > >
> > > For an interesting test to prove that we are indeed reentrant, put a
> > > printf right after totemmrp_mcast and run amf. Sometimes it will not be
> > > printed, because amf will, on delivery of a message, call the function again.
> >
> > I think that you are saying that the call to totemmrp_mcast can cause
> > mcast_msg to get called again. If that is true, mcast_msg will see
> > reentrant_call == 1 at the start and go to reentrant_mcast:, which sets
> > reentrant_call = 0. I still don't see how we can get to line 801 with
> > reentrant_call == 1.
>
> Is there a chance that
>
> res = totemmrp_mcast (iovecs, 3, guarantee);
> reentrant_mcast:
> reentrant_call = 0;
>
> should be
>
> res = totemmrp_mcast (iovecs, 3, guarantee);
> reentrant_call = 0;
> reentrant_mcast:
>
Yes, you're right. The way it is now, it would drop the second mcast,
since the goto wouldn't be executed to retry the reentered call.
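
For anyone following along, here is a minimal sketch of the "finish the
job, then start over" idea. It is not the actual patch: it uses a loop in
place of the goto labels, and every name in it (demo_mcast,
lower_layer_send, in_mcast, pending, frag_buf, delivered_log) is
hypothetical. lower_layer_send only stands in for totemmrp_mcast and
assumes delivery happens synchronously inside the send call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_MSG 64
#define MAX_LOG 4

static int in_mcast = 0;            /* set while a send is in progress */
static int pending = 0;             /* a reentered request is waiting */
static char pending_msg[MAX_MSG];   /* copy of the reentered request */
static char frag_buf[MAX_MSG];      /* static state a nested send would clobber */

static int delivered_count = 0;
static char delivered_log[MAX_LOG][MAX_MSG];

static void demo_mcast(const char *msg);

/* Hypothetical stand-in for the lower layer: it delivers the message
 * synchronously, and the delivery handler may call demo_mcast again. */
static void lower_layer_send(const char *buf)
{
	if (delivered_count < MAX_LOG) {
		snprintf(delivered_log[delivered_count], MAX_MSG, "%s", buf);
		delivered_count++;
	}
	if (strcmp(buf, "request") == 0) {
		demo_mcast("reply");    /* reentrant call from the handler */
	}
}

static void demo_mcast(const char *msg)
{
	assert(msg != NULL);

	if (in_mcast) {
		/* Reentered: do not touch frag_buf. Save the request and
		 * let the outer call send it once it has finished. */
		snprintf(pending_msg, MAX_MSG, "%s", msg);
		pending = 1;
		return;
	}

	in_mcast = 1;
	snprintf(frag_buf, MAX_MSG, "%s", msg);
	for (;;) {
		lower_layer_send(frag_buf);
		if (!pending) {
			break;
		}
		/* Clear the flag before retrying; the check above must run
		 * after each send, or the deferred message is dropped. */
		pending = 0;
		snprintf(frag_buf, MAX_MSG, "%s", pending_msg);
	}
	in_mcast = 0;
}
```

Calling demo_mcast("request") once should deliver "request" and then
"reply", in that order. This sketch keeps only one deferred message, so a
handler that mcasts twice during a single delivery would overwrite the
first; a real fix would need a queue or equivalent.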
Regards
-steve