[Openais] totempg reentrancy

Steven Dake sdake at mvista.com
Fri Jan 20 16:18:53 PST 2006


On Fri, 2006-01-20 at 15:47 -0800, Mark Haverkamp wrote:
> On Fri, 2006-01-20 at 15:40 -0800, Mark Haverkamp wrote:
> > On Fri, 2006-01-20 at 16:28 -0700, Steven Dake wrote:
> > > On Fri, 2006-01-20 at 15:18 -0800, Mark Haverkamp wrote:
> > > > On Fri, 2006-01-20 at 15:21 -0700, Steven Dake wrote:
> > > > > I found during debugging AMF some strange behavior in the totempg
> > > > > layer.  I tracked it down to the fact that totempg_mcast (or msg_mcast)
> > > > > is not reentrant, meaning it is not possible to call a mcast from a
> > > > > message handler that was delivered a message.
> > > > > 
> > > > > This happens within the AMF quite often, and may also happen within the
> > > > > CKPT and EVT resynchronization.  Muni do you know for sure it happens in
> > > > > ckpt resync?
> > > > > 
> > > > > I think this is something we will have to fix before we finally release
> > > > > 0.70.1.
> > > > > 
> > > > > I have attached a patch which fixes the problem for trunk.  Could we get
> > > > > some review then I'll work up something for picacho?
> > > > > 
> > > > > I have thought through this patch and it appears to solve multiple
> > > > > levels of reentrancy as well, but I could use more eyes and brains to
> > > > > think about the problem.
> > > > 
> > > > How can the code get here and this be true?
> > > > 
> > > > 
> > > > if (reentrant_call == 1) {
> > > > 	goto start_over_reentrant;
> > > > }
> > > > 
> > > > It looks like if reentrant_call is 1 on entry, it goes to
> > > > reentrant_mcast: and reentrant_call is set to zero.
> > > > Otherwise, if reentrant_call is set to one before totemmrp_mcast, it is
> > > > set back to zero just after the call.
> > > > 
> > > > 
> > > put a printf in it and see if it's executed :)
> > > 
> > > Yes, it took me 3 days to figure out exactly what was happening; it's
> > > pretty complicated.
> > > 
> > > Basically the way it happens is this:
> > > 
> > > msg_mcast is called by one of the service handlers for some request,
> > > maybe from a library.  That service handler then queues a message.  The
> > > message is then delivered.  When that message is delivered, the delivery
> > > handler requests a message to be mcast while the msg_mcast is still
> > > processing a previous request.
> > > 
> > > The problem is, we are already within the mcast routine (which has
> > > called the msg handler, which then calls the mcast routine again),
> > > which corrupts the fragmentation buffer and other static data that is
> > > necessary to track the state of the totempg.
> > > 
> > > So this patch first "finishes the job" on that last message and then
> > > starts over on the new message requested.  It also seems to now pass
> > > testing.
> > > 
> > > For an interesting test to prove that we are indeed reentrant, put a
> > > printf right after totemmrp_mcast and run amf.  Sometimes it will not be
> > > printed, because amf will on delivery of a message recall the function.
> > 
> > I think that you are saying that the call to totemmrp_mcast can cause
> > mcast_msg to get called again.  If that is true, mcast_msg will see
> > reentrant_call == 1 at the start and goto reentrant_mcast:  Which sets
> > reentrant_call = 0.  I still don't see how we can get to line 801 with
> > reentrant_call == 1.
> 
> Is there a chance that 
> 
> 		res = totemmrp_mcast (iovecs, 3, guarantee);
> reentrant_mcast:
> 		reentrant_call = 0;
> 
> should be
> 
> 		res = totemmrp_mcast (iovecs, 3, guarantee);
> 		reentrant_call = 0;
> reentrant_mcast:
> 

Yes, you're right.  The way it is now, it would drop the second mcast,
since the goto wouldn't be executed to retry the reentered call.
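
To make the failure mode concrete, here is a minimal sketch of the
"finish the job, then start over" idea.  It is a simplified model, not
the patch itself: sketch_mcast, lower_layer_mcast, in_mcast, pending,
clear_flag_early and run_sketch are hypothetical names made up for the
illustration, and only reentrant_call and start_over_reentrant are
borrowed from the code under discussion.  It also assumes delivery is
synchronous and that a handler queues at most one mcast per delivery.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MSG_MAX 64
#define LOG_MAX 8

static int in_mcast;            /* 1 while the lower-layer call is running */
static int reentrant_call;      /* 1 when a handler called back into us */
static char pending[MSG_MAX];   /* the one message saved by a re-entered call */
static char delivered[LOG_MAX][MSG_MAX];
static int delivered_count;
static int clear_flag_early;    /* 1 models a reset that runs too early */

static void sketch_mcast (const char *msg);

/* Hypothetical stand-in for the lower layer: delivers synchronously, and
 * the handler for "request" immediately multicasts "reply". */
static void lower_layer_mcast (const char *msg)
{
	assert (delivered_count < LOG_MAX);
	snprintf (delivered[delivered_count++], MSG_MAX, "%s", msg);
	if (strcmp (msg, "request") == 0) {
		sketch_mcast ("reply");
	}
}

static void sketch_mcast (const char *msg)
{
	char current[MSG_MAX];

	if (in_mcast) {
		/* Re-entered from a delivery handler: save the request and
		 * return without touching the outer call's state. */
		snprintf (pending, MSG_MAX, "%s", msg);
		reentrant_call = 1;
		return;
	}
	snprintf (current, MSG_MAX, "%s", msg);

start_over_reentrant:
	in_mcast = 1;
	lower_layer_mcast (current);
	in_mcast = 0;
	if (clear_flag_early) {
		reentrant_call = 0;	/* too early: the saved request is lost */
	}
	if (reentrant_call == 1) {
		/* Finish the first message, then start over on the saved one. */
		reentrant_call = 0;
		snprintf (current, MSG_MAX, "%s", pending);
		goto start_over_reentrant;
	}
}

/* Sends one "request" and returns how many messages were delivered. */
int run_sketch (int early)
{
	in_mcast = 0;
	reentrant_call = 0;
	delivered_count = 0;
	clear_flag_early = early;
	sketch_mcast ("request");
	return delivered_count;
}
```

In this model run_sketch (0) delivers both "request" and "reply", while
run_sketch (1) delivers only "request": once the flag is cleared before
the check, the goto is never taken and the reentered mcast is dropped.
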

Regards
-steve




