[Openais] totempg reentrancy

Mark Haverkamp markh at osdl.org
Fri Jan 20 15:40:18 PST 2006


On Fri, 2006-01-20 at 16:28 -0700, Steven Dake wrote:
> On Fri, 2006-01-20 at 15:18 -0800, Mark Haverkamp wrote:
> > On Fri, 2006-01-20 at 15:21 -0700, Steven Dake wrote:
> > > I found during debugging AMF some strange behavior in the totempg
> > > layer.  I tracked it down to the fact that totempg_mcast (or msg_mcast)
> > > is not reentrant, meaning it is not possible to call a mcast from a
> > > message handler that was delivered a message.
> > > 
> > > This happens within the AMF quite often, and may also happen within the
> > > CKPT and EVT resynchronization.  Muni, do you know for sure it happens in
> > > ckpt resync?
> > > 
> > > I think this is something we will have to fix before we finally release
> > > 0.70.1.
> > > 
> > > I have attached a patch which fixes the problem for trunk.  Could we get
> > > some review?  Then I'll work up something for picacho.
> > > 
> > > I have thought through this patch and it appears to solve multiple
> > > levels of reentrancy as well, but I could use more eyes and brains to
> > > think about the problem.
> > 
> > How can the code get here and this be true?
> > 
> > 
> > if (reentrant_call == 1) {
> > 	goto start_over_reentrant;
> > }
> > 
> > It looks like if reentrant_call is 1 on entry, it goes to
> > reentrant_mcast: and reentrant_call is set to zero.
> > Otherwise, if reentrant_call is set to one before totemmrp_mcast, it is
> > set back to zero just after the call.
> > 
> > 
> Put a printf in it and see if it's executed :)
> 
> Yes, it took me 3 days to figure out exactly what was happening; it's
> pretty complicated.
> 
> Basically the way it happens is this:
> 
> msg_mcast is called by one of the service handlers for some request,
> maybe from a library.  That service handler then queues a message.  The
> message is then delivered.  When that message is delivered, the delivery
> handler requests another mcast while msg_mcast is still processing the
> previous request.
> 
> The problem is that we are already within the mcast routine (which is
> then in the msg handler, which then calls the mcast routine again), and
> that screws up the fragmentation buffer and the other static data that
> is necessary to track the state of the totempg.
> 
> So this patch first "finishes the job" on that last message and then
> starts over on the new message requested.  It also seems to now pass
> testing.
> 
> For an interesting test to prove that we are indeed reentrant, put a
> printf right after totemmrp_mcast and run amf.  Sometimes it will not be
> printed, because amf will, on delivery of a message, call the function
> again.

I think you are saying that the call to totemmrp_mcast can cause
mcast_msg to get called again.  If that is true, the nested mcast_msg
will see reentrant_call == 1 at the start and goto reentrant_mcast:,
which sets reentrant_call = 0.  I still don't see how we can get to
line 801 with reentrant_call == 1.
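
For reference, here is a minimal sketch in C of the pattern under
discussion.  It is not the patch itself: sketch_mcast, deliver, in_mcast
and the pending queue are all hypothetical names, with in_mcast standing
in for reentrant_call.  It assumes one possible way to "finish the job"
first, namely that a nested request made from the delivery handler is
queued and only sent once the outer call is done with its static state.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PENDING 8
#define MAX_MSG     32

/* All names below are hypothetical; none are taken from the patch. */
static int  in_mcast = 0;                    /* stands in for reentrant_call */
static char pending[MAX_PENDING][MAX_MSG];   /* requests deferred by nested calls */
static int  pending_count = 0;
static int  delivered = 0;                   /* number of messages delivered */
static char order[128] = "";                 /* delivery order, for inspection */

static int sketch_mcast(const char *msg);

/* Hypothetical delivery handler: delivering "A" requests a mcast of "B",
 * which is the nested call described in the thread. */
static void deliver(const char *msg)
{
    delivered++;
    strcat(order, msg);
    if (strcmp(msg, "A") == 0) {
        sketch_mcast("B");
    }
}

static int sketch_mcast(const char *msg)
{
    if (in_mcast) {
        /* Nested call: do not touch the in-progress state, just queue. */
        if (pending_count == MAX_PENDING) {
            return -1;
        }
        strncpy(pending[pending_count], msg, MAX_MSG - 1);
        pending[pending_count][MAX_MSG - 1] = '\0';
        pending_count++;
        return 0;
    }

    in_mcast = 1;
    deliver(msg);   /* may re-enter sketch_mcast through the handler */

    /* Finish the current message first, then start over on whatever
     * the handlers requested in the meantime. */
    for (int i = 0; i < pending_count; i++) {
        deliver(pending[i]);
    }
    pending_count = 0;
    in_mcast = 0;
    return 0;
}
```

Calling sketch_mcast("A") once from a test main delivers "A" and then
"B", and the flag is back to 0 when the outer call returns.  Whether the
real code can ever observe the flag still set after totemmrp_mcast
returns depends on where the patch clears it, which this sketch does not
try to reproduce.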
-- 
Mark Haverkamp <markh at osdl.org>



