[Openais] totempg reentrancy
Mark Haverkamp
markh at osdl.org
Fri Jan 20 15:40:18 PST 2006
On Fri, 2006-01-20 at 16:28 -0700, Steven Dake wrote:
> On Fri, 2006-01-20 at 15:18 -0800, Mark Haverkamp wrote:
> > On Fri, 2006-01-20 at 15:21 -0700, Steven Dake wrote:
> > > I found during debugging AMF some strange behavior in the totempg
> > > layer. I tracked it down to the fact that totempg_mcast (or msg_mcast)
> > > is not reentrant, meaning it is not possible to call a mcast from a
> > > message handler that was delivered a message.
> > >
> > > This happens within the AMF quite often, and may also happen within the
> > > CKPT and EVT resynchronization. Muni, do you know for sure it happens in
> > > ckpt resync?
> > >
> > > I think this is something we will have to fix before we finally release
> > > 0.70.1.
> > >
> > > I have attached a patch which fixes the problem for trunk. Could we get
> > > some review? Then I'll work up something for picacho.
> > >
> > > I have thought through this patch and it appears to solve multiple
> > > levels of reentrancy as well, but I could use more eyes and brains to
> > > think about the problem.
> >
> > How can the code get here and this be true?
> >
> >
> > if (reentrant_call == 1) {
> > goto start_over_reentrant;
> > }
> >
> > It looks like if reentrant_call is 1 on entry, it goes to
> > reentrant_mcast: and reentrant_call is set to zero.
> > Otherwise, if reentrant_call is set to one before totemmrp_mcast, it is
> > set back to zero just after the call.
> >
> >
> Put a printf in it and see if it's executed :)
>
> Yes, it took me 3 days to figure out exactly what was happening; it's
> pretty complicated.
>
> Basically the way it happens is this:
>
> msg_mcast is called by one of the service handlers for some request,
> maybe from a library. That service handler then queues a message. The
> message is then delivered. When that message is delivered, the delivery
> handler requests a message to be mcast while the msg_mcast is still
> processing a previous request.
>
> The problem is that we are already inside the mcast routine when the
> message handler it invoked calls the mcast routine again, which corrupts
> the fragmentation buffer and the other static data needed to track the
> state of the totempg.
>
> So this patch first "finishes the job" on that last message and then
> starts over on the new message requested. It also seems to now pass
> testing.
>
> For an interesting test to prove that we are indeed reentrant, put a
> printf right after totemmrp_mcast and run amf. Sometimes it will not be
> printed, because amf will, on delivery of a message, call the function
> again.
I think you are saying that the call to totemmrp_mcast can cause
mcast_msg to get called again. If that is true, mcast_msg will see
reentrant_call == 1 at the start and goto reentrant_mcast, which sets
reentrant_call = 0. I still don't see how we can get to line 801 with
reentrant_call == 1.
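
To make the call chain concrete, here is a minimal sketch of the general
hazard: a send routine whose delivery callback asks for another send while
the first is still on the stack. Every name in it (sketch_mcast,
lower_layer_send, deliver_fn, echo_handler) is hypothetical and none of it
is taken from totempg.c or from the patch. It also uses a different
technique from the one described above: the nested request is queued and
sent once the outer call has finished, rather than finishing the current
message and starting over.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names come from totempg.c. */
#define MAX_PENDING 8
#define MAX_LOG 16

static int in_mcast;                     /* 1 while sketch_mcast is on the stack */
static const char *pending[MAX_PENDING]; /* requests made from inside a handler */
static int pending_count;
static const char *sent_log[MAX_LOG];    /* order in which messages were sent */
static int sent_count;

/* Delivery handler installed by a "service"; it may request another mcast. */
static void (*deliver_fn)(const char *msg);

/* Assumed lower layer: records the message and delivers it synchronously. */
static void lower_layer_send(const char *msg)
{
    assert(sent_count < MAX_LOG);
    sent_log[sent_count++] = msg;
    if (deliver_fn)
        deliver_fn(msg);
}

/* Returns 0 on success, -1 if the pending queue is full. */
int sketch_mcast(const char *msg)
{
    if (in_mcast) {
        /* Reentrant call from a delivery handler: defer, do not touch state. */
        if (pending_count == MAX_PENDING)
            return -1;
        pending[pending_count++] = msg;
        return 0;
    }

    in_mcast = 1;
    lower_layer_send(msg);

    /* Drain requests queued by handlers; the queue may grow while draining. */
    for (int i = 0; i < pending_count; i++)
        lower_layer_send(pending[i]);
    pending_count = 0;

    in_mcast = 0;
    return 0;
}

/* Example handler: answers every "request" with a "reply". */
static void echo_handler(const char *msg)
{
    if (strcmp(msg, "request") == 0)
        sketch_mcast("reply");
}
```

With deliver_fn set to echo_handler, a single sketch_mcast("request")
sends "request" first and "reply" second, and the flag is back to zero when
the outer call returns. The queue here is bounded and never compacted
during a drain, so it is only meant to show the ordering, not to be
dropped into the real code.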
--
Mark Haverkamp <markh at osdl.org>