[Openais] Re: Library core dump using evt.

Steven Dake sdake at mvista.com
Mon Jan 9 00:56:29 PST 2006


On Sun, 2006-01-08 at 13:27 -0600, Muni Bajpai wrote:
> Hey Steve,Mark
> 
>  
> 
> Kristen found an issue where the library dumped core in the process of
> shutting down. As of now we can’t reproduce it but the issue was in
> saHandleInstancePut being called from saEvtEventAllocate. Looks like
> the handle was corrupted or the database closed. In either case
> handleDatabase->handles[handle].check caused the issue and my best
> guess is accessing the array after its been deallocated.
> 
>  
> 
> There is  a check in saHandleInstancePut that I added to the other to
> access functions. This seems like a good error check anyways. Now
> obviously the calling functions should check the return code which is
> not the case but at least we won’t have invalid memory access.
> 

Well its been awhile since we have found any sorts of bug in the lib
directory.  Good work on the bug find Muni :)  Question, was the
application multithreaded?  The handle array should always be valid,
unless HandleInstanceDestructor has been called previously.  This only
happens when the reference count reaches zero.

Another option is that the reference counting could be putting when it
shouldn't (perhaps some error condition that is incorrectly handled in
the library).

Mark could you look over the evt library code and see if you see any
problem?

I'd like to fix the bug rather then apply the workaround you submitted
Muni...  This way we know its fixed...

The first rule is that put should never be called, unless a get is
called on it first.  The second rule is that get and put are matched on
all calls.  The third rule is that after a create, if a get fails, a
destroy should take place on the handle and an error returned.  I know
its kind of onerous, but threads suck :)  If this is wrong, all kinds of
things could break.

Could you give more information on the failure?

Was a multi-threaded application signaled to exit during runtime?  

One other possibility is that the application itself is corrupting the
handle database.  Have you tried running your apps through valgrind to
determine any invalid memory accesses?

It would be interesting to know which handle database put was the result
of the crash - do you know the answer to this?  (ie was it the channel
database, or the evt database).

Finally mark I think this code is wrong:
    error = saHandleCreate(&event_handle_db, sizeof(*edi),
            eventHandle);
    if (error != SA_AIS_OK) {
        goto alloc_put2;
    }

   error = saHandleInstanceGet(&event_handle_db, *eventHandle,
                    (void*)&edi);
    if (error != SA_AIS_OK) {
        goto alloc_put2;

^^^^^ this could looks suspect - shouldn't it be goto alloc_put3 which
destroys the created event handle and puts the other two handle and
returns an error?  At minimum this is a leak during error condition but
probably not the cause of Kristen's crash...

    }


......

alloc_put2:
    saHandleInstancePut (&evt_instance_handle_db,
eci->eci_instance_handle);
alloc_put1:
    saHandleInstancePut (&channel_handle_db, edi->edi_channel_handle);

More thinking out loud the event could be closed before the channel is
closed.  The specs allow this, but only after all references in the
library to the channel and evt handle are put.

Does any of this speculation sound like something that could happen in
your application?

Regards
-steve

>  
> 
> Thanks
> 
>  
> 
> Muni
> 
>  
> 
> diff -uNr --exclude=svn --exclude=.svn --exclude=SCCS
> --exclude=BitKeeper --exclude=ChangeSet --exclude=init
> --exclude=LICENSE --exclude=Makefile --exclude=man
> --exclude=README.devmap --exclude=SECURITY --exclude=TODO
> --exclude=CHANGELOG --exclude=conf --exclude=loc
> --exclude=Makefile.samples --exclude=QUICKSTART --exclude=.cdtproject
> --exclude=.project --exclude=nortel.patch
> openais/branches/picacho/lib/util.c picacho_patch/lib/util.c
> 
> --- openais/branches/picacho/lib/util.c 2006-01-06 17:23:32 -06:00
> 
> +++ picacho_patch/lib/util.c    2006-01-08 13:23:15 -06:00
> 
> @@ -608,6 +608,11 @@
> 
>         uint32_t handle = inHandle & 0xffffffff;
> 
>  
> 
>         pthread_mutex_lock (&handleDatabase->mutex);
> 
> +
> 
> +       if (handle >= (SaUint64T)handleDatabase->handleCount) {
> 
> +                       error = SA_AIS_ERR_BAD_HANDLE;
> 
> +                       goto error_exit;
> 
> +       }
> 
>  
> 
>         if (check != handleDatabase->handles[handle].check) {
> 
>                 error = SA_AIS_ERR_BAD_HANDLE;
> 
> @@ -673,6 +678,11 @@
> 
>         uint32_t handle = inHandle & 0xffffffff;
> 
>  
> 
>         pthread_mutex_lock (&handleDatabase->mutex);
> 
> +
> 
> +       if (handle >= (SaUint64T)handleDatabase->handleCount) {
> 
> +                       error = SA_AIS_ERR_BAD_HANDLE;
> 
> +                       goto error_exit;
> 
> +       }
> 
>  
> 
>         if (check != handleDatabase->handles[handle].check) {
> 
>                 error = SA_AIS_ERR_BAD_HANDLE;
> 
> 




More information about the Openais mailing list