[Openais] Re: just looked at some of the openais code, have some comments

Fri Jun 25 12:06:57 PDT 2004

Steven Dake wrote:
> On Thu, 2004-06-24 at 14:29, Chris Friesen wrote:

>  > --I don't think it makes sense to use mutexes unconditionally.  For
>  > single-threaded apps its only extra overhead.  They should be 
> conditionally
>  > compiled based on compiler flag (REENTRANT or similar).

> In GCC, if you link with -lpthread, mutexes are used.  If you dont link
> with -lpthread, they are not used.  This is probably just a feature of
> gcc, which hurts portablilty somewhat.

Hmm...seems like basic glibc has stub versions for the mutex functions.  Perhaps 
it would be cleaner to have inline functions something like this declared in a 
header somewhere:

inline int sa_unlock(pthread_mutex_t *mutex)
{
#ifdef REENTRANT
	return pthread_mutex_unlock(mutex);
#else
	return 0;
#endif
}

That way, someone looking at the source can at least follow and see that you're 
not really trying to do locking when single-threaded, and the compiler will 
totally optimise it away.  Or if you don't want to do that, there should be 
really big, unmissable documentation that glibc already handles stubbing this.

>  > --It violates design principles to lock/unlock mutexes in different 
> methods, but
>  > here it is even in different conceptual levels of code. Specifically,
>  > saHandleConvert() passing back with mutex locked, then unlocking it 
> in the caller.
>  >
> 
> I am not pleased with the locking approach at the moment.  The reason
> the mutex must be unlocked in the caller of saHandleConvert is to
> protect against races in Initialize/Finalize.  If locking were in the
> actual function, an unfree could make the handle returned back from
> HandleConvert invalid.  Another important reason is that data in the
> instance must be accessed while the lock is held.

I was taught that mutexes are to protect data access.  Conceptually, I don't 
know that it makes sense to use a mutex on a data structure to prevent it from 
being deleted--isn't that what reference counts are for?  For instance, if 
saHandleConvert() incremented a refcount before passing the instance back, then 
it wouldn't need to take the mutex on the instance.  Since refcounts can be done 
using atomic operations, you don't have a chance of blocking.  Of course you'd 
need to take the mutex when actually modifying the data (like saClmFinalize(), 
for instance) but then you'd take it and free it in the same function.

>  > --The unprotected (and unconditional) mutex use means that you can't 
> use any of
>  > the library functions from a signal handler.  I didn't see this 
> documented in
>  > any of the headers.
> 
> I intend to create a FAQ and I'll add this to that document.
> 
> You are correct.  One thing I would wonder though, is could an API user
> really use a signal handler at all with the APIs?  If the handle
> database is in mid-update, and then a signal occurs, the API calls may
> not work appropriately 100% of the time.  Any ideas of supporting this
> sort of feature?

The usual way to handle that is to mask signals while manipulating stuff that 
can't handle being interrupted.  That way you know you will be able to run to 
completion over the critical section.

>  > --When calling saClmDispatch, we may already know (as is the case in 
> the test
>  > code) that there is at least one message.  In that case we could skip 
> the
>  > saPollRetry()/saHandleConvert() calls and go directly to saRecvRetry().
>  >
> 
> How do you always know there is atleast one message?

In testclm.c, the message socket is being monitored in a select() loop.  It only 
calls saClmDispatch() when the socket is readable, so we may as well try to read 
the message from the socket right away.  We may not always know this, but it 
would be nice to have a flag or something.

>  > --In saClmDispatch, do you really want to drop the instance mutex 
> before calling
>  > the callbacks?  What happens if another thread removes the instance 
> and then you
>  > call the callback function which is now invalid?  If you do want to 
> drop the
>  > mutex, maybe the instance should be refcounted or something?
>  >
> 
> This is a bug and has been fixed in the latest bk by copying the data
> handed back in the message to a storage buffer, then unlocking.  Not
> pretty, but it works.

No, my point is that the callback function itself may have been removed.  You've 
got no guarantees that the instance itself is around any more once you drop the 
mutex, since you don't have any refcounts.  You could end up calling into some 
random memory address (that *used* to be the callback function) and treating it 
as executable code.

>  > --The testclm.c demo doesn't have any way to shut down cleanly as far 
> as I can
>  > tell.  If you kill it, then saClmClusterTrackStop() and 
> saClmFinalize() never
>  > get called.  I guess that demos the library cleanup code....
>  >
> 
> The AIS executive fully handles API clients that exit unexpectedly.

Cool.  How does that work?  Some kind of polling?

Chris