[cgl_discussion] AEM analysis summary

Frederic Rossi (LMC) Frederic.Rossi at ericsson.ca
Thu Apr 3 07:34:17 PST 2003

Hi Dave,

Your document is very interesting, and your analysis of AEM is mostly
correct. I do have some initial comments and questions, though. Most of
them are meant to clarify the situation regarding AEM; the others are
meant to help me understand your proposal.

Dave Olien wrote:

     > AEM implements an event delivery mechanism that meets many of OSDL's
     > requirements for such a mechanism.  However, the implementation seems
     > highly optimized for a specific set of applications.

AEM is not really optimized; rather, it targets telecom and distributed
applications. Isn't that what is needed?

On the other hand, the API it exports to the user is so simple compared
to most mechanisms that it can be used for everyday programming tasks.

AEM is generic: it can be used for any kind of event notification.
What is currently provided (sockets and timers) is only a small subset
of what is possible.

For example, you could use AEM for device driver initialization (where
the callback is the initialization code in user space), for overload
notification (a callback is executed when the system load reaches some
value), or for memory location monitoring. Since AEM manages the
asynchronous execution of processes, it is also possible to provide load
balancing at the process level: a system could, for example, decide not
to execute some processes (event handlers) depending on the system load.
AEM can be used either for streaming data or for sporadic flows of data.

     > Since the optimization
     > is done intrusively in the kernel, the chance for mainline acceptance
     > is zero. As such, it is more intrusive into the kernel than is
     > for OSDL.

What level of "intrusiveness" is acceptable for OSDL?

     > There are also indications that the AEM implementation is still
     > of a prototype, as evidenced by some unused code in the kernel.

No. AEM is a research project and a work in progress. Plenty of
work remains to be done, and all contributions are welcome.

     > A less optimized implementation that accomplishes most of what AEM
     > implements could be implemented using user-mode libraries and

Less optimized? You mean less capable, because epoll is not a generic
mechanism; it is a replacement for select(). Also, I haven't read
anything saying that epoll will evolve into a generic event mechanism.
I'm not even sure that would be desirable.

For example, how would you provide asynchronous notification with epoll
when a socket is closing or its state is changing? A generic epoll would
become as intrusive as AEM.

AEM is really generic. Adding a new notification to AEM is just a
matter of adding a wait-queue to the right structure and a wakeup call
at the right place in the code: basically two lines of code.
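In the kernel, those two lines would typically be a wait_queue_head_t added to the watched structure (initialized with init_waitqueue_head()) and a wake_up_interruptible() call where its state changes. Kernel code cannot run standalone, so the following is only a user-space analogue under that assumption: a POSIX condition variable stands in for the wait-queue. struct watched_object, set_state(), wait_for_state() and run_waitq_demo() are hypothetical names, not part of AEM.

```c
#include <pthread.h>

/* Hypothetical watched object. The condition variable plays the role of
   the wait-queue head that would be added to a kernel structure. */
struct watched_object {
    int state;
    pthread_mutex_t mu;
    pthread_cond_t changed;
};

/* Where the state changes: the one extra "wakeup" line. */
void set_state(struct watched_object *o, int s)
{
    pthread_mutex_lock(&o->mu);
    o->state = s;
    pthread_cond_broadcast(&o->changed);   /* analogue of wake_up_interruptible() */
    pthread_mutex_unlock(&o->mu);
}

/* A waiter sleeps on the "queue" until the state it wants is reached. */
int wait_for_state(struct watched_object *o, int want)
{
    pthread_mutex_lock(&o->mu);
    while (o->state != want)               /* re-check after every wakeup */
        pthread_cond_wait(&o->changed, &o->mu);
    int s = o->state;
    pthread_mutex_unlock(&o->mu);
    return s;
}

static struct watched_object obj = {
    0, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER
};

static void *notifier(void *arg)
{
    (void)arg;
    set_state(&obj, 1);                    /* the "event" */
    return NULL;
}

/* Demo: one thread raises the event while the caller waits for it.
   Returns the observed state, or -1 if the thread cannot be created. */
int run_waitq_demo(void)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, notifier, NULL) != 0)
        return -1;
    int s = wait_for_state(&obj, 1);
    pthread_join(tid, NULL);
    return s;
}
```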

The corresponding system call (or other entry point) lives at the kernel
level but doesn't pollute kernel structures. I admit this means some
code to write, but it could easily be pushed into modules. In the
requirements document I wrote for Low-Level Asynchronous Events, I
proposed a different API for the AEM system calls which would fit very
well with modules (this amounts to roughly 2000 lines of the patch). I
still have not received any comments on this.

On the other hand, we really think that AEM is a good complement to
other mechanisms such as select or epoll. So please don't see it as a
replacement for epoll, or epoll as an alternative to AEM.

     > It would be useful to have source code for a few simple applications
     > that use the AEM API.  This could include code for measuring
     > code used for regression testing, etc.

You can download example source code from

There are simple servers using AEM, AEM+select, AEM+epoll,
and timers.

     > For examples:
     > 2. event delivery in AEM is in many ways a variation on signal
     >         But it is a mechanism that is in addition to standard 

This is actually very practical. It makes it possible to implement
single-threaded servers, for example, because some event handlers can
be executed in the same context as the main process. This is also
useful for initialization during process startup.

     > 4. Soft Realtime Scheduling Priority
     >         The AEM boosts the scheduling priority of whatever thread
     >         context is the target of an event delivery, to ensure the
     >         event is handled in a quickly.
     >         Linux 2.5 has re-written the scheduler.  This priority boost
     >         code in AEM will need to be re-written.  The 2.5 O(1)
     >         now requires tasks be moved between priority queues, and
     >         migrated between per-process queues when its priority 
     >         Really, AEM should probably call an exported kernel function,
     >         as "set_user_nice()" (which is an EXPORTED kernel function).

You are right, AEM needs to be rewritten for 2.5. The current scheduling
scheme works, but it is not elegant, and I would like to find a better
way of doing it.

The reason is the following:

AEM uses an extra member called "srt_priority" to change process
priorities at run time. On 2.4, when the scheduler selects a process,
each goodness value is weighted by this srt_priority. This is nice
because it allows srt_priority to be changed at a very high rate
without affecting the process priority itself. This is really what is

On the other hand, it breaks the geometric series, and its convergence,
used to describe the evolution of the counter value. This is what I
dislike. Except for the goodness value, priorities in Unix are
fundamentally static, and I want something dynamic that can be changed
at a high rate. This need arises frequently, for example with
asynchronous reads on a socket or with any kind of streaming data.

Using set_user_nice() or related functions is really not acceptable for
this purpose.

I suspect it will be easier to achieve this on 2.5.

     > The AEM event delivery mechanism is built on a variation of signal
     > delivery.
     > But, it is a mechanism IN ADDITION to signals.  A consequence is that
     > applications using events will have many of the same application
     > reentrancy issues as signals.  For example, libc functions that are
     > "async-signal" safe, often achieve that by blocking signals around
     > critical code in the library function.  This is important for library
     > functions that may be called from both an application's mainline code
     > and from signal
     > in that application.
     > AEM implements a system call to hold off delivery of events, similar
     > to blocking signals.  But none of the functions in libc have been
     > adapted to use that mechanism.  Hence, no libc functions are
     > safe.  The workaround for this is for applications to block event
     > delivery around calls to functions (e.g. printf()) that are called
     > both mainline code and from event handlers in that application.

This is only true when executing in the main flow of execution, since
handlers cannot be interrupted by other events.

The only solution I see for the moment is to suspend event delivery.

     > The same is true for a user-level spinlock that is acquired both in an
     > event handler and in the mainline application. Prior to acquiring the
     > events should be blocked.  For user-mode spinlocks (futex), it's
     > unfortunate that a system call needs to be made.  It defeats the
     > purpose of user-mode spinlocks.

Yes, this is really bad. But it is true only in the single-threaded
case, isn't it?

Since we know this in advance (at registration), it might be
reasonable not to use locks in this type of event handler (the kind
that interrupts the main flow of execution). We already know that if we
block inside this type of handler, we may block the entire process. This
is the same problem.

     > A proposed implementation of event notification should make use of
     > One way would be for the mainline application threads to have a pool of
     > "epoll" threads.  These epoll threads would share memory with the
     > mainline thread, and would do blocking calls on epoll() to wait for the
     > occurrence of events.  When an event occurs, the poll thread can
     > the event data, and execute a callback function in the shared-memory
     > context of the application's mainline thread.
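The epoll-thread scheme quoted above could look roughly like the following minimal sketch, which assumes Linux, a single poll thread rather than a pool, and an eventfd as a stand-in event source. struct event_reg, poll_thread(), add_to_total() and run_epoll_demo() are hypothetical names, not an existing library API.

```c
#include <pthread.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical per-event registration record. */
struct event_reg {
    int fd;                                          /* event source */
    void (*callback)(struct event_reg *, uint64_t);  /* handler to run */
    void *app_data;                                  /* mainline's memory */
};

struct poll_args {
    int epfd;
    int max_events;   /* demo only: stop after this many dispatches */
};

/* One "epoll thread": block in epoll_wait(), then run the registered
   callback directly on memory shared with the mainline thread. */
static void *poll_thread(void *arg)
{
    struct poll_args *a = arg;
    struct epoll_event ready[8];
    int handled = 0;

    while (handled < a->max_events) {
        int n = epoll_wait(a->epfd, ready, 8, -1);
        if (n < 0)
            break;                                   /* sketch: no EINTR retry */
        for (int i = 0; i < n; i++) {
            struct event_reg *reg = ready[i].data.ptr;
            uint64_t value;
            if (read(reg->fd, &value, sizeof value) == (ssize_t)sizeof value) {
                reg->callback(reg, value);
                handled++;
            }
        }
    }
    return NULL;
}

/* Example callback: accumulates event values in application memory. */
static void add_to_total(struct event_reg *reg, uint64_t value)
{
    *(uint64_t *)reg->app_data += value;
}

/* Mainline: register an eventfd, start one poll thread, post one event.
   Returns the total accumulated by the callback (0 on setup failure). */
uint64_t run_epoll_demo(void)
{
    uint64_t total = 0;
    int epfd = epoll_create1(0);
    int efd = eventfd(0, 0);
    struct event_reg reg = { efd, add_to_total, &total };
    struct epoll_event ev = { .events = EPOLLIN, .data.ptr = &reg };
    struct poll_args args = { epfd, 1 };
    pthread_t tid;

    if (epfd >= 0 && efd >= 0 &&
        epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) == 0 &&
        pthread_create(&tid, NULL, poll_thread, &args) == 0) {
        uint64_t value = 42;                         /* the "event" */
        ssize_t w = write(efd, &value, sizeof value);
        (void)w;           /* a small write to a fresh eventfd succeeds */
        pthread_join(tid, NULL);  /* join also makes `total` visible here */
    }
    if (efd >= 0)
        close(efd);
    if (epfd >= 0)
        close(epfd);
    return total;
}
```

A pool would start several poll_thread() instances on the same epoll descriptor; how ready events are then distributed among those threads is decided by epoll_wait(), not by the application.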

Using a mechanism based on epoll is quite interesting. It makes it
possible to benefit from its capabilities and its wide acceptance in the
community. But don't you think its limitations could be a problem for
a generic event notification mechanism?

Managing events and their related processes on behalf of user
applications is already the goal of AEM. What you are proposing is to
push this back into user space, right? This is not really different
from what we already had with AIO and multi-threaded applications, or
am I wrong?

I have some questions about this, mostly to help me understand your
idea; they are simply the questions that came to mind. Feel free to
answer only those you think are necessary.

Do you mean one epoll thread per process, one per event, or one per
system? This matters because the issues are not the same, especially
regarding application reliability and node robustness.

What kind of threads (not coroutines, I hope)? Would it be possible to
have memory protection in this case?

What happens if a management thread crashes? It seems to me there is
not much choice other than reloading the entire application (in order
to re-register all events).

On the other hand, this kind of implementation seems intrusive with
respect to the applications. It is clearly more than just an API, since
the application needs to be tightly linked with an "event library" and
extra management threads.

Another issue is software upgrade. It seems difficult to provide
process-level granularity, which leaves no choice other than upgrading
entire applications.

How easy would it be to manage the addition of new events at run time?

When using epoll plus threads, you have to ensure synchronization
between threads. This is not the case with AEM, which is already safe
with respect to event handlers; only data structures need
synchronization.

     > Poll threads could run with an elevated scheduling priority to ensure
     > they service events quickly.

This is one possible solution.

But would it be possible to provide thread priorities depending on event
priorities? It seems to me that priorities in this case are associated
with threads and not with events.
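A minimal sketch of the quoted suggestion makes the point concrete, assuming POSIX real-time scheduling: the priority is an attribute of the thread passed to pthread_create(), and nothing in it refers to the event being handled. start_poll_thread() and poll_loop() are hypothetical names. Creating a SCHED_FIFO thread normally needs privileges (CAP_SYS_NICE or an RLIMIT_RTPRIO allowance), so the sketch falls back to the default policy when pthread_create() reports EPERM.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Stand-in for the epoll_wait()/dispatch loop of a poll thread. */
static void *poll_loop(void *arg)
{
    *(int *)arg = 1;   /* record that the thread ran */
    return NULL;
}

/* Try to start a poll thread under SCHED_FIFO at rt_priority.
   Returns 1 if it ran with the real-time policy, 0 if it fell back to
   the default policy, -1 on any other error. */
int start_poll_thread(int rt_priority, int *ran)
{
    pthread_attr_t attr;
    struct sched_param sp = { .sched_priority = rt_priority };
    pthread_t tid;
    int realtime = 1;

    pthread_attr_init(&attr);
    /* Use the attributes below instead of inheriting the creator's. */
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    pthread_attr_setschedparam(&attr, &sp);

    int err = pthread_create(&tid, &attr, poll_loop, ran);
    if (err == EPERM) {              /* unprivileged: no real-time policy */
        realtime = 0;
        err = pthread_create(&tid, NULL, poll_loop, ran);
    }
    pthread_attr_destroy(&attr);
    if (err != 0)
        return -1;
    pthread_join(tid, NULL);
    return realtime;
}
```

Every event handled by such a thread inherits the same priority, whatever the event, so ranking individual events would need extra logic inside the dispatch loop.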

In AEM, it is the event that boosts the process priority, not the
reverse.

If we have many management threads with the same high priority, how
does this guarantee that a high-priority event is selected before any
low-priority event (system-wide, I mean)?

It also seems difficult to preserve the ordering of event delivery with
respect to event activation. The only way to make it work is for each
thread to poll extensively on its descriptor once it has been activated.

     > This eliminates the need to do a system call to block
     > event delivery prior to acquiring a mutex.  If the application is
     > futex, any contention on a lock between the mainline application and an
     > event delivery will be efficiently resolved.  Whichever thread
     > or poll thread) encounters a futex that is already locked, will block
     > and release the processor until the lock becomes available.  This
     > the thread that already holds the lock in a runnable state, and able to
     > proceed until that thread releases the lock.
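The quoted argument can be illustrated with a small sketch, assuming Linux with NPTL, where pthread_mutex_t is built on futexes and an uncontended lock or unlock does not enter the kernel. The caller acts as the mainline thread and a second thread stands in for a poll thread running a callback; both take the same mutex and neither masks any events. shared_lock, bump() and run_contention_demo() are illustrative names.

```c
#include <pthread.h>

static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter = 0;

/* Work done both by mainline code and by an event callback. No event
   masking is needed: a contended lock simply blocks the caller. */
static void *bump(void *arg)
{
    long n = *(const long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&shared_lock);
        shared_counter++;
        pthread_mutex_unlock(&shared_lock);
    }
    return NULL;
}

/* Returns the final counter (expected 2 * iterations), or -1 if the
   stand-in poll thread cannot be created. */
long run_contention_demo(long iterations)
{
    pthread_t poll_tid;
    shared_counter = 0;
    if (pthread_create(&poll_tid, NULL, bump, &iterations) != 0)
        return -1;
    bump(&iterations);                 /* mainline does the same work */
    pthread_join(poll_tid, NULL);      /* join orders the final read */
    return shared_counter;
}
```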

We could also solve this problem simply by adding to AEM a third type of
object that looks like a thread, so that an event handler could be
executed in the main flow of execution, in a process, or in a thread.
This last option is still to be done; it would remove the cost of
creating a short-lived process and make single-threaded handlers
unnecessary.

     > Requirements Analysis
     > There are three sources for event delivery requirements:
     > The OSDL CGL Requirements for Low-Level Asynchronous Events, by
     > Frederic Rossi
     > The OSDL Cluster Event Notification API, by Joe DiMartino
     > 1. An Event publication and subscription interface, that
     >         supports event notification through callbacks or polling.
     >         Ideally, it should be possible to implement similar 
     >         in both user and kernel space.
     > 2. Event Topics
     > The OSDL Carrier Grade Linux Working Group Clustering Requirements and
     > SAForum Application Interface Specification Comparison, by Peter
     > Badovinatz.