[Openais] Regarding performance

Steven Dake sdake at mvista.com
Wed Jun 23 12:31:21 PDT 2004


Yixiong,

On Wed, 2004-06-23 at 11:47, Zou, Yixiong wrote:
> Hi Steve,
> 
> The reason I am asking this question is that CGL 3.0 calls for the
> data checkpoint service performance to be at least 1200 tps.  I think
> that means I need to be able to call "checkpoint write" 1200 times
> per second.  The multi-threaded client does have an advantage since
> it will increase throughput, but it would not reduce the latency of
> each transaction.
> 
These (very informal) tests were run on two Intel boards (cpi1)
connected by a Radisys 100 Mbit switch, and gave the following results:

I configured ais to run on two nodes (192.168.1.93, 192.168.1.98) over
a non-lossy 100 Mbit network, without compiler optimization of the
code (no -O3 option).

If I configure the testckpt program as follows:

DATASIZE 1000 (write 1000 bytes per checkpoint write)
10000 invocations

I get:

Writing checkpoint loop 9999

real    0m4.683s

Which is
2,135,383 bytes per second
2135.38 transactions per second

Now, if I configure the testckpt program as follows:
DATASIZE 100000 (writes 100000 bytes per checkpoint write)
10000 invocations

I get:

Writing checkpoint loop 9999

real    1m39.832s

Which is
10,016,828 bytes per second
100.17 transactions per second
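
To make the arithmetic explicit, here is a small check of both runs,
using only the figures and times reported above:

/* Back-of-the-envelope check of the two single-threaded runs above. */
#include <stdio.h>

static void report(double writes, double bytes_per_write, double seconds)
{
        printf("%.0f bytes/sec, %.2f tps\n",
               writes * bytes_per_write / seconds, /* payload throughput */
               writes / seconds);                  /* transactions per second */
}

int main(void)
{
        report(10000, 1000, 4.683);    /* ~2135383 bytes/sec (~17 Mbit), ~2135.38 tps */
        report(10000, 100000, 99.832); /* ~10016828 bytes/sec (~80 Mbit), ~100.17 tps */
        return 0;
}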

The first test above doesn't utilize the full bandwidth of the network
because only a small portion of the network bandwidth is sent on each
token rotation.  The second test above utilizes the full network
bandwidth, even without threads.

The next test is not ideal, because not all threads have finished when
the program exits, so I'll use 9950 invocations per thread in the
calculations.

With threads and small checkpoint data, the highest TPS is possible:
THREADS 45
DATASIZE 1000
10000 invocations

I get:

Thread 35: Attempt 9967: error 1
real    1m2.139s

7,205,619.66 bytes per second
7205 transactions per second
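
The 9950 figure works out as follows (45 threads times 9950 completed
1000-byte writes, over the measured 62.139 seconds):

/* Check of the threaded run above; 9950 completed writes are counted
 * per thread because not every thread finished all 10000. */
#include <stdio.h>

int main(void)
{
        double threads = 45, writes = 9950, bytes = 1000, seconds = 62.139;

        printf("%.2f bytes/sec\n", threads * writes * bytes / seconds); /* ~7205619.66 */
        printf("%.2f tps\n", threads * writes / seconds);               /* ~7205.62 */
        return 0;
}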

Latency is an issue, and there is room for improvement in the current
implementation :)  Performance can, of course, be improved.

From your description, it sounds like your performance numbers are not
far off.  Could you tell me what the Re-Mcasts line says when you
ctrl-C out of the executive (in a two-node setup)?  That is the count
of how many messages had to be resent by the group ordering protocol.
FYI, the number may be different on each node.


> Looking into the code, I found that the checkpoint write is not
> immediately returned to the client either.  If I understood it
> correctly, the daemon does a message_handler_req_exec_ckpt_sectionwrite()
> and then replies back to the library.  This behavior seems different
> from what we talked about earlier, when you mentioned that the call
> is returned once the message is "self-delivered".  Am I correct?
> 
The message is returned once the group messaging layer self-delivers
the message (after ordering it and applying the principles of virtual
synchrony).  The call path is something like this (a rough sketch
follows the list):

1. The executive gets a request from the library
2. poll_handler_libais_deliver() dispatches the message to the library
message handler
3. The library handler message_handler_req_lib_ckpt_sectionwrite sends
MESSAGE_REQ_EXEC_CKPT_SECTIONWRITE to the cluster via the group
messaging interface
4. The group messaging interface orders the messages using the ring
ordering and recovery protocol
5. Messages are delivered in order to the function main.c/deliver_fn
6. The executive checkpoint write message invokes the handler
message_handler_req_exec_ckpt_sectionwrite
7. If the node sent the message, it delivers a response back to the
library
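
Here is that sketch: a toy model, not the openais source.  The
ordered[] array below stands in for the totem ordering protocol, and
the struct and function names are simplified for illustration.

/*
 * Toy model of the self-delivery pattern described above (NOT the
 * openais source).  The ordered[] queue stands in for the agreed
 * total order produced by the group messaging (ring) protocol.
 */
#include <stdio.h>
#include <string.h>

#define LOCAL_NODE 1

struct req_exec_ckpt_sectionwrite {
        int source_node;        /* node that originated the request */
        char data[64];          /* checkpoint section payload */
};

static struct req_exec_ckpt_sectionwrite ordered[16];
static int ordered_count = 0;

/* steps 3-4: multicast the request; the protocol fixes the agreed order */
static void group_mcast(const struct req_exec_ckpt_sectionwrite *req)
{
        ordered[ordered_count++] = *req;
}

/* steps 6-7: every node applies the write in the agreed order, but only
 * the node that sent the message answers its own library client */
static void handle_exec_sectionwrite(const struct req_exec_ckpt_sectionwrite *req)
{
        printf("apply section write: %s\n", req->data);
        if (req->source_node == LOCAL_NODE)
                printf("respond to library: write complete\n");
}

/* step 5: deliver messages to the handler in agreed order */
static void deliver_fn(void)
{
        for (int i = 0; i < ordered_count; i++)
                handle_exec_sectionwrite(&ordered[i]);
        ordered_count = 0;
}

/* steps 1-3: a library request is turned into an executive multicast
 * instead of being executed locally right away */
int main(void)
{
        struct req_exec_ckpt_sectionwrite req = { .source_node = LOCAL_NODE };

        strcpy(req.data, "section data");
        group_mcast(&req);
        deliver_fn();
        return 0;
}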

This technique ensures that checkpoint reads and writes in the cluster
are delivered in agreed order.  There is a lot of room for optimization
that could improve performance when not using the WR_ALL_REPLICA flag.
Currently the checkpoint service treats all transactions as
WR_ALL_REPLICA.
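
For reference, the replication behavior is requested through the
checkpoint creation flags when the checkpoint is opened.  The sketch
below is from memory of the SA Forum AIS checkpoint API of that era;
the flag and field names are in the spec, but check include/saCkpt.h
for the exact prototypes before relying on it:

/* Sketch only: requesting all-replica write semantics at open time.
 * From memory of the A-spec era SA Forum API; the argument order of
 * saCkptCheckpointOpen() may differ between spec revisions. */
#include <string.h>
#include "saCkpt.h"

SaErrorT open_replicated_checkpoint(SaCkptCheckpointHandleT *handle)
{
        SaNameT name;
        SaCkptCheckpointCreationAttributesT attrs;

        memset(&attrs, 0, sizeof(attrs));
        attrs.creationFlags = SA_CKPT_WR_ALL_REPLICAS; /* write to every replica */
        attrs.checkpointSize = 100000;
        attrs.maxSections = 1;
        attrs.maxSectionSize = 100000;

        name.length = strlen("perf_test_ckpt");
        memcpy(name.value, "perf_test_ckpt", name.length);

        return saCkptCheckpointOpen(&name, &attrs,
                SA_CKPT_CHECKPOINT_CREATE | SA_CKPT_CHECKPOINT_WRITE,
                0 /* timeout */, handle);
}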



> ------------------------------------------------------------------------
> 
> Yixiong Zou (yixiong.zou at intel.com)
> 
> (626) 443-0100
> 
> All views expressed in this email are those of the individual sender. 
> 
> 
> 
> > -----Original Message-----
> > From: Steven Dake [mailto:sdake at mvista.com] 
> > Sent: Tuesday, June 22, 2004 5:19 PM
> > To: Zou, Yixiong
> > Cc: openais at lists.osdl.org
> > Subject: RE: [Openais] Your Contributions
> > 
> > 
> > 
> > > Second, I modified the testckpt and let it run 5000 loops and it
> > > took 84 seconds to finish.  This seems to be a lot slower 
> > than you originally
> > > claimed.  Did I do anything wrong?  
> > > 
> > 
> > testckpt runs one thread.  Because the API is blocking, each
> > checkpoint write must complete before a new one can be started.
> > The service itself, however, can still deliver checkpoint writes at
> > the claimed rate.
> > 
> > In this case, the size of the checkpoint defines the rate at which
> > checkpoint data can be transmitted by one API call.  If a
> > checkpoint is 45 (the window) * 1472 (the packet size) bytes, it
> > will consume one entire flow-control window for the token, which
> > will maximize I/Os.
> > 
> > Try an informal 'time ./ckptstress' and see what you get.  This
> > test creates many threads and runs them multiple times.  You can
> > tune the numbers in the test to whatever you desire.
> > 
> > Try creating small checkpoints (perhaps 900 bytes) in the ckptstress
> > program and post your transactions per second and mb/sec for your
> > network.
> > 
> > Another point to bring up is lossy networks.  If your network is
> > lossy, performance will be poor.  This is just a fact of life :)
> > Fortunately most networks are not lossy.  When you press ctrl-C out
> > of the executive, it should print the "remcast" line.  This
> > describes how many times a message had to be re-multicast because a
> > node didn't receive it.
> > 
> 
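
(For reference, the flow-control figure quoted above works out as
follows; 1472 is the usual UDP payload per 1500-byte Ethernet frame.)

/* The quoted flow-control figure: a checkpoint of window * packet-size
 * bytes fills one full token rotation. */
#include <stdio.h>

int main(void)
{
        int window = 45;    /* messages allowed per token rotation */
        int packet = 1472;  /* UDP payload bytes per Ethernet frame */

        printf("%d bytes per token rotation\n", window * packet); /* 66240 */
        return 0;
}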



