[Openais] trouble getting started with corosync API

Dan Davis dan.davis at indexengines.com
Mon Mar 29 13:40:45 PDT 2010


Hi,

I'm new to corosync, and I apologize in advance if this is a beginner's problem.  I'm trying out the API to see whether it would be appropriate for solving some technical problems in getting multiple nodes of our software to cooperate.  Initially I only need it to bootstrap a very un-cluster-like cooperation mode, but later I'll use more of its capabilities.

When I start the corosync executive, it appears to work properly (now that I've figured out consensus 1201), but when I start another process that uses the CPG API, I get CS_ERR_TRY_AGAIN.  I see no corresponding activity in the corosync log when I do this.  Here's the output of the executive:

[dan@ohio corosync-1.2.0]$ sudo corosync -f
Mar 29 16:22:42 corosync [MAIN  ] Corosync Cluster Engine ('1.2.0'): started and ready to provide service.
Mar 29 16:22:42 corosync [MAIN  ] Corosync built-in features:
Mar 29 16:22:42 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Mar 29 16:22:42 corosync [TOTEM ] Initializing transport (UDP/IP).
Mar 29 16:22:42 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Mar 29 16:22:42 corosync [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
Mar 29 16:22:42 corosync [TOTEM ] The network interface [192.168.192.113] is now up.
Mar 29 16:22:42 corosync [SERV  ] Service engine loaded: corosync extended virtual synchrony service
Mar 29 16:22:42 corosync [SERV  ] Service engine loaded: corosync configuration service
Mar 29 16:22:42 corosync [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01
Mar 29 16:22:42 corosync [SERV  ] Service engine loaded: corosync cluster config database access v1.01
Mar 29 16:22:42 corosync [SERV  ] Service engine loaded: corosync profile loading service
Mar 29 16:22:42 corosync [SERV  ] Service engine loaded: corosync cluster quorum service v0.1

The executive reaches poll_run() in main; I checked in gdb.

Here's the output of testcpg:

[dan@ohio sando]$ cd corosync-1.2.0/test
[dan@ohio test]$ sudo ./testcpg
Local node id is 71c0a8c0
Could not join process group, error 6
[dan@ohio test]$
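
For context, the "error 6" printed by testcpg is CS_ERR_TRY_AGAIN in corosync's cs_error_t numbering (CS_OK is 1).  Library calls can return it when the executive is not ready to service the request, so clients commonly retry for a while before giving up.  Below is a minimal sketch of such a bounded retry loop.  It is not taken from testcpg: the sketch_ enum only mirrors the two values needed so the example builds without the corosync headers, and attempt_fn, retry_try_again and fake_join are hypothetical names.  A real client would include <corosync/cpg.h> and have the callback make one cpg_join() call.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Stand-ins mirroring two values from corosync's cs_error_t.
 * Real code should include <corosync/cpg.h> instead. */
typedef enum {
    SKETCH_CS_OK = 1,
    SKETCH_CS_ERR_TRY_AGAIN = 6
} sketch_cs_error_t;

/* Hypothetical callback type: performs one attempt of the real call,
 * for example a single cpg_join() on an already initialized handle. */
typedef sketch_cs_error_t (*attempt_fn)(void *ctx);

/* Call attempt() until it returns something other than TRY_AGAIN,
 * or until max_attempts is used up.  Sleeps delay_ms between tries.
 * Returns the result of the last attempt. */
static sketch_cs_error_t retry_try_again(attempt_fn attempt, void *ctx,
                                         int max_attempts, long delay_ms)
{
    sketch_cs_error_t rc = SKETCH_CS_ERR_TRY_AGAIN;
    int i;

    for (i = 0; i < max_attempts; i++) {
        rc = attempt(ctx);
        if (rc != SKETCH_CS_ERR_TRY_AGAIN)
            break;
        if (delay_ms > 0 && i + 1 < max_attempts) {
            struct timespec ts;
            ts.tv_sec = delay_ms / 1000;
            ts.tv_nsec = (delay_ms % 1000) * 1000000L;
            nanosleep(&ts, NULL);
        }
    }
    return rc;
}

/* Hypothetical stub so the helper can be exercised without a running
 * executive: returns TRY_AGAIN until the counter in ctx reaches zero. */
static sketch_cs_error_t fake_join(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return SKETCH_CS_ERR_TRY_AGAIN;
    }
    return SKETCH_CS_OK;
}
```

If the loop still ends with CS_ERR_TRY_AGAIN after a generous number of attempts, as appears to be the case here, the condition is probably not transient and the cause is more likely on the executive side than in the client.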

A major clue is that corosync will not exit on SIGINT or SIGTERM, even though it ought to.  I need to use kill -9 to bring the executive down before I can try again.  Is this a known issue in 1.2.0, or is it likely related to my problem?

I see the same behavior both with my hand-compiled corosync 1.2.0 on a Fedora 7 box and with an RPM on Fedora 12 with cluster-glue installed and started.  Thanks to anyone who can help me get started,

Dan Davis
www.indexengines.com