[Openais] cman does not start ... corosync died
Christine Caulfield
ccaulfie at redhat.com
Mon Mar 15 06:06:14 PDT 2010
On 15/03/10 12:44, Christian Brandes wrote:
> Hi all!
>
> At the moment I am trying to run RH-Cluster, but cman won't start.
> I got error messages from corosync.
>
> Do you have an idea what's wrong?
>
> Versions:
> Ubuntu 10.04 Alpha-3
> redhat-cluster-suite 3.0.2-2ubuntu2
> cman 3.0.2-2ubuntu2
> corosync 1.2.0-0ubuntu1
>
> /etc/init.d/cman start
> Starting cluster:
> Global setup... [ OK ]
> Loading kernel modules... [ OK ]
> Mounting configfs... [ OK ]
> Setting network parameters... [ OK ]
> Starting cman... corosync died: Could not read cluster configuration
>
> I generated /etc/cluster/cluster.conf with system-config-cluster:
> <?xml version="1.0" ?>
> <cluster alias="ubu" config_version="4" name="ubu">
>   <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>   <clusternodes>
>     <clusternode name="ubu1-24" nodeid="1" votes="1">
>       <fence/>
>     </clusternode>
>     <clusternode name="ubu2-24" nodeid="2" votes="1">
>       <fence/>
>     </clusternode>
>   </clusternodes>
>   <cman expected_votes="1" two_node="1"/>
>   <fencedevices>
>     <fencedevice agent="fence_manual" name="manual"/>
>   </fencedevices>
>   <rm>
>     <failoverdomains/>
>     <resources/>
>   </rm>
> </cluster>
>
> /etc/corosync/corosync.conf:
> totem {
>     version: 2
>     token: 3000
>     token_retransmits_before_loss_const: 10
>     join: 60
>     consensus: 4800
>     vsftype: none
>     max_messages: 20
>     clear_node_high_bit: yes
>     secauth: off
>     threads: 0
>     rrp_mode: none
>     interface {
>         ringnumber: 0
>         bindnetaddr: 192.168.24.221
>         mcastaddr: 226.94.1.1
>         mcastport: 5405
>     }
> }
>
> amf {
>     mode: disabled
> }
>
> service {
>     ver: 0
>     name: pacemaker
> }
>
> aisexec {
>     user: root
>     group: root
> }
>
> logging {
>     fileline: off
>     to_stderr: yes
>     to_logfile: no
>     to_syslog: yes
>     syslog_facility: daemon
>     debug: off
>     timestamp: on
>     logger_subsys {
>         subsys: AMF
>         debug: off
>         tags: enter|leave|trace1|trace2|trace3|trace4|trace6
>     }
> }
>
> /var/log/syslog:
> Mar 15 11:52:14 ubu1 corosync[3904]: [SERV ] Unloading all Corosync service engines.
> Mar 15 11:52:14 ubu1 corosync[3904]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service
> Mar 15 11:52:14 ubu1 corosync[3904]: [SERV ] Service engine unloaded: corosync configuration service
> Mar 15 11:52:14 ubu1 corosync[3904]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
> Mar 15 11:52:14 ubu1 corosync[3904]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01
> Mar 15 11:52:14 ubu1 corosync[3904]: [SERV ] Service engine unloaded: corosync profile loading service
> Mar 15 11:52:14 ubu1 corosync[3904]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01
> Mar 15 11:52:14 ubu1 corosync[3904]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
> Mar 15 11:52:14 ubu1 corosync[3904]: [MAIN ] Corosync Cluster Engine exiting with status -1 at main.c:158.
> Mar 15 11:52:14 ubu1 corosync[4043]: [MAIN ] Corosync Cluster Engine ('1.2.0'): started and ready to provide service.
> Mar 15 11:52:14 ubu1 corosync[4043]: [MAIN ] Corosync built-in features: nss
> Mar 15 11:52:14 ubu1 corosync[4043]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> Mar 15 11:52:14 ubu1 corosync[4043]: [TOTEM ] Initializing transport (UDP/IP).
> Mar 15 11:52:14 ubu1 corosync[4043]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Mar 15 11:52:14 ubu1 corosync[4043]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> Mar 15 11:52:14 ubu1 corosync[4043]: [TOTEM ] The network interface [192.168.24.221] is now up.
> Mar 15 11:52:14 ubu1 corosync[4043]: [SERV ] Service failed to load 'pacemaker'.
> Mar 15 11:52:14 ubu1 corosync[4043]: [SERV ] Service engine loaded: openais checkpoint service B.01.01
> Mar 15 11:52:14 ubu1 corosync[4043]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
> Mar 15 11:52:14 ubu1 corosync[4043]: [SERV ] Service engine loaded: corosync configuration service
> Mar 15 11:52:14 ubu1 corosync[4043]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
> Mar 15 11:52:14 ubu1 corosync[4043]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
> Mar 15 11:52:14 ubu1 corosync[4043]: [SERV ] Service engine loaded: corosync profile loading service
> Mar 15 11:52:14 ubu1 corosync[4043]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
> Mar 15 11:52:14 ubu1 corosync[4043]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Mar 15 11:52:14 ubu1 corosync[4043]: [MAIN ] Completed service synchronization, ready to provide service.
> Mar 15 11:52:32 ubu1 corosync[4077]: [MAIN ] Corosync Cluster Engine ('1.2.0'): started and ready to provide service.
> Mar 15 11:52:32 ubu1 corosync[4077]: [MAIN ] Corosync built-in features: nss
> Mar 15 11:52:32 ubu1 corosync[4077]: [MAIN ] Successfully read config from /etc/cluster/cluster.conf
> Mar 15 11:52:32 ubu1 corosync[4077]: [MAIN ] Successfully parsed cman config
> Mar 15 11:52:32 ubu1 corosync[4077]: [MAIN ] Successfully configured openais services to load
> Mar 15 11:52:32 ubu1 corosync[4077]: [MAIN ] parse error in config: The consensus timeout parameter (4800 ms) must be atleast 1.2 * token (12000 ms).
> Mar 15 11:52:32 ubu1 corosync[4077]: [MAIN ] Corosync Cluster Engine exiting with status -9 at main.c:1359.
>
> I see:
> consensus (4800 ms) must be atleast 1.2 * token (12000 ms)
> But token is 3000!
>
It's a known bug in that version of corosync. You'll need to manually
increase consensus so that it is greater than 4800.
Chrissie
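
For illustration only, here is a rough sketch of what such a manual
override might look like. It assumes that cman takes its totem settings
from a <totem> element in /etc/cluster/cluster.conf (the log above shows
the failing instance reading cluster.conf, not corosync.conf), and that
this element accepts a consensus attribute; check cman(5) for your
version before relying on it. The value 12000 is simply the minimum named
in the log message (1.2 * token, which implies an effective token of
10000 ms rather than the 3000 set in corosync.conf). The config_version
bump from 4 to 5 is likewise an assumption about how the edited file
would be versioned.

```xml
<?xml version="1.0" ?>
<cluster alias="ubu" config_version="5" name="ubu">
  <!-- Assumed override: raise consensus to the minimum the parser
       demands in the log (1.2 * 10000 ms = 12000 ms). -->
  <totem consensus="12000"/>
  <!-- Remaining elements (fence_daemon, clusternodes, cman,
       fencedevices, rm) stay as in the original file. -->
</cluster>
```

The same edit would presumably be needed on both nodes, since each reads
its own copy of cluster.conf.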