[Openais] corosync-openais-pacemaker cluster unable to start

Steven Dake sdake at redhat.com
Thu Jul 23 19:32:27 PDT 2009


In this backtrace, the pacemaker plugin is hitting an assertion.

I suggest mailing the pacemaker list with your question.

I also suggest using the corosync 1.0.0 release, since trunk is under
development.

Regards
-steve

On Thu, 2009-07-23 at 18:28 -0600, Jonathan wrote:
> Hello all:
> 
> I am trying to get a corosync-openais-pacemaker cluster stack installed
> and running on a 2 node cluster.
> 
> I am currently using the latest corosync and openais from the svn
> repository (as of July 22).
> The version of pacemaker is the 1.0 tip from today (July 23rd).
> 
> Corosync has been crashing randomly on one of my nodes (the same one,
> consistently). After a crash, I shut down the cluster to upgrade
> corosync, hoping to resolve the problem.  Now pacemaker is causing a
> crash on startup, and I cannot start either node.
> 
> I have attached the debug output from one node in the cluster, and a backtrace from the crash in GDB.
> 
> Is there something I can do to reset the cluster state so it will start? Or is this a bug?
> 
> Thanks!
> 
> Jonathan
> 
> plain text document attachment (crash.log)
> Jul 22 21:18:22 corosync [MAIN  ] main.c:717 The Platform is missing process priority setting features.  Leaving at default.
> Jul 22 21:18:22 corosync [MAIN  ] main.c:786 Corosync Cluster Engine ('trunk'): started and ready to provide service.
> Jul 22 21:18:22 corosync [MAIN  ] main.c:867 Successfully configured openais services to load
> Jul 22 21:18:22 corosync [MAIN  ] main.c:867 Successfully read main configuration file '/usr/local/etc/corosync/corosync.conf'.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:780 Token Timeout (10000 ms) retransmit timeout (495 ms)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:783 token hold (386 ms) retransmits before loss (20 retrans)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:790 join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:793 downcheck (1000 ms) fail to recv const (50 msgs)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:795 seqno unchanged const (30 rotations) Maximum network MTU 1500
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:799 window size per rotation (50 messages) maximum messages per rotation (20 messages)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:802 send threads (0 threads)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:805 RRP token expired timeout (495 ms)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:808 RRP token problem counter (2000 ms)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:811 RRP threshold (10 problem count)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:813 RRP mode set to none.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:816 heartbeat_failures_allowed (0)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:818 max_network_delay (50 ms)
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:841 HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
> Jul 22 21:18:22 corosync [TOTEM ] totemudp.c:319 Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'openais cluster membership service B.01.01'
> Jul 22 21:18:22 corosync [EVT   ] evt.c:3107 Evt exec init request
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'openais event service B.01.01'
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'openais checkpoint service B.01.01'
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'openais availability management framework B.01.01'
> Jul 22 21:18:22 corosync [MSG   ] msg.c:2404 [DEBUG]: msg_exec_init_fn
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'openais message service B.03.01'
> Jul 22 21:18:22 corosync [LCK   ] lck.c:1472 [DEBUG]: lck_exec_init_fn
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'openais distributed locking service B.03.01'
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'openais timer service A.01.01'
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:313 info: process_ais_conf: Reading configure
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:547 info: config_find_init: Local handle: 5912924842687987714 for logging
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:573 info: config_find_next: Processing additional logging options...
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:599 info: get_config_opt: Found 'on' for option: debug
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:613 info: get_config_opt: Defaulting to 'off' for option: to_file
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:599 info: get_config_opt: Found 'daemon' for option: syslog_facility
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:547 info: config_find_init: Local handle: 3984067077437652995 for service
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:573 info: config_find_next: Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:573 info: config_find_next: Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:573 info: config_find_next: Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:573 info: config_find_next: Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:573 info: config_find_next: Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:573 info: config_find_next: Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:573 info: config_find_next: Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:573 info: config_find_next: Processing additional service options...
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:613 info: get_config_opt: Defaulting to 'no' for option: use_logd
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:613 info: get_config_opt: Defaulting to 'no' for option: use_mgmtd
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:404 info: pcmk_plugin_init: CRM: Initialized
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:405 Logging: Initialized pcmk_plugin_init
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:418 info: pcmk_plugin_init: Service: 9
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:419 info: pcmk_plugin_init: Local node id: 0
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:420 info: pcmk_plugin_init: Local hostname: Aries
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:234 info: update_member: Creating entry for node 0 born on 0
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:261 info: update_member: 0x8eb92e8 Node 0 now known as Aries (was: (null))
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:277 info: update_member: Node Aries now has 1 quorum votes (was 0)
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:287 info: update_member: Node 0/Aries is now: member
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:143 info: spawn_child: Forked child 9370 for process stonithd
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:143 info: spawn_child: Forked child 9371 for process cib
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:143 info: spawn_child: Forked child 9372 for process lrmd
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:143 info: spawn_child: Forked child 9373 for process attrd
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:143 info: spawn_child: Forked child 9374 for process pengine
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:143 info: spawn_child: Forked child 9375 for process crmd
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:540 info: pcmk_startup: CRM: Initialized
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'Pacemaker Cluster Manager'
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'corosync extended virtual synchrony service'
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'corosync configuration service'
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'corosync cluster closed process group service v1.01'
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'corosync cluster config database access v1.01'
> Jul 22 21:18:22 corosync [SERV  ] service.c:206 Service initialized 'corosync profile loading service'
> Jul 22 21:18:22 corosync [MAIN  ] main.c:1010 Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
> Jul 22 21:18:22 corosync [TOTEM ] totemudp.c:1531 Receive multicast socket recv buffer size (262142 bytes).
> Jul 22 21:18:22 corosync [TOTEM ] totemudp.c:1537 Transmit multicast socket send buffer size (262142 bytes).
> Jul 22 21:18:22 corosync [TOTEM ] totemudp.c:1355 The network interface [172.29.1.1] is now up.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:4193 Created or loaded sequence id 740.172.29.1.1 for this ring.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1786 entering GATHER state from 15.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:2768 Creating commit token because I am the rep.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1337 Saving state aru 0 high seq received 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3004 Storing new sequence id for ring 2e8
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1825 entering COMMIT state.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:4060 got commit token
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1857 entering RECOVERY state.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1886 position [0] member 172.29.1.1:
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1890 previous ring seq 740 rep 172.29.1.1
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1896 aru 0 high delivered 0 received flag 1
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:2003 Did not need to originate any messages in recovery.
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:4060 got commit token
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:4114 Sending initial ORF token
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3412 token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3423 install seq 0 aru 0 high seq received 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3412 token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3423 install seq 0 aru 0 high seq received 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3412 token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3423 install seq 0 aru 0 high seq received 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3412 token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3423 install seq 0 aru 0 high seq received 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:3442 retrans flag count 4 token aru 0 install seq 0 aru 0 0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1558 recovery to regular 1-0
> Jul 22 21:18:22 corosync [TOTEM ] totemsrp.c:1643 Delivering to app 1 to 0
> Jul 22 21:18:22 corosync [CLM   ] clm.c:564 CLM CONFIGURATION CHANGE
> Jul 22 21:18:22 corosync [CLM   ] clm.c:565 New Configuration:
> Jul 22 21:18:22 corosync [CLM   ] clm.c:569 Members Left:
> Jul 22 21:18:22 corosync [CLM   ] clm.c:574 Members Joined:
> Jul 22 21:18:22 corosync [EVT   ] evt.c:2918 Evt conf change 1
> Jul 22 21:18:22 corosync [EVT   ] evt.c:2922 m 0, j 0 l 0
> Jul 22 21:18:22 corosync [LCK   ] lck.c:841 [DEBUG]: lck_confchg_fn
> Jul 22 21:18:22 corosync [MSG   ] msg.c:1085 [DEBUG]: msg_confchg_fn
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:633 notice: pcmk_peer_update: Transitional membership event on ring 744: memb=0, new=0, lost=0
> Jul 22 21:18:22 corosync [CLM   ] clm.c:564 CLM CONFIGURATION CHANGE
> Jul 22 21:18:22 corosync [CLM   ] clm.c:565 New Configuration:
> Jul 22 21:18:22 corosync [CLM   ] clm.c:567     no interface found for nodeid
> Jul 22 21:18:22 corosync [CLM   ] clm.c:569 Members Left:
> Jul 22 21:18:22 corosync [CLM   ] clm.c:574 Members Joined:
> Jul 22 21:18:22 corosync [CLM   ] clm.c:576     no interface found for nodeid
> Jul 22 21:18:22 corosync [EVT   ] evt.c:2918 Evt conf change 0
> Jul 22 21:18:22 corosync [EVT   ] evt.c:2922 m 1, j 1 l 0
> Jul 22 21:18:22 corosync [LCK   ] lck.c:841 [DEBUG]: lck_confchg_fn
> Jul 22 21:18:22 corosync [MSG   ] msg.c:1085 [DEBUG]: msg_confchg_fn
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:633 notice: pcmk_peer_update: Stable membership event on ring 744: memb=1, new=1, lost=0
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:234 info: update_member: Creating entry for node 16850348 born on 744
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:287 info: update_member: Node 16850348/unknown is now: member
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:661 info: pcmk_peer_update: NEW:  .pending. 16850348
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:667 debug: pcmk_peer_update: Node 16850348 has address no interface found for nodeid
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:679 info: pcmk_peer_update: MEMB: .pending. 16850348
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:596 info: ais_mark_unseen_peer_dead: Node Aries was not seen in the previous transition
> Jul 22 21:18:22 corosync [pcmk  ] utils.c:287 info: update_member: Node 0/Aries is now: lost
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:712 debug: pcmk_peer_update: 2 nodes changed
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:1187 info: send_member_notification: Sending membership update 744 to 0 children
> Jul 22 21:18:22 corosync [pcmk  ] plugin.c:1439 CRIT: send_cluster_id: Assertion failure line 1439: local_nodeid != 0
> /usr/local/sbin/aisexec: line 3:  9365 Aborted                 (core dumped) corosync "$@"
> 
> plain text document attachment (debug.session)
> Jul 23 17:18:41 corosync [pcmk  ] plugin.c:679 info: pcmk_peer_update: MEMB: .pending. 16850348
> Jul 23 17:18:41 corosync [pcmk  ] plugin.c:596 info: ais_mark_unseen_peer_dead: Node Aries was not seen in the previous transition
> Jul 23 17:18:41 corosync [pcmk  ] utils.c:287 info: update_member: Node 0/Aries is now: lost
> Jul 23 17:18:41 corosync [pcmk  ] plugin.c:712 debug: pcmk_peer_update: 2 nodes changed
> Jul 23 17:18:41 corosync [pcmk  ] plugin.c:1187 info: send_member_notification: Sending membership update 764 to 0 children
> Jul 23 17:18:41 corosync [pcmk  ] plugin.c:1439 CRIT: send_cluster_id: Assertion failure line 1439: local_nodeid != 0
> 
> Program received signal SIGABRT, Aborted.
> [Switching to Thread 0xb7c726c0 (LWP 10526)]
> 0xb7fc8424 in __kernel_vsyscall ()
> (gdb) bt
> #0  0xb7fc8424 in __kernel_vsyscall ()
> #1  0xb7e4fb91 in raise () from /lib/libc.so.6
> #2  0xb7e51378 in abort () from /lib/libc.so.6
> #3  0xb722f2d5 in send_cluster_id () from /usr/local/libexec/lcrso/pacemaker.lcrso
> #4  0xb722b362 in pcmk_peer_update () from /usr/local/libexec/lcrso/pacemaker.lcrso
> #5  0x0804be0e in confchg_fn (configuration_type=TOTEM_CONFIGURATION_REGULAR, member_list=0xbfbdbbb4, member_list_entries=1, left_list=0x0, 
>     left_list_entries=0, joined_list=0xbfbdc7b4, joined_list_entries=1, ring_id=0xb72e0664) at main.c:327
> #6  0xb7fabf92 in app_confchg_fn (configuration_type=TOTEM_CONFIGURATION_REGULAR, member_list=0xbfbdbbb4, member_list_entries=1, left_list=0x0, 
>     left_list_entries=0, joined_list=0xbfbdc7b4, joined_list_entries=1, ring_id=0xb72e0664) at totempg.c:348
> #7  0xb7fabea2 in totempg_confchg_fn (configuration_type=TOTEM_CONFIGURATION_REGULAR, member_list=0xbfbdbbb4, member_list_entries=1, left_list=0x0, 
>     left_list_entries=0, joined_list=0xbfbdc7b4, joined_list_entries=1, ring_id=0xb72e0664) at totempg.c:522
> #8  0xb7fab970 in totemmrp_confchg_fn (configuration_type=TOTEM_CONFIGURATION_REGULAR, member_list=0xbfbdbbb4, member_list_entries=1, left_list=0x0, 
>     left_list_entries=0, joined_list=0xbfbdc7b4, joined_list_entries=1, ring_id=0xb72e0664) at totemmrp.c:109
> #9  0xb7fa452a in memb_state_operational_enter (instance=0xb72bf008) at totemsrp.c:1696
> #10 0xb7fa8e26 in message_handler_orf_token (instance=0xb72bf008, msg=0x9a1df04, msg_len=70, endian_conversion_needed=0) at totemsrp.c:3444
> #11 0xb7fab718 in main_deliver_fn (context=0xb72bf008, msg=0x9a1df04, msg_len=70) at totemsrp.c:4168
> #12 0xb7f9f696 in none_token_recv (rrp_instance=0x9a1a4b8, iface_no=0, context=0xb72bf008, msg=0x9a1df04, msg_len=70, token_seq=3) at totemrrp.c:533
> #13 0xb7fa10de in rrp_deliver_fn (context=0x9a1d8a8, msg=0x9a1df04, msg_len=70) at totemrrp.c:1390
> #14 0xb7f9d64f in net_deliver_fn (handle=7749363892505018368, fd=10, revents=1, data=0x9a1d8c8) at totemudp.c:1221
> #15 0xb7f9a5d4 in poll_run (handle=7749363892505018368) at coropoll.c:393
> #16 0x0804d546 in main (argc=2, argv=0xbfbe30e4) at main.c:1050
> (gdb) 
> _______________________________________________
> Openais mailing list
> Openais at lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/openais
