[Openais] When reassigning workload, the "old" component is not set to standby

Ola Lundqvist ola.lundqvist at tietoenator.com
Tue Sep 5 06:47:29 PDT 2006


Hi

Hans Feldt wrote:
> Looks like it works "like a charm". First node gets the active
> assignment. Second node gets the standby assignment. There is an AMF
> feature called 'auto-adjust' that I think would do what you expect
> (switch to preferred). That feature is __not__ implemented in AMF.

You are right. It actually works like it should. I thought the second
node took over the workload. I simply misread the log.

I see. Is there any way that I can trigger a switchover without killing
processes on either side?

> Try the command (well hidden...):
> 
> $ pkill -USR2 aisexec
> 
> That will give you the current AMF state.

Thanks a lot! This is very good information for me to use when testing.
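
To avoid misreading a log like the one quoted below, here is a minimal
Python sketch that lists the HA state changes in order. The message format
is assumed from the amfsi.c lines in this particular log only, and the
names ha_assignments and HA_CHANGE are made up for illustration. Other
openais versions may word the message differently.

```python
import re

# Assumed format of the amfsi.c "SU HA state changed" message, matched
# after the mail-wrapped lines have been joined back into one string.
HA_CHANGE = re.compile(
    r"SU HA state changed to '([A-Z-]+)' for: SI '([^']+)', SU '([^']+)'"
)


def ha_assignments(log_text):
    """Return (ha_state, si, su) tuples in the order they appear in the log."""
    # Drop mail quote markers ("> ", ">> ") from the start of each line.
    lines = [re.sub(r"^(?:>\s?)+", "", line) for line in log_text.splitlines()]
    # Undo the line wrapping by collapsing all whitespace runs to one space.
    flat = re.sub(r"\s+", " ", " ".join(lines))
    return HA_CHANGE.findall(flat)


# Excerpt of the two HA state messages from the quoted log.
sample = """\
>> Sep  5  9:38:52.741951 [amfsi.c:0231] SU HA state
>> changed to 'ACTIVE' for:
>>                 SI 'OAMWL', SU
>> 'safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1'
>> Sep  5  9:39:17.749821 [amfsi.c:0231] SU HA state changed to 'STANDBY'
>> for:
>>                 SI 'OAMWL', SU
>> 'safSu=OAM-SU-1,safSg=COM-SG-1,safApp=COM-A-1'
"""

for state, si, su in ha_assignments(sample):
    print(f"{si}: {state:8} {su}")
```

Run against the full log, this should make it easy to see that OAM-SU-2
keeps the active assignment and OAM-SU-1 only gets the standby one.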

Regards,

// Ola

> Regards,
> Hans
> 
> 
> Ola Lundqvist wrote:
>> Hi
>>
>> The following was done:
>> * Start ais on obf-com-2
>>   -> csi assigned to the component on that node
>>   "Mainstart works like a charm!" below in logs.
>> * Start ais on obf-com-1
>>   -> csi assigned to component on node 1.
>>
>> But I expected the csi to be removed from node 2 when this happens.
>> Shouldn't it be?
>>
>> obf-com-2:~# /opt/ais/sbin/aisexec -f
>> Sep  5  9:38:46.627394 [main.c:0409] AIS Executive Service RELEASE
>> 'trunk'
>> Sep  5  9:38:46.627987 [main.c:0410] Copyright (C) 2002-2006 MontaVista
>> Software, Inc and contributors.
>> Sep  5  9:38:46.628044 [main.c:0411] Copyright (C) 2006 Red Hat, Inc.
>> Sep  5  9:38:46.628098 [service.c:0219] openais component openais_cpg
>> loaded.
>> Sep  5  9:38:46.628152 [service.c:0123] Registering service handler
>> 'openais cluster closed process group service v1.01'
>> Sep  5  9:38:46.628205 [service.c:0219] openais component openais_cfg
>> loaded.
>> Sep  5  9:38:46.628259 [service.c:0123] Registering service handler
>> 'openais configuration service'
>> Sep  5  9:38:46.628312 [service.c:0219] openais component openais_msg
>> loaded.
>> Sep  5  9:38:46.628366 [service.c:0123] Registering service handler
>> 'openais message service B.01.01'
>> Sep  5  9:38:46.628419 [service.c:0219] openais component openais_lck
>> loaded.
>> Sep  5  9:38:46.628475 [service.c:0123] Registering service handler
>> 'openais distributed locking service B.01.01'
>> Sep  5  9:38:46.628528 [service.c:0219] openais component openais_evt
>> loaded.
>> Sep  5  9:38:46.628581 [service.c:0123] Registering service handler
>> 'openais event service B.01.01'
>> Sep  5  9:38:46.628634 [service.c:0219] openais component openais_ckpt
>> loaded.
>> Sep  5  9:38:46.628687 [service.c:0123] Registering service handler
>> 'openais checkpoint service B.01.01'
>> Sep  5  9:38:46.628741 [service.c:0219] openais component openais_amf
>> loaded.
>> Sep  5  9:38:46.628796 [service.c:0123] Registering service handler
>> 'openais availability management framework B.01.01'
>> Sep  5  9:38:46.628850 [service.c:0219] openais component openais_clm
>> loaded.
>> Sep  5  9:38:46.628903 [service.c:0123] Registering service handler
>> 'openais cluster membership service B.01.01'
>> Sep  5  9:38:46.628956 [service.c:0219] openais component openais_evs
>> loaded.
>> Sep  5  9:38:46.629008 [service.c:0123] Registering service handler
>> 'openais extended virtual synchrony service'
>> Sep  5  9:38:46.651707 [totemsrp.c:0716] Token Timeout (1000 ms)
>> retransmit timeout (238 ms)
>> Sep  5  9:38:46.651862 [totemsrp.c:0719] token hold (180 ms) retransmits
>> before loss (4 retrans)
>> Sep  5  9:38:46.651919 [totemsrp.c:0726] join (100 ms) send_join (0 ms)
>> consensus (200 ms) merge (200 ms)
>> Sep  5  9:38:46.651974 [totemsrp.c:0729] downcheck (1000000 ms) fail to
>> recv const (50 msgs)
>> Sep  5  9:38:46.652029 [totemsrp.c:0731] seqno unchanged const (30
>> rotations) Maximum network MTU 1500
>> Sep  5  9:38:46.652085 [totemsrp.c:0735] window size per rotation (50
>> messages) maximum messages per rotation (17 messages)
>> Sep  5  9:38:46.652164 [totemsrp.c:0738] send threads (0 threads)
>> Sep  5  9:38:46.652217 [totemsrp.c:0741] RRP token expired timeout
>> (238 ms)
>> Sep  5  9:38:46.652294 [totemsrp.c:0744] RRP token problem counter
>> (2000 ms)
>> Sep  5  9:38:46.652347 [totemsrp.c:0747] RRP threshold (10 problem count)
>> Sep  5  9:38:46.652399 [totemsrp.c:0749] RRP mode set to none.
>> Sep  5  9:38:46.652461 [totemsrp.c:0752] heartbeat_failures_allowed (0)
>> Sep  5  9:38:46.652514 [totemsrp.c:0754] max_network_delay (50 ms)
>> Sep  5  9:38:46.652836 [totemsrp.c:0775] HeartBeat is Disabled. To
>> enable set heartbeat_failures_allowed > 0
>> Sep  5  9:38:46.654156 [totemnet.c:1034] Receive multicast socket recv
>> buffer size (212992 bytes).
>> Sep  5  9:38:46.654237 [totemnet.c:1040] Transmit multicast socket send
>> buffer size (212992 bytes).
>> Sep  5  9:38:46.654450 [totemnet.c:0848] The network interface
>> [192.168.0.2] is now up.
>> Sep  5  9:38:46.654674 [totemsrp.c:4029] Created or loaded sequence id
>> 0.192.168.0.2 for this ring.
>> Sep  5  9:38:46.655105 [totemsrp.c:1662] entering GATHER state.
>> Sep  5  9:38:46.655408 [service.c:0236] Initialising service handler
>> 'openais extended virtual synchrony service'
>> Sep  5  9:38:46.655479 [service.c:0236] Initialising service handler
>> 'openais cluster membership service B.01.01'
>> Sep  5  9:38:46.658402 [service.c:0236] Initialising service handler
>> 'openais availability management framework B.01.01'
>> Sep  5  9:38:46.658870 [service.c:0236] Initialising service handler
>> 'openais checkpoint service B.01.01'
>> Sep  5  9:38:46.658987 [service.c:0236] Initialising service handler
>> 'openais event service B.01.01'
>> Sep  5  9:38:46.659106 [service.c:0236] Initialising service handler
>> 'openais distributed locking service B.01.01'
>> Sep  5  9:38:46.659183 [service.c:0236] Initialising service handler
>> 'openais message service B.01.01'
>> Sep  5  9:38:46.659263 [service.c:0236] Initialising service handler
>> 'openais configuration service'
>> Sep  5  9:38:46.659341 [service.c:0236] Initialising service handler
>> 'openais cluster closed process group service v1.01'
>> Sep  5  9:38:46.659419 [sync.c:0277] Not using a virtual synchrony
>> filter.
>> Sep  5  9:38:46.659621 [main.c:0589] AIS Executive Service: started and
>> ready to provide service.
>> Sep  5  9:38:46.659929 [totemsrp.c:2672] Creating commit token because I
>> am the rep.
>> Sep  5  9:38:46.660095 [totemsrp.c:1240] Saving state aru 0 high seq
>> received 0
>> Sep  5  9:38:46.660282 [totemsrp.c:2826] Storing new sequence id for
>> ring 4
>> Sep  5  9:38:46.660456 [totemsrp.c:1698] entering COMMIT state.
>> Sep  5  9:38:46.660678 [totemsrp.c:1732] entering RECOVERY state.
>> Sep  5  9:38:46.660854 [totemsrp.c:1766] position [0] member 192.168.0.2:
>> Sep  5  9:38:46.660920 [totemsrp.c:1770] previous ring seq 0 rep
>> 192.168.0.2
>> Sep  5  9:38:46.660983 [totemsrp.c:1776] aru 0 high delivered 0 received
>> flag 0
>> Sep  5  9:38:46.661046 [totemsrp.c:1883] Did not need to originate any
>> messages in recovery.
>> Sep  5  9:38:46.661268 [totemsrp.c:3958] Sending initial ORF token
>> Sep  5  9:38:46.662997 [clm.c:0510] CLM CONFIGURATION CHANGE
>> Sep  5  9:38:46.663087 [clm.c:0511] New Configuration:
>> Sep  5  9:38:46.663142 [clm.c:0515] Members Left:
>> Sep  5  9:38:46.663196 [clm.c:0520] Members Joined:
>> Sep  5  9:38:46.663329 [sync.c:0318] This node is within the primary
>> component and will provide service.
>> Sep  5  9:38:46.663489 [clm.c:0510] CLM CONFIGURATION CHANGE
>> Sep  5  9:38:46.663544 [clm.c:0511] New Configuration:
>> Sep  5  9:38:46.663603 [clm.c:0513]     r(0) ip(192.168.0.2)
>> Sep  5  9:38:46.663678 [clm.c:0515] Members Left:
>> Sep  5  9:38:46.663731 [clm.c:0520] Members Joined:
>> Sep  5  9:38:46.663786 [clm.c:0522]     r(0) ip(192.168.0.2)
>> Sep  5  9:38:46.663854 [sync.c:0318] This node is within the primary
>> component and will provide service.
>> Sep  5  9:38:46.663944 [totemsrp.c:1607] entering OPERATIONAL state.
>> Sep  5  9:38:46.673547 [clm.c:0605] got nodejoin message 192.168.0.2
>> Hello world from
>> safComp=OAM-C-1,safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1
>> Now run CP
>> Sep  5  9:38:49.678747 [amfcluster.c:0130] Cluster: starting
>> applications.
>> Sep  5  9:38:50.083812 [amfsu.c:0193] Setting SU 'OAM-SU-2' operational
>> state: ENABLED
>> Sep  5  9:38:50.084076 [amfsu.c:0156] Setting SU 'OAM-SU-2' readiness
>> state: IN-SERVICE
>> Sep  5  9:38:50.084179 [amfsu.c:0178] Setting SU 'OAM-SU-2' presence
>> state: INSTANTIATED
>> Saf CP callback 9, saf_callback, 80000000
>> csiSetCallback
>> safComp=OAM-C-1,safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1,
>> safCsi=OAM-1,safSi=OAMWL,safApp=COM-A-1, SA_AMF_HA_ACTIVE
>> Mainstart works like a charm!
>> Sending response (ok) 1
>> Sep  5  9:38:52.741951 [amfsi.c:0231] SU HA state changed to 'ACTIVE'
>> for:
>>                 SI 'OAMWL', SU
>> 'safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1'
>> Sep  5  9:38:52.742071 [amfsi.c:0242] SI Assignment state changed to
>> 'PARTIALLY-ASSIGNED' for:
>>                 SI 'OAMWL', SU
>> 'safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1'
>> Sep  5  9:38:52.742142 [amfcluster.c:0213] Cluster: application COM-A-1
>> assigned.
>> Saf CP callback 9, saf_callback, 80000000
>> Healthcheck safComp=OAM-C-1,safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1
>> Saf CP callback 9, saf_callback, 80000000
>> Healthcheck safComp=OAM-C-1,safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1
>> Saf CP callback 9, saf_callback, 80000000
>> Healthcheck safComp=OAM-C-1,safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1
>> Saf CP callback 9, saf_callback, 80000000
>> Healthcheck safComp=OAM-C-1,safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1
>> Saf CP callback 9, saf_callback, 80000000
>> Healthcheck safComp=OAM-C-1,safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1
>> Sep  5  9:39:16.949540 [totemsrp.c:1662] entering GATHER state.
>> Sep  5  9:39:17.207483 [totemsrp.c:1240] Saving state aru 13 high seq
>> received 13
>> Sep  5  9:39:17.216110 [totemsrp.c:2826] Storing new sequence id for
>> ring 8
>> Sep  5  9:39:17.216321 [totemsrp.c:1698] entering COMMIT state.
>> Sep  5  9:39:17.216507 [totemsrp.c:1732] entering RECOVERY state.
>> Sep  5  9:39:17.217014 [totemsrp.c:1766] position [0] member 192.168.0.1:
>> Sep  5  9:39:17.217084 [totemsrp.c:1770] previous ring seq 4 rep
>> 192.168.0.1
>> Sep  5  9:39:17.217151 [totemsrp.c:1776] aru 9 high delivered 9 received
>> flag 0
>> Sep  5  9:39:17.217220 [totemsrp.c:1766] position [1] member 192.168.0.2:
>> Sep  5  9:39:17.217286 [totemsrp.c:1770] previous ring seq 4 rep
>> 192.168.0.2
>> Sep  5  9:39:17.217354 [totemsrp.c:1776] aru 13 high delivered 13
>> received flag 0
>> Sep  5  9:39:17.217459 [totemsrp.c:1883] Did not need to originate any
>> messages in recovery.
>> Sep  5  9:39:17.237727 [clm.c:0510] CLM CONFIGURATION CHANGE
>> Sep  5  9:39:17.237899 [clm.c:0511] New Configuration:
>> Sep  5  9:39:17.237974 [clm.c:0513]     r(0) ip(192.168.0.2)
>> Sep  5  9:39:17.238039 [clm.c:0515] Members Left:
>> Sep  5  9:39:17.238103 [clm.c:0520] Members Joined:
>> Sep  5  9:39:17.238294 [sync.c:0318] This node is within the primary
>> component and will provide service.
>> Sep  5  9:39:17.238511 [clm.c:0510] CLM CONFIGURATION CHANGE
>> Sep  5  9:39:17.238577 [clm.c:0511] New Configuration:
>> Sep  5  9:39:17.238645 [clm.c:0513]     r(0) ip(192.168.0.1)
>> Sep  5  9:39:17.238712 [clm.c:0513]     r(0) ip(192.168.0.2)
>> Sep  5  9:39:17.238775 [clm.c:0515] Members Left:
>> Sep  5  9:39:17.238839 [clm.c:0520] Members Joined:
>> Sep  5  9:39:17.238954 [clm.c:0522]     r(0) ip(192.168.0.1)
>> Sep  5  9:39:17.241023 [sync.c:0318] This node is within the primary
>> component and will provide service.
>> Sep  5  9:39:17.241795 [totemsrp.c:1607] entering OPERATIONAL state.
>> Sep  5  9:39:17.264828 [clm.c:0605] got nodejoin message 192.168.0.1
>> Sep  5  9:39:17.266684 [clm.c:0605] got nodejoin message 192.168.0.2
>> Sep  5  9:39:17.312275 [amfnode.c:0284] Node obf-com-1 sync ready,
>> starting hosted SUs.
>> Sep  5  9:39:17.499988 [amfsu.c:0193] Setting SU 'OAM-SU-1' operational
>> state: ENABLED
>> Sep  5  9:39:17.500175 [amfsu.c:0156] Setting SU 'OAM-SU-1' readiness
>> state: IN-SERVICE
>> Sep  5  9:39:17.500248 [amfsu.c:0178] Setting SU 'OAM-SU-1' presence
>> state: INSTANTIATED
>> Sep  5  9:39:17.500329 [amfnode.c:0382] Node: all applications started,
>> assigning workload.
>> Sep  5  9:39:17.749821 [amfsi.c:0231] SU HA state changed to 'STANDBY'
>> for:
>>                 SI 'OAMWL', SU
>> 'safSu=OAM-SU-1,safSg=COM-SG-1,safApp=COM-A-1'
>> Sep  5  9:39:17.750928 [amfsi.c:0242] SI Assignment state changed to
>> 'FULLY-ASSIGNED' for:
>>                 SI 'OAMWL', SU
>> 'safSu=OAM-SU-1,safSg=COM-SG-1,safApp=COM-A-1'
>> Sep  5  9:39:17.751650 [amfnode.c:0397] Node: all workload assigned on
>> node obf-com-1
>> Saf CP callback 9, saf_callback, 80000000
>> Healthcheck safComp=OAM-C-1,safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1
>> Saf CP callback 9, saf_callback, 80000000
>> Healthcheck safComp=OAM-C-1,safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1
>>
>> Regards,
>>
>> // Ola
>>
> 


-- 
 Ola Lundqvist, Civilingenjör Informationsteknologi
 TietoEnator R&D Services AB, Telecom Platforms
 Email:  ola.lundqvist at tietoenator.com
 Phone:  +46 (0)54-29 42 17


