[Openais] When reassigning workload, the "old" component is not set to standby

Hans Feldt Hans.Feldt at ericsson.com
Tue Sep 5 06:12:12 PDT 2006


Looks like it works "like a charm". The first node started (obf-com-2)
gets the active assignment. The second node (obf-com-1) gets the standby
assignment. There is an AMF feature called 'auto-adjust' that I think
would do what you expect (switch to preferred). That feature is __not__
implemented in the openais AMF.
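
To pull just the HA assignments out of a saved log, something like the
sketch below might help. It assumes the log is unwrapped (not folded by a
mail client) and uses the message wording seen in the quoted output; the
function name ha_changes is made up for illustration.

```shell
# Print "<HA state> <SU DN>" for each HA state change read from stdin.
# Assumption: the raw log has the two-line form seen in the quoted output:
#   ... SU HA state changed to 'STANDBY' for:
#                   SI 'OAMWL', SU 'safSu=...'
ha_changes() {
    awk '
        /SU HA state changed to/ {
            # The state name is the first single-quoted string on the line.
            split($0, parts, "\047")
            state = parts[2]
            next
        }
        state != "" && /safSu=/ {
            # The SU DN is the last single-quoted string on the next line.
            n = split($0, parts, "\047")
            print state, parts[n - 1]
            state = ""
        }
    '
}
```

Usage would be along the lines of "ha_changes < aisexec.log", where
aisexec.log is a hypothetical file holding the captured output.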

Try the command (well hidden...):

$ pkill -USR2 aisexec

That will give you the current AMF state.
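
A slightly more defensive variant, as a sketch only: it checks that a
process with the given name is running before sending SIGUSR2. The
function name dump_amf_state is made up for illustration, and I am
assuming the dump shows up wherever aisexec writes its log (the terminal,
when started with -f as in the quoted session).

```shell
# Send SIGUSR2 to aisexec (or another process name given as $1).
# Assumption: the AMF state dump is written to the aisexec log output.
dump_amf_state() {
    proc="${1:-aisexec}"
    # -x matches the process name exactly, not as a substring.
    if ! pgrep -x "$proc" >/dev/null 2>&1; then
        echo "no running process named $proc" >&2
        return 1
    fi
    pkill -USR2 -x "$proc"
}
```

Calling dump_amf_state with no argument does the same as the command
above, but reports an error instead of silently doing nothing when
aisexec is not running.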

Regards,
Hans


Ola Lundqvist wrote:
> Hi
> 
> The following was done:
> * Start ais on obf-com-2
>   -> csi assigned to the component on that node
>   "Mainstart works like a charm!" below in logs.
> * Start ais on obf-com-1
>   -> csi assigned to component on node 1.
> 
> But I expected the csi to be removed from node 2 when this happens.
> Shouldn't it be?
> 
> obf-com-2:~# /opt/ais/sbin/aisexec -f
> Sep  5  9:38:46.627394 [main.c:0409] AIS Executive Service RELEASE 'trunk'
> Sep  5  9:38:46.627987 [main.c:0410] Copyright (C) 2002-2006 MontaVista
> Software, Inc and contributors.
> Sep  5  9:38:46.628044 [main.c:0411] Copyright (C) 2006 Red Hat, Inc.
> Sep  5  9:38:46.628098 [service.c:0219] openais component openais_cpg
> loaded.
> Sep  5  9:38:46.628152 [service.c:0123] Registering service handler
> 'openais cluster closed process group service v1.01'
> Sep  5  9:38:46.628205 [service.c:0219] openais component openais_cfg
> loaded.
> Sep  5  9:38:46.628259 [service.c:0123] Registering service handler
> 'openais configuration service'
> Sep  5  9:38:46.628312 [service.c:0219] openais component openais_msg
> loaded.
> Sep  5  9:38:46.628366 [service.c:0123] Registering service handler
> 'openais message service B.01.01'
> Sep  5  9:38:46.628419 [service.c:0219] openais component openais_lck
> loaded.
> Sep  5  9:38:46.628475 [service.c:0123] Registering service handler
> 'openais distributed locking service B.01.01'
> Sep  5  9:38:46.628528 [service.c:0219] openais component openais_evt
> loaded.
> Sep  5  9:38:46.628581 [service.c:0123] Registering service handler
> 'openais event service B.01.01'
> Sep  5  9:38:46.628634 [service.c:0219] openais component openais_ckpt
> loaded.
> Sep  5  9:38:46.628687 [service.c:0123] Registering service handler
> 'openais checkpoint service B.01.01'
> Sep  5  9:38:46.628741 [service.c:0219] openais component openais_amf
> loaded.
> Sep  5  9:38:46.628796 [service.c:0123] Registering service handler
> 'openais availability management framework B.01.01'
> Sep  5  9:38:46.628850 [service.c:0219] openais component openais_clm
> loaded.
> Sep  5  9:38:46.628903 [service.c:0123] Registering service handler
> 'openais cluster membership service B.01.01'
> Sep  5  9:38:46.628956 [service.c:0219] openais component openais_evs
> loaded.
> Sep  5  9:38:46.629008 [service.c:0123] Registering service handler
> 'openais extended virtual synchrony service'
> Sep  5  9:38:46.651707 [totemsrp.c:0716] Token Timeout (1000 ms)
> retransmit timeout (238 ms)
> Sep  5  9:38:46.651862 [totemsrp.c:0719] token hold (180 ms) retransmits
> before loss (4 retrans)
> Sep  5  9:38:46.651919 [totemsrp.c:0726] join (100 ms) send_join (0 ms)
> consensus (200 ms) merge (200 ms)
> Sep  5  9:38:46.651974 [totemsrp.c:0729] downcheck (1000000 ms) fail to
> recv const (50 msgs)
> Sep  5  9:38:46.652029 [totemsrp.c:0731] seqno unchanged const (30
> rotations) Maximum network MTU 1500
> Sep  5  9:38:46.652085 [totemsrp.c:0735] window size per rotation (50
> messages) maximum messages per rotation (17 messages)
> Sep  5  9:38:46.652164 [totemsrp.c:0738] send threads (0 threads)
> Sep  5  9:38:46.652217 [totemsrp.c:0741] RRP token expired timeout (238 ms)
> Sep  5  9:38:46.652294 [totemsrp.c:0744] RRP token problem counter (2000 ms)
> Sep  5  9:38:46.652347 [totemsrp.c:0747] RRP threshold (10 problem count)
> Sep  5  9:38:46.652399 [totemsrp.c:0749] RRP mode set to none.
> Sep  5  9:38:46.652461 [totemsrp.c:0752] heartbeat_failures_allowed (0)
> Sep  5  9:38:46.652514 [totemsrp.c:0754] max_network_delay (50 ms)
> Sep  5  9:38:46.652836 [totemsrp.c:0775] HeartBeat is Disabled. To
> enable set heartbeat_failures_allowed > 0
> Sep  5  9:38:46.654156 [totemnet.c:1034] Receive multicast socket recv
> buffer size (212992 bytes).
> Sep  5  9:38:46.654237 [totemnet.c:1040] Transmit multicast socket send
> buffer size (212992 bytes).
> Sep  5  9:38:46.654450 [totemnet.c:0848] The network interface
> [192.168.0.2] is now up.
> Sep  5  9:38:46.654674 [totemsrp.c:4029] Created or loaded sequence id
> 0.192.168.0.2 for this ring.
> Sep  5  9:38:46.655105 [totemsrp.c:1662] entering GATHER state.
> Sep  5  9:38:46.655408 [service.c:0236] Initialising service handler
> 'openais extended virtual synchrony service'
> Sep  5  9:38:46.655479 [service.c:0236] Initialising service handler
> 'openais cluster membership service B.01.01'
> Sep  5  9:38:46.658402 [service.c:0236] Initialising service handler
> 'openais availability management framework B.01.01'
> Sep  5  9:38:46.658870 [service.c:0236] Initialising service handler
> 'openais checkpoint service B.01.01'
> Sep  5  9:38:46.658987 [service.c:0236] Initialising service handler
> 'openais event service B.01.01'
> Sep  5  9:38:46.659106 [service.c:0236] Initialising service handler
> 'openais distributed locking service B.01.01'
> Sep  5  9:38:46.659183 [service.c:0236] Initialising service handler
> 'openais message service B.01.01'
> Sep  5  9:38:46.659263 [service.c:0236] Initialising service handler
> 'openais configuration service'
> Sep  5  9:38:46.659341 [service.c:0236] Initialising service handler
> 'openais cluster closed process group service v1.01'
> Sep  5  9:38:46.659419 [sync.c:0277] Not using a virtual synchrony filter.
> Sep  5  9:38:46.659621 [main.c:0589] AIS Executive Service: started and
> ready to provide service.
> Sep  5  9:38:46.659929 [totemsrp.c:2672] Creating commit token because I
> am the rep.
> Sep  5  9:38:46.660095 [totemsrp.c:1240] Saving state aru 0 high seq
> received 0
> Sep  5  9:38:46.660282 [totemsrp.c:2826] Storing new sequence id for ring 4
> Sep  5  9:38:46.660456 [totemsrp.c:1698] entering COMMIT state.
> Sep  5  9:38:46.660678 [totemsrp.c:1732] entering RECOVERY state.
> Sep  5  9:38:46.660854 [totemsrp.c:1766] position [0] member 192.168.0.2:
> Sep  5  9:38:46.660920 [totemsrp.c:1770] previous ring seq 0 rep 192.168.0.2
> Sep  5  9:38:46.660983 [totemsrp.c:1776] aru 0 high delivered 0 received
> flag 0
> Sep  5  9:38:46.661046 [totemsrp.c:1883] Did not need to originate any
> messages in recovery.
> Sep  5  9:38:46.661268 [totemsrp.c:3958] Sending initial ORF token
> Sep  5  9:38:46.662997 [clm.c:0510] CLM CONFIGURATION CHANGE
> Sep  5  9:38:46.663087 [clm.c:0511] New Configuration:
> Sep  5  9:38:46.663142 [clm.c:0515] Members Left:
> Sep  5  9:38:46.663196 [clm.c:0520] Members Joined:
> Sep  5  9:38:46.663329 [sync.c:0318] This node is within the primary
> component and will provide service.
> Sep  5  9:38:46.663489 [clm.c:0510] CLM CONFIGURATION CHANGE
> Sep  5  9:38:46.663544 [clm.c:0511] New Configuration:
> Sep  5  9:38:46.663603 [clm.c:0513]     r(0) ip(192.168.0.2)
> Sep  5  9:38:46.663678 [clm.c:0515] Members Left:
> Sep  5  9:38:46.663731 [clm.c:0520] Members Joined:
> Sep  5  9:38:46.663786 [clm.c:0522]     r(0) ip(192.168.0.2)
> Sep  5  9:38:46.663854 [sync.c:0318] This node is within the primary
> component and will provide service.
> Sep  5  9:38:46.663944 [totemsrp.c:1607] entering OPERATIONAL state.
> Sep  5  9:38:46.673547 [clm.c:0605] got nodejoin message 192.168.0.2
> Hello world from
> safComp=OAM-C-1,safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1
> Now run CP
> Sep  5  9:38:49.678747 [amfcluster.c:0130] Cluster: starting applications.
> Sep  5  9:38:50.083812 [amfsu.c:0193] Setting SU 'OAM-SU-2' operational
> state: ENABLED
> Sep  5  9:38:50.084076 [amfsu.c:0156] Setting SU 'OAM-SU-2' readiness
> state: IN-SERVICE
> Sep  5  9:38:50.084179 [amfsu.c:0178] Setting SU 'OAM-SU-2' presence
> state: INSTANTIATED
> Saf CP callback 9, saf_callback, 80000000
> csiSetCallback
> safComp=OAM-C-1,safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1,
> safCsi=OAM-1,safSi=OAMWL,safApp=COM-A-1, SA_AMF_HA_ACTIVE
> Mainstart works like a charm!
> Sending response (ok) 1
> Sep  5  9:38:52.741951 [amfsi.c:0231] SU HA state changed to 'ACTIVE' for:
>                 SI 'OAMWL', SU
> 'safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1'
> Sep  5  9:38:52.742071 [amfsi.c:0242] SI Assignment state changed to
> 'PARTIALLY-ASSIGNED' for:
>                 SI 'OAMWL', SU
> 'safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1'
> Sep  5  9:38:52.742142 [amfcluster.c:0213] Cluster: application COM-A-1
> assigned.
> Saf CP callback 9, saf_callback, 80000000
> Healthcheck safComp=OAM-C-1,safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1
> Saf CP callback 9, saf_callback, 80000000
> Healthcheck safComp=OAM-C-1,safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1
> Saf CP callback 9, saf_callback, 80000000
> Healthcheck safComp=OAM-C-1,safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1
> Saf CP callback 9, saf_callback, 80000000
> Healthcheck safComp=OAM-C-1,safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1
> Saf CP callback 9, saf_callback, 80000000
> Healthcheck safComp=OAM-C-1,safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1
> Sep  5  9:39:16.949540 [totemsrp.c:1662] entering GATHER state.
> Sep  5  9:39:17.207483 [totemsrp.c:1240] Saving state aru 13 high seq
> received 13
> Sep  5  9:39:17.216110 [totemsrp.c:2826] Storing new sequence id for ring 8
> Sep  5  9:39:17.216321 [totemsrp.c:1698] entering COMMIT state.
> Sep  5  9:39:17.216507 [totemsrp.c:1732] entering RECOVERY state.
> Sep  5  9:39:17.217014 [totemsrp.c:1766] position [0] member 192.168.0.1:
> Sep  5  9:39:17.217084 [totemsrp.c:1770] previous ring seq 4 rep 192.168.0.1
> Sep  5  9:39:17.217151 [totemsrp.c:1776] aru 9 high delivered 9 received
> flag 0
> Sep  5  9:39:17.217220 [totemsrp.c:1766] position [1] member 192.168.0.2:
> Sep  5  9:39:17.217286 [totemsrp.c:1770] previous ring seq 4 rep 192.168.0.2
> Sep  5  9:39:17.217354 [totemsrp.c:1776] aru 13 high delivered 13
> received flag 0
> Sep  5  9:39:17.217459 [totemsrp.c:1883] Did not need to originate any
> messages in recovery.
> Sep  5  9:39:17.237727 [clm.c:0510] CLM CONFIGURATION CHANGE
> Sep  5  9:39:17.237899 [clm.c:0511] New Configuration:
> Sep  5  9:39:17.237974 [clm.c:0513]     r(0) ip(192.168.0.2)
> Sep  5  9:39:17.238039 [clm.c:0515] Members Left:
> Sep  5  9:39:17.238103 [clm.c:0520] Members Joined:
> Sep  5  9:39:17.238294 [sync.c:0318] This node is within the primary
> component and will provide service.
> Sep  5  9:39:17.238511 [clm.c:0510] CLM CONFIGURATION CHANGE
> Sep  5  9:39:17.238577 [clm.c:0511] New Configuration:
> Sep  5  9:39:17.238645 [clm.c:0513]     r(0) ip(192.168.0.1)
> Sep  5  9:39:17.238712 [clm.c:0513]     r(0) ip(192.168.0.2)
> Sep  5  9:39:17.238775 [clm.c:0515] Members Left:
> Sep  5  9:39:17.238839 [clm.c:0520] Members Joined:
> Sep  5  9:39:17.238954 [clm.c:0522]     r(0) ip(192.168.0.1)
> Sep  5  9:39:17.241023 [sync.c:0318] This node is within the primary
> component and will provide service.
> Sep  5  9:39:17.241795 [totemsrp.c:1607] entering OPERATIONAL state.
> Sep  5  9:39:17.264828 [clm.c:0605] got nodejoin message 192.168.0.1
> Sep  5  9:39:17.266684 [clm.c:0605] got nodejoin message 192.168.0.2
> Sep  5  9:39:17.312275 [amfnode.c:0284] Node obf-com-1 sync ready,
> starting hosted SUs.
> Sep  5  9:39:17.499988 [amfsu.c:0193] Setting SU 'OAM-SU-1' operational
> state: ENABLED
> Sep  5  9:39:17.500175 [amfsu.c:0156] Setting SU 'OAM-SU-1' readiness
> state: IN-SERVICE
> Sep  5  9:39:17.500248 [amfsu.c:0178] Setting SU 'OAM-SU-1' presence
> state: INSTANTIATED
> Sep  5  9:39:17.500329 [amfnode.c:0382] Node: all applications started,
> assigning workload.
> Sep  5  9:39:17.749821 [amfsi.c:0231] SU HA state changed to 'STANDBY' for:
>                 SI 'OAMWL', SU
> 'safSu=OAM-SU-1,safSg=COM-SG-1,safApp=COM-A-1'
> Sep  5  9:39:17.750928 [amfsi.c:0242] SI Assignment state changed to
> 'FULLY-ASSIGNED' for:
>                 SI 'OAMWL', SU
> 'safSu=OAM-SU-1,safSg=COM-SG-1,safApp=COM-A-1'
> Sep  5  9:39:17.751650 [amfnode.c:0397] Node: all workload assigned on
> node obf-com-1
> Saf CP callback 9, saf_callback, 80000000
> Healthcheck safComp=OAM-C-1,safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1
> Saf CP callback 9, saf_callback, 80000000
> Healthcheck safComp=OAM-C-1,safSu=OAM-SU-2,safSg=COM-SG-1,safApp=COM-A-1
> 
> Regards,
> 
> // Ola
> 



