[Openais] Crashing because all SU not yet operational

Ola Lundqvist ola.lundqvist at tietoenator.com
Mon Sep 4 23:20:09 PDT 2006


Hi

Steven Dake wrote:
> I suspect the warnings are just improper warnings generated by GCC code
> reordering algorithms.  I.e., active_sus_needed may be only calculated
> under a certain circumstance (if it is used later) but the optimizer may
> then think it is never calculated and used uninitialized.

Maybe.

However in this case you can very well have them uninitialized in this
function.

	int active_sus_needed;
	int standby_sus_needed;

...CUT...

	if (sg->saAmfSGNumPrefActiveSUs > 0) {
		active_sus_needed = div_round (
			sg_si_count_get (sg),
			sg->saAmfSGMaxActiveSIsperSUs);
	} else {
		log_printf (LOG_LEVEL_ERROR, "ERROR: saAmfSGNumPrefActiveSUs == 0 !!");
		openais_exit_error (AIS_DONE_FATAL_ERR);
	}

	if (sg->saAmfSGNumPrefStandbySUs > 0) {
		standby_sus_needed = div_round (
			sg_si_count_get (sg),
			sg->saAmfSGMaxStandbySIsperSUs);
	} else {
		log_printf (LOG_LEVEL_ERROR, "ERROR: saAmfSGNumPrefStandbySUs == 0 !!");
		openais_exit_error (AIS_DONE_FATAL_ERR);

	}

	dprintf ("(inservice=%d) (active_sus_needed=%d) (standby_sus_needed=%d)"
		"\n",
		inservice_count, active_sus_needed, standby_sus_needed);

So whenever sg->saAmfSGNumPrefStandbySUs <= 0 (and similarly for the
active counterpart), the variable is never assigned on that path before
the dprintf reads it.
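
A minimal sketch of how the warning could be avoided, assuming the
compiler cannot tell whether openais_exit_error() ever returns. The names
fatal_exit, div_round_up and sus_needed are hypothetical stand-ins, not
the real openais functions; the sketch combines an initializer with a
noreturn attribute (a GCC extension), and either one alone should be
enough to give the variable a defined value on every path the compiler
considers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for openais_exit_error(). The noreturn attribute
 * (a GCC extension) tells the compiler that control never comes back. */
static void fatal_exit (const char *msg) __attribute__ ((noreturn));

static void fatal_exit (const char *msg)
{
	fprintf (stderr, "ERROR: %s\n", msg);
	exit (EXIT_FAILURE);
}

/* Hypothetical rounding-up division, standing in for div_round(). */
static int div_round_up (int num, int den)
{
	assert (den > 0);
	return (num + den - 1) / den;
}

/* Mirrors the shape of the quoted code for one of the two variables. */
static int sus_needed (int pref_sus, int si_count, int max_sis_per_su)
{
	int needed = 0;	/* defined on every path, whatever the else branch does */

	if (pref_sus > 0) {
		needed = div_round_up (si_count, max_sis_per_su);
	} else {
		fatal_exit ("preferred SU count == 0");
	}
	return needed;
}
```

With the initializer in place the value printed afterwards is always
defined, even if the exit function were changed to return.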

I actually think that this warning is suppressed for some reason, or
that a different optimization level (or some other option) removes the
warnings for us.

> The dragon compiler book says this shouldn't be done by a compiler,
> however, I have seen it in a lot of weird places with -O3.  A simple
> workaround if the code is known to work properly is to set it to zero in
> the allocation of the local variable on the stack.
> 
> The assertion test isn't really valid in sq.h.  Instead sq_position
> should range between 0 and sq->size.  Patch attached to fix.
> 
> The sq->pos_max code may look a little fishy but sq->pos_max was
> originally added to debug errors in the totem srp protocol regarding
> when to add a message to a particular sort queue. (it is used by
> sq_assert to validate the sort queue).  We have not had problems here
> for a while and sq_assert is unused.
> 
> Patch attached to fix the improper assertion.
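
To illustrate the sq.h point: a comparison of an unsigned value against
zero with >= can never fail, so only the upper bound is worth asserting.
The sketch below uses a hypothetical, cut-down struct (mini_sq) and
helper names that are not part of the real sq.h.

```c
#include <assert.h>

/* Hypothetical, cut-down sort queue; the real struct sq has more fields. */
struct mini_sq {
	unsigned int size;
	unsigned int pos_max;
};

/* Returns 1 if pos is a valid slot index for q, 0 otherwise. */
static int mini_sq_pos_valid (const struct mini_sq *q, unsigned int pos)
{
	/* pos >= 0 holds for every unsigned value, so only the upper
	 * bound carries any information. */
	return pos < q->size;
}

/* Checks the position, tracks the highest one seen and returns it. */
static unsigned int mini_sq_pos_note (struct mini_sq *q, unsigned int pos)
{
	assert (mini_sq_pos_valid (q, pos));
	if (pos > q->pos_max) {
		q->pos_max = pos;
	}
	return q->pos_max;
}
```

A negative value converted to unsigned wraps around to a very large
number, so the upper-bound check catches that case as well.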

Regards,

// Ola

> Regards
> -steve
> 
> On Mon, 2006-09-04 at 13:53 +0200, Hans Feldt wrote:
>>I don't understand why GCC warns about different things when debug 
>>compiled or not:
>>
>>seasc0036:exec > make OPENAIS_BUILD=DEBUG
>>/uabhafe/gcc/4.1/usr/local/bin/gcc  -O0 -g -Wall -DDEBUG -DOPENAIS_LINUX 
>>-I../include   -c -o amfsg.o amfsg.c
>>seasc0036:exec > touch amfsg.c
>>seasc0036:exec > make
>>/uabhafe/gcc/4.1/usr/local/bin/gcc  -O3 -Wall -fomit-frame-pointer 
>>-DOPENAIS_LINUX -I../include   -c -o amfsg.o amfsg.c
>>amfsg.c: In function 'assign_si':
>>amfsg.c:1165: warning: 'active_sus_needed' may be used uninitialized in 
>>this function
>>amfsg.c:1166: warning: 'standby_sus_needed' may be used uninitialized in 
>>this function
>>
>>
>>The '-W' flag enables some interesting warnings e.g.:
>>
>>../include/sq.h:167: warning: comparison of unsigned expression >= 0 is 
>>always true
>>
>>
>>/Hans
>>
>>Ola Lundqvist wrote:
>>>Hi
>>>
>>>Thanks a lot for that information, now it works a lot better.
>>>
>>>However, I got the following warnings when compiling. I think they
>>>should be fixed.
>>>
>>>amfsg.c:1165: warning: `active_sus_needed' might be used uninitialized
>>>in this function
>>>amfsg.c:1166: warning: `standby_sus_needed' might be used uninitialized
>>>in this function
>>>
>>>Thanks a lot
>>>
>>>// Ola
>>>
>>>Hans Feldt wrote:
>>>
>>>>This problem is related to the patch sent by Lars, "Amf node leave and
>>>>join #2" which will probably solve your problem. Please test with that
>>>>patch or wait until we have committed it.
>>>>
>>>>Another thing: please include AMF (if AMF related) in the subject line
>>>>of your emails to the list, easier for people to find and filter...
>>>>
>>>>Regards,
>>>>Hans
>>>>
>>>>Ola Lundqvist wrote:
>>>>
>>>>>Hi
>>>>>
>>>>>The next crash I got is when all components have been initiated and when
>>>>>it tries to start things up.
>>>>>
>>>>>Here is a log of what happens:
>>>>>Sep  1 14:45:50.693455 [sync.c:0318] This node is within the primary
>>>>>component and will provide service.
>>>>>Sep  1 14:45:50.693580 [clm.c:0510] CLM CONFIGURATION CHANGE
>>>>>Sep  1 14:45:50.693634 [clm.c:0511] New Configuration:
>>>>>Sep  1 14:45:50.693693 [clm.c:0513]     r(0) ip(192.168.0.1)
>>>>>Sep  1 14:45:50.693769 [clm.c:0515] Members Left:
>>>>>Sep  1 14:45:50.693823 [clm.c:0520] Members Joined:
>>>>>Sep  1 14:45:50.693879 [clm.c:0522]     r(0) ip(192.168.0.1)
>>>>>Sep  1 14:45:50.693940 [sync.c:0318] This node is within the primary
>>>>>component and will provide service.
>>>>>Sep  1 14:45:50.694026 [totemsrp.c:1607] entering OPERATIONAL state.
>>>>>Sep  1 14:45:50.701111 [clm.c:0605] got nodejoin message 192.168.0.1
>>>>>Hello world from
>>>>>safComp=OAM-C-1,safSu=OAM-SU-1,safSg=SS7-SG-1,safApp=SS7-A-1
>>>>>
>>>>>
>>>>>>>WARNING<< Timestamp: 1157121953:807037
>>>>>ProcessType:SequenceNumber  161:1
>>>>>   CP:0  ss7osdpn.c   2690     0  1295     0     0     0    1102
>>>>>
>>>>>
>>>>>Sep  1 14:45:53.727109 [amfcluster.c:0130] Cluster: starting
>>>>>applications.
>>>>>(inservice=0) (active_sus_needed=1) (standby_sus_needed=1)
>>>>>assignment VI - partial assignment with SIs drop outs
>>>>>(inservice=0) (assigning active=1) (assigning standby=0) (assigning
>>>>>spares=0)
>>>>>su_active_assign=1, si_total=1,ass_to_su=1
>>>>>while su...1 != 2, 0 == 1, 0 > 0
>>>>>Not in service.
>>>>>while su...1 != 2, 0 == 1, 0 > 0
>>>>>Not in service.
>>>>>No one to assign. No SU in service yet.
>>>>>
>>>>>
>>>>>The last lines are my local changes to make sure that it does not crash.
>>>>>However what I determined is that nothing will be started anyway as the
>>>>>service units are not in service...
>>>>>
>>>>>This is a sketch patch of the local changes that I have made. I have
>>>>>manually removed all my print statements from the patch output, so it
>>>>>may not apply cleanly.
>>>>>
>>>>>===================================================================
>>>>>--- exec/amfsg.c        (revision 1235)
>>>>>+++ exec/amfsg.c        (working copy)
>>>>>@@ -964,6 +971,7 @@
>>>>>                       amf_su_get_saAmfSUNumCurrStandbySIs (su) > 0) {
>>>>>
>>>>>                       su = su->next;
>>>>>+                        printf("Not in service.\n");
>>>>>                       continue; /* Not in service */
>>>>>               }
>>>>>
>>>>>@@ -1118,15 +1127,18 @@
>>>>>        */
>>>>>       inservice_count = (float)su_inservice_count_get (sg);
>>>>>
>>>>>-       active_sus_needed = div_round (
>>>>>-               sg_si_count_get (sg) * sg->saAmfSGNumPrefActiveSUs,
>>>>>-               sg->saAmfSGMaxActiveSIsperSUs);
>>>>>+        active_sus_needed = div_round (
>>>>>+            sg_si_count_get (sg) * sg->saAmfSGNumPrefActiveSUs,
>>>>>+            sg->saAmfSGMaxActiveSIsperSUs);
>>>>>
>>>>>-       standby_sus_needed = div_round (
>>>>>+        standby_sus_needed = 0;
>>>>>+        if (sg->saAmfSGMaxStandbySIsperSUs > 0) {
>>>>>+            standby_sus_needed = div_round (
>>>>>               sg_si_count_get (sg) * sg->saAmfSGNumPrefStandbySUs,
>>>>>               sg->saAmfSGMaxStandbySIsperSUs);
>>>>>+        }
>>>>>
>>>>>and then around
>>>>>
>>>>>@@ -1166,29 +1178,30 @@
>>>>>-       assert (assigned > 0);
>>>>>+       /*assert (assigned > 0);*/
>>>>>+        printf("No one to assign. No SU in service yet.\n");
>>>>>
>>>>>
>>>>>
>>>>>So what I want to know, is why the SU may not be considered operational
>>>>>and if I have done something wrong with my AMF application.
>>>>>
>>>>>Regards,
>>>>>
>>>>>// Ola
>>>>>
>>>
>>
>>------------------------------------------------------------------------
>>
>>Index: include/sq.h
>>===================================================================
>>--- include/sq.h	(revision 1235)
>>+++ include/sq.h	(working copy)
>>@@ -164,7 +164,7 @@
>> 	if (sq_position > sq->pos_max) {
>> 		sq->pos_max = sq_position;
>> 	}
>>-	assert (sq_position >= 0);
>>+	assert (sq_position < sq->size);
>> 
>> 	sq_item = sq->items;
>> 	sq_item += sq_position * sq->size_per_item;


-- 
 Ola Lundqvist, Civilingenjör Informationsteknologi
 TietoEnator R&D Services AB, Telecom Platforms
 Email:  ola.lundqvist at tietoenator.com
 Phone:  +46 (0)54-29 42 17


