[Openais] Crashing because all SU not yet operational

Steven Dake sdake at redhat.com
Mon Sep 4 23:05:14 PDT 2006


I suspect the warnings are spurious, a side effect of GCC's code
reordering during optimization.  I.e. active_sus_needed may only be
calculated under a certain condition (the one under which it is later
used), but the optimizer may then conclude that it could be used
uninitialized.

The dragon compiler book says a compiler shouldn't do this; however, I
have seen it in a lot of weird places with -O3.  A simple workaround, if
the code is known to work properly, is to initialize the local variable
to zero where it is declared.
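
As a minimal sketch of that workaround (this is not the code from
amfsg.c; sus_to_assign, have_work and the ceiling division are
hypothetical stand-ins for illustration), a variable that is assigned
and read under the same condition can simply be zeroed at its
declaration:

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the pattern in assign_si(): 'needed' is
 * assigned and read under the same condition. Initializing it at the
 * declaration removes any path on which the compiler could consider
 * it uninitialized.
 */
int sus_to_assign(int have_work, int si_count, int per_su)
{
    int needed = 0; /* the workaround: initialize at declaration */

    if (have_work) {
        assert(per_su > 0);
        /* ceiling division, an assumed rounding rule for this sketch */
        needed = (si_count + per_su - 1) / per_su;
    }

    if (have_work) {
        return needed;
    }
    return 0;
}
```

The initialization costs nothing measurable and leaves the behaviour
unchanged on every path where the variable was already assigned.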

The assertion test isn't really valid in sq.h.  Instead sq_position
should range between 0 and sq->size.  Patch attached to fix.
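
To illustrate why the compiler flags a test of the form ">= 0" on an
unsigned value, here is a rough sketch of a range check.  The struct
and function names are hypothetical and much simpler than the real
sq.h, and treating the upper bound as exclusive is an assumption of
this sketch, not something taken from the patch:

```c
#include <assert.h>

/* Hypothetical, simplified sort queue; not the real struct from sq.h. */
struct sq_sketch {
    unsigned int size;
};

/* Returns 1 if pos lies inside the queue, 0 otherwise. */
int sq_sketch_position_valid(const struct sq_sketch *sq, unsigned int pos)
{
    /*
     * "pos >= 0" is always true for an unsigned value, so only the
     * upper bound carries information. The bound is assumed to be
     * exclusive here.
     */
    return pos < sq->size;
}

/* Assertion wrapper around the range check above. */
void sq_sketch_assert_position(const struct sq_sketch *sq, unsigned int pos)
{
    assert(sq_sketch_position_valid(sq, pos));
}
```

A negative index converted to unsigned wraps to a very large value, so
the single upper-bound comparison also rejects it.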

The sq->pos_max code may look a little fishy, but sq->pos_max was
originally added to debug errors in the totem srp protocol regarding
when to add a message to a particular sort queue (it is used by
sq_assert to validate the sort queue).  We have not had problems here
for a while, and sq_assert is unused.

Patch attached to fix the improper assertion.

Regards
-steve

On Mon, 2006-09-04 at 13:53 +0200, Hans Feldt wrote:
> I don't understand why GCC warns about different things when debug 
> compiled or not:
> 
> seasc0036:exec > make OPENAIS_BUILD=DEBUG
> /uabhafe/gcc/4.1/usr/local/bin/gcc  -O0 -g -Wall -DDEBUG -DOPENAIS_LINUX 
> -I../include   -c -o amfsg.o amfsg.c
> seasc0036:exec > touch amfsg.c
> seasc0036:exec > make
> /uabhafe/gcc/4.1/usr/local/bin/gcc  -O3 -Wall -fomit-frame-pointer 
> -DOPENAIS_LINUX -I../include   -c -o amfsg.o amfsg.c
> amfsg.c: In function 'assign_si':
> amfsg.c:1165: warning: 'active_sus_needed' may be used uninitialized in 
> this function
> amfsg.c:1166: warning: 'standby_sus_needed' may be used uninitialized in 
> this function
> 
> 
> The '-W' flag enables some interesting warnings e.g.:
> 
> ../include/sq.h:167: warning: comparison of unsigned expression >= 0 is 
> always true
> 
> 
> /Hans
> 
> Ola Lundqvist wrote:
> > Hi
> > 
> > Thanks a lot for that information, now it works a lot better.
> > 
> > I got the following warnings when compiling however. I think that should
> > be fixed.
> > 
> > amfsg.c:1165: warning: `active_sus_needed' might be used uninitialized
> > in this function
> > amfsg.c:1166: warning: `standby_sus_needed' might be used uninitialized
> > in this function
> > 
> > Thanks a lot
> > 
> > // Ola
> > 
> > Hans Feldt wrote:
> > 
> >>This problem is related to the patch sent by Lars, "Amf node leave and
> >>join #2" which will probably solve your problem. Please test with that
> >>patch or wait until we have committed it.
> >>
> >>Another thing: please include AMF (if AMF related) in the subject line
> >>of your emails to the list, easier for people to find and filter...
> >>
> >>Regards,
> >>Hans
> >>
> >>Ola Lundqvist wrote:
> >>
> >>>Hi
> >>>
> >>>The next crash I got is when all components have been initiated and when
> >>>it tries to start things up.
> >>>
> >>>Here is a log of what happens:
> >>>Sep  1 14:45:50.693455 [sync.c:0318] This node is within the primary
> >>>component and will provide service.
> >>>Sep  1 14:45:50.693580 [clm.c:0510] CLM CONFIGURATION CHANGE
> >>>Sep  1 14:45:50.693634 [clm.c:0511] New Configuration:
> >>>Sep  1 14:45:50.693693 [clm.c:0513]     r(0) ip(192.168.0.1)
> >>>Sep  1 14:45:50.693769 [clm.c:0515] Members Left:
> >>>Sep  1 14:45:50.693823 [clm.c:0520] Members Joined:
> >>>Sep  1 14:45:50.693879 [clm.c:0522]     r(0) ip(192.168.0.1)
> >>>Sep  1 14:45:50.693940 [sync.c:0318] This node is within the primary
> >>>component and will provide service.
> >>>Sep  1 14:45:50.694026 [totemsrp.c:1607] entering OPERATIONAL state.
> >>>Sep  1 14:45:50.701111 [clm.c:0605] got nodejoin message 192.168.0.1
> >>>Hello world from
> >>>safComp=OAM-C-1,safSu=OAM-SU-1,safSg=SS7-SG-1,safApp=SS7-A-1
> >>>
> >>>
> >>>>>WARNING<< Timestamp: 1157121953:807037
> >>>
> >>>ProcessType:SequenceNumber  161:1
> >>>    CP:0  ss7osdpn.c   2690     0  1295     0     0     0    1102
> >>>
> >>>
> >>>Sep  1 14:45:53.727109 [amfcluster.c:0130] Cluster: starting
> >>>applications.
> >>>(inservice=0) (active_sus_needed=1) (standby_sus_needed=1)
> >>>assignment VI - partial assignment with SIs drop outs
> >>>(inservice=0) (assigning active=1) (assigning standby=0) (assigning
> >>>spares=0)
> >>>su_active_assign=1, si_total=1,ass_to_su=1
> >>>while su...1 != 2, 0 == 1, 0 > 0
> >>>Not in service.
> >>>while su...1 != 2, 0 == 1, 0 > 0
> >>>Not in service.
> >>>No one to assign. No SU in service yet.
> >>>
> >>>
> >>>The last lines are my local changes to make sure that it does not crash.
> >>>However what I determined is that nothing will be started anyway as the
> >>>service units are not in service...
> >>>
> >>>This is a sketch patch of the local changes that I have made. I have
> >>>removed manually from the patch output all the print statements that I
> >>>had, so it may not apply cleanly.
> >>>
> >>>===================================================================
> >>>--- exec/amfsg.c        (revision 1235)
> >>>+++ exec/amfsg.c        (working copy)
> >>>@@ -964,6 +971,7 @@
> >>>                        amf_su_get_saAmfSUNumCurrStandbySIs (su) > 0) {
> >>>
> >>>                        su = su->next;
> >>>+                        printf("Not in service.\n");
> >>>                        continue; /* Not in service */
> >>>                }
> >>>
> >>>@@ -1118,15 +1127,18 @@
> >>>         */
> >>>        inservice_count = (float)su_inservice_count_get (sg);
> >>>
> >>>-       active_sus_needed = div_round (
> >>>-               sg_si_count_get (sg) * sg->saAmfSGNumPrefActiveSUs,
> >>>-               sg->saAmfSGMaxActiveSIsperSUs);
> >>>+        active_sus_needed = div_round (
> >>>+            sg_si_count_get (sg) * sg->saAmfSGNumPrefActiveSUs,
> >>>+            sg->saAmfSGMaxActiveSIsperSUs);
> >>>
> >>>-       standby_sus_needed = div_round (
> >>>+        standby_sus_needed = 0;
> >>>+        if (sg->saAmfSGMaxStandbySIsperSUs > 0) {
> >>>+            standby_sus_needed = div_round (
> >>>                sg_si_count_get (sg) * sg->saAmfSGNumPrefStandbySUs,
> >>>                sg->saAmfSGMaxStandbySIsperSUs);
> >>>+        }
> >>>
> >>>and then around
> >>>
> >>>@@ -1166,29 +1178,30 @@
> >>>-       assert (assigned > 0);
> >>>+       /*assert (assigned > 0);*/
> >>>+        printf("No one to assign. No SU in service yet.\n");
> >>>
> >>>
> >>>
> >>>So what I want to know, is why the SU may not be considered operational
> >>>and if I have done something wrong with my AMF application.
> >>>
> >>>Regards,
> >>>
> >>>// Ola
> >>>
> >>
> > 
> > 
> 
> _______________________________________________
> Openais mailing list
> Openais at lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/openais
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sq_assert.patch
Type: text/x-patch
Size: 377 bytes
Desc: not available
Url : http://lists.linux-foundation.org/pipermail/openais/attachments/20060904/e358d2bf/sq_assert-0001.bin

