[Openais] Re: A bug for AMF

Miyotaka Sakai sakai.miyotaka at nttcom.co.jp
Mon Oct 4 17:42:10 PDT 2004


Steve ,

response inline.

Steven Dake wrote:

> Miyotaka-san
> 
> good work I think we are alot closer...  There are a few things to be
> resolved..
> 
> comments in patch 
> 
> Thanks
> 
> On Mon, 2004-10-04 at 10:21, Miyotaka Sakai wrote:
> 
>>Steve,
>>
>>I made a new patch for AMF .
>>I would like you to review this patch .
>>As you said ,I'll do seperate commit bug and debug mechanism.
>>
>>And could you give me some advice abount the last my email.
>>
>>thanks
>>- Miyotaka Sakai.
>>
>>
>>Miyotaka Sakai wrote:
>>
>>>Steve,
>>>
>>>my responses inline .
>>>
>>>
>>>Steven Dake wrote:
>>>
>>>
>>>>Miyotaka-san
>>>>comments inline
>>>>
>>>>On Sat, 2004-10-02 at 13:27, Miyotaka Sakai wrote:
>>>>
>>>>
>>>>>steve,
>>>>>
>>>>>I've found a bug for AMF and made a patch .
>>>>>I would like you to review this patch .
>>>>>
>>>>>Several gmi_mcast messages couldn't be delivered .
>>>>>I changed gmi_mcast's message priorty from GMI_PRIO_HIGH to
>>>>>GMI_PRIO_RECOVERY ,then messages could be deliverd .
>>>>>
>>>>>If you have another idea, Please tel me.
>>>>>
>>>>>This patch add several mechanisms to ease to debug .
>>>>>( a mechanism is watching AmfComponent contents )
>>>>>
>>>>
>>>>The changing of priorities needs a different solution...  I think I
>>>>messed this up for you when I added the recovery plug.  Could you fix it
>>>>up?
>>>
>>>I am sure. I will.
>>>
>>>
>>>>The way it should work is this:
>>>>messages sent during recovery (within the amf_confchg_fn function and
>>>>all the other messages related to them) should be sent at recovery
>>>>priority.  messages sent during normal operation should retain their
>>>>current priorities.  Since the state machine must now (with the recovery
>>>>plug) send messages at two different priorities depending on what state
>>>>the system is in, we need to keep track of the priorities.  It is
>>>>possible for the state machine to know the priority at which it should
>>>>send messages by placing the priority in the first message that starts
>>>>the state transitions.  Then every message that triggers a new message
>>>>to be sent should encode the priority of the message to the next 
>>>>state. If you need more details please ask.
>>>
>>>OK.
>>>I know what you mean .
>>>For exsample ,haStateSetCluster relating amf_confchg_fn uses 
>>>GMI_PRIO_RECOVERY ,but others use GMI_PRIO_HIGH.
>>>It is possible ,and I 'll fix it .
>>>
>>>
>>>>It is possible that the output queues used in gmi_mcast are full.  In
>>>>this case, it is necessary to use gmi_token_callback_create.  Mark
>>>>Haverkamp has implemented this in the event service if you need an
>>>>example of using it.  This callback will be called the next time it may
>>>>be possible to send messages.  The callback should then resend the state
>>>>machine message that failed to send on an earlier try.
>>>
>>>
>>>I refered to Mark's implementation and ,I understaned this 
>>>implementation. Please take some time to implement this .
>>>
>>>
>>>>Why the signal stuff in main.c?
>>>
>>>
>>>I want to konw AmfComponents cotents at the time when I want to know .
>>>( I don't know when I want to know AmfComponents contens.
>>>  It is different in each debug situation. )
>>>In that case ,Most easy inmplementation is signal .
>>>If you have ,Could you tell me another idea ?
>>>
>>>
>>>>The added debug output looks fine.  Please try to use log_printf instead
>>>>of printf though.  Could you check in just the debug output as a
>>>>seperate commit after addressing the comment above?
>>>
>>>
>>>I'll change printf into log_print.
>>>
>>>Thanks
>>>- Miyotaka Sakai.
>>>
>>>
>>>>Thanks
>>>>-steve
>>>>
>>>>
>>>>
>>>>>If you accept this patch ,I'll check-in .
>>>>>Thanks !
>>>>>- Miyotaka Sakai.
>>>
>>______________________________________________________________________
>>--- openais.2004-10-01.21.30/exec/amf.c	2004-10-02 13:30:03.000000000 +0900
>>+++ openais/exec/amf.c	2004-10-05 01:56:48.000000000 +0900
>>@@ -478,8 +478,9 @@
>> 	return (0);
>> }
>> 
>>-static void component_unregister (
>>-	struct saAmfComponent *component)
>>+static void component_unregisterdtl (
>>+	struct saAmfComponent *component,
>>+
> 
> 
> why unregisterdtl?  I dont understand what "dtl" means...
I'm sorry .
dtl means "detail parameter set".
I'll change this into component_unregisterpriority.
It means "unregister with priority".


> The patch is close but is missing something.  Priority is encoded into
> the state machine transitions as well.  The reason is because the call
> dsmDisabledUnlockedFailed() must call with the correct priority.  This
> requires encoding the priority all along the call chain in component
> register and unregister error report, etc.
> 
> Doing this has the side effect of requiring only one haStateSetCluster
> and readinessStateSetCluster call instead of *tdl.
> 
> more comments below
> 
> 
>>	int priority)
>> {
>> 	struct req_exec_amf_componentunregister req_exec_amf_componentunregister;
>> 	struct iovec iovecs[2];
>>@@ -509,12 +510,19 @@
>> 	iovecs[0].iov_base = (char *)&req_exec_amf_componentunregister;
>> 	iovecs[0].iov_len = sizeof (req_exec_amf_componentunregister);
>> 
>>-	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, GMI_PRIO_MED) == 0);
>>+	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, priority) == 0);
>> }
>> 
>>-static void component_register (
>>+static void component_unregister (
>> 	struct saAmfComponent *component)
>> {
>>+	component_unregisterdtl (component,GMI_PRIO_MED);
>>+}
>>+
>>+static void component_registerdtl (
>>+	struct saAmfComponent *component,
>>+	int priority)
>>+{
>> 	struct req_exec_amf_componentregister req_exec_amf_componentregister;
>> 	struct iovec iovecs[2];
>> 
>>@@ -526,7 +534,6 @@
>> 	}
>> 	log_printf (LOG_LEVEL_DEBUG, "component_register: registering component %s\n",
>> 		getSaNameT (&component->name));
>>-	component->probableCause = SA_AMF_NOT_RESPONDING;
> 
> 
> ^^ this change is very suspect..  I spent alot of time getting this to
> work right when the application doesn't respond to a healthcheck (try
> ctrl-z on the active or standby application to simulate this
> condition).  I guess the reason you had to remove this was because the
> if conditional in dsmDisabledUnlockedFailed (described above) failed to
> transition correctly if it was set.

Please take some time to consider it .
After that I will change this code.
But this program doesn't works with SA_AMF_NOT_RESPONDING.
This time ,please accept this way .

>> 	req_exec_amf_componentregister.header.size = sizeof (struct req_exec_amf_componentregister);
>> 	req_exec_amf_componentregister.header.id = MESSAGE_REQ_EXEC_AMF_COMPONENTREGISTER;
>>@@ -543,7 +550,7 @@
>> 	iovecs[0].iov_base = (char *)&req_exec_amf_componentregister;
>> 	iovecs[0].iov_len = sizeof (req_exec_amf_componentregister);
>> 
>>-	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, GMI_PRIO_RECOVERY) == 0);
>>+	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, priority) == 0);
>> }
>> 
>> /***
>>@@ -747,9 +754,10 @@
>> }
>> #endif
>> 
>>-void haStateSetCluster (
>>+static void haStateSetClusterdtl (
>> 	struct saAmfComponent *component,
>>-	SaAmfHAStateT haState)
>>+	SaAmfHAStateT haState,
>>+	int priority)
>> {
>> 
>> 	struct req_exec_amf_hastateset req_exec_amf_hastateset;
>>@@ -766,7 +774,14 @@
>> 	iovecs[0].iov_base = (char *)&req_exec_amf_hastateset;
>> 	iovecs[0].iov_len = sizeof (req_exec_amf_hastateset);
>> 
>>-	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, GMI_PRIO_HIGH) == 0);
>>+	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, priority) == 0);
>>+}
>>+
>>+static void haStateSetCluster (
>>+	struct saAmfComponent *component,
>>+	SaAmfHAStateT haState)
>>+{
>>+	haStateSetClusterdtl (component, haState, GMI_PRIO_HIGH);
>> }
>> 
>> void readinessStateSetApi (struct saAmfComponent *component,
>>@@ -837,9 +852,10 @@
>> }
>> #endif
>> 
>>-void readinessStateSetCluster (
>>+static void readinessStateSetClusterdtl(
>> 	struct saAmfComponent *component,
>>-	SaAmfReadinessStateT readinessState)
>>+	SaAmfReadinessStateT readinessState,
>>+	int priority)
>> {
>> 
>> 	struct req_exec_amf_readinessstateset req_exec_amf_readinessstateset;
>>@@ -857,7 +873,14 @@
>> 	iovecs[0].iov_base = (char *)&req_exec_amf_readinessstateset;
>> 	iovecs[0].iov_len = sizeof (req_exec_amf_readinessstateset);
>> 
>>-	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, GMI_PRIO_HIGH) == 0);
>>+	assert (gmi_mcast (&aisexec_groupname, iovecs, 1, priority) == 0);
>>+}
>>+
>>+static void readinessStateSetCluster(
>>+	struct saAmfComponent *component,
>>+	SaAmfReadinessStateT readinessState)
>>+{
>>+	readinessStateSetClusterdtl (component, readinessState, GMI_PRIO_HIGH);
>> }
>> 
>> #ifdef CMOPILE_OUT
>>@@ -1266,7 +1289,7 @@
>> 			/* noop - operational state */
>> 			break;
>> 		case AMF_ENABLED_UNLOCKED_STANDBY_REQUESTED:
>>-			dsmEnabledUnlockedActiveRequested (component);
>>+			dsmEnabledUnlockedStandbyRequested (component);
>> 			break;
>> 		case AMF_ENABLED_UNLOCKED_STANDBY_COMPLETED:
>> 			/* noop - operational state */
>>@@ -1699,9 +1722,9 @@
>> 	if (component->source_addr.s_addr != source_addr->s_addr) {
>> 		return;
>> 	}
>>-		
>>+
>> 	component->local = 1;
>>-	component_unregister (component);
>>+	component_unregisterdtl (component, GMI_PRIO_RECOVERY);
>> 
>> 	return;
>> }
>>@@ -1713,7 +1736,7 @@
>> 		return;
>> 	}
>> 
>>-	component_register (component);
>>+	component_registerdtl (component, GMI_PRIO_RECOVERY);
>> 	return;
>> }
>> 
>>@@ -1724,8 +1747,8 @@
>> 	}
>> 
>> 	/* dsm change must be needed */
>>-	readinessStateSetCluster (component, component->currentReadinessState);
>>-	haStateSetCluster (component, component->currentHAState);
>>+	readinessStateSetClusterdtl (component, component->currentReadinessState, GMI_PRIO_RECOVERY);
>>+	haStateSetClusterdtl (component, component->currentHAState ,GMI_PRIO_RECOVERY);
>> 
>> 	return;
>> }
>>@@ -2324,7 +2347,6 @@
>> 	return (0);
>> }
>> 
>>-
>> static int message_handler_req_amf_errorreport (struct conn_info *conn_info, void *message)
>> {
>> 	struct req_lib_amf_errorreport *req_lib_amf_errorreport = (struct req_lib_amf_errorreport *)message;
>>@@ -2474,3 +2496,115 @@
>> 	return (0);
>> }
>> 
> 
> 
> Could you make these switches instead of if's?  Or they could be a table
> such as

I see.
> 
> /*
>  * Make sure to sync to state machine variables (!)
>  */
> static char **disabled_unlocked_state_text = {
> 	"AMF_DISABLED_UNLOCKED_REGISTEREDORERRORCANCEL(%d)",
> 	"AMF_DISABLED_UNLOCKED_FAILED(%d)",
> 
> 	...
> };
> 
> static char *amf_disabledunlockedstate_ntoa (state) {
> 	static char str_disabledstate[256];
> 
> 	sprintf (str_disabledstate, disabled_unlocked_state_text[state],
> state);
> 	return (str_disabledstate);
> }
> 
> 
>>+static char *amf_disabledunlockedstate_ntoa (state)
>>+{
>>+	static char str_disabledstate[16];
> 
> 
> ^^^ this variable is not big enough to store the data
I'm sorry.
I'll collect appropriate size.

>>+	if (state == AMF_DISABLED_UNLOCKED_REGISTEREDORERRORCANCEL) {
>>+		sprintf (str_disabledstate, "AMF_DISABLED_UNLOCKED_REGISTEREDORERRORCANCEL(%d)", state);
>>+	} else if (state == AMF_DISABLED_UNLOCKED_FAILED) {
>>+		sprintf (str_disabledstate, "AMF_DISABLED_UNLOCKED_FAILED(%d)", state);
>>+	} else if (state == AMF_DISABLED_UNLOCKED_QUIESCED_REQUESTED) {
>>+		sprintf (str_disabledstate, "AMF_DISABLED_UNLOCKED_QUIESCED_REQUESTED(%d)", state);
>>+	} else if (state == AMF_DISABLED_UNLOCKED_QUIESCED_COMPLETED) {
>>+		sprintf (str_disabledstate, "AMF_DISABLED_UNLOCKED_QUIESCED_COMPLETED(%d)", state);
>>+	} else if (state == AMF_DISABLED_UNLOCKED_OUT_OF_SERVICE_REQUESTED) {
>>+		sprintf (str_disabledstate, "AMF_DISABLED_UNLOCKED_OUT_OF_SERVICE_REQUESTED(%d)", state);
>>+	} else if (state == AMF_DISABLED_UNLOCKED_OUT_OF_SERVICE_COMPLETED) {
>>+		sprintf (str_disabledstate, "AMF_DISABLED_UNLOCKED_OUT_OF_SERVICE_COMPLETED(%d)", state);
>>+	} else {
>>+		sprintf (str_disabledstate, "Unkown(%d)", state);
>>+	}
>>+
>>+	return (str_disabledstate);
>>+}
>>+
>>+static char *amf_enabledunlockedstate_ntoa (state)
>>+{
>>+	static char str_enabledunlockedstate[16];
>>+	if (state == AMF_ENABLED_UNLOCKED_INITIAL) {
>>+		sprintf (str_enabledunlockedstate, "AMF_ENABLED_UNLOCKED_INITIAL(%d)", state);
>>+	} else if (state == AMF_ENABLED_UNLOCKED_IN_SERVICE_REQUESTED) {
>>+		sprintf (str_enabledunlockedstate, "AMF_ENABLED_UNLOCKED_IN_SERVICE_REQUESTED(%d)", state);
>>+	} else if (state == AMF_ENABLED_UNLOCKED_IN_SERVICE_COMPLETED) {
>>+		sprintf (str_enabledunlockedstate, "AMF_ENABLED_UNLOCKED_IN_SERVICE_COMPLETED(%d)", state);
>>+	} else if (state == AMF_ENABLED_UNLOCKED_ACTIVE_REQUESTED) {
>>+		sprintf (str_enabledunlockedstate, "AMF_ENABLED_UNLOCKED_ACTIVE_REQUESTED(%d)", state);
>>+	} else if (state == AMF_ENABLED_UNLOCKED_ACTIVE_COMPLETED) {
>>+		sprintf (str_enabledunlockedstate, "AMF_ENABLED_UNLOCKED_ACTIVE_COMPLETED(%d)", state);
>>+	} else if (state == AMF_ENABLED_UNLOCKED_STANDBY_REQUESTED) {
>>+		sprintf (str_enabledunlockedstate, "AMF_ENABLED_UNLOCKED_STANDBY_REQUESTED(%d)", state);
>>+	} else if (state == AMF_ENABLED_UNLOCKED_STANDBY_COMPLETED) {
>>+		sprintf (str_enabledunlockedstate, "AMF_ENABLED_UNLOCKED_STANDBY_COMPLETED(%d)", state);
>>+	} else {
>>+		sprintf (str_enabledunlockedstate, "Unkown(%d)", state);
>>+	}
>>+
>>+	return (str_enabledunlockedstate);
>>+}
>>+
>>+static char *amf_readinessstate_ntoa (readinessstate)
>>+{
>>+	static char str_readiness[16];
>>+	if (readinessstate == SA_AMF_OUT_OF_SERVICE) {
>>+		sprintf (str_readiness, "SA_AMF_OUT_OF_SERVICE(%d)", readinessstate);
>>+	} else if (readinessstate == SA_AMF_IN_SERVICE) {
>>+		sprintf (str_readiness, "SA_AMF_IN_SERVICE(%d)", readinessstate);
>>+	} else if (readinessstate == SA_AMF_STOPPING) {
>>+		sprintf (str_readiness, "SA_AMF_STOPPING(%d)", readinessstate);
>>+	} else {
>>+		sprintf (str_readiness, "Unkown (%d)", readinessstate);
>>+	}
>>+
>>+	return (str_readiness);
>>+}
>>+
>>+static char *amf_hastate_ntoa (hastate)
>>+{
>>+	static char str_hastate[16];
>>+	if (hastate == SA_AMF_ACTIVE) {
>>+		sprintf (str_hastate, "SA_AMF_ACTIVE(%d)", hastate);
>>+	} else if (hastate == SA_AMF_STANDBY) {
>>+		sprintf (str_hastate, "SA_AMF_STANDBY(%d)", hastate);
>>+	} else if (hastate == SA_AMF_QUIESCED) {
>>+		sprintf (str_hastate, "SA_AMF_QUIESCED(%d)", hastate);
>>+	} else {
>>+		sprintf (str_hastate, "Unkown (%d)", hastate);
>>+	}
>>+
>>+	return (str_hastate);
>>+}
>>+
> 
> ^^ same comments apply from above
> 
> 
> 
>>+static void amf_dump_comp (struct saAmfComponent *component ,void *data)
>>+{
>>+	char name[64];
>>+	data = NULL;
>>+
>>+	log_printf (LOG_LEVEL_DEBUG, "----------------\n" );
>>+	log_printf (LOG_LEVEL_DEBUG, "registered            = %d\n" ,component->registered);
>>+	log_printf (LOG_LEVEL_DEBUG, "local                 = %d\n" ,component->local );
>>+	log_printf (LOG_LEVEL_DEBUG, "source_addr           = %s\n" ,inet_ntoa (component->source_addr));
>>+	memset (name, 0 , sizeof(name));
>>+	memcpy (name, component->name.value, component->name.length);
>>+	log_printf (LOG_LEVEL_DEBUG, "name                  = %s\n" ,name );
>>+	log_printf (LOG_LEVEL_DEBUG, "currentReadinessState = %s\n" ,amf_readinessstate_ntoa (component->currentReadinessState));
>>+	log_printf (LOG_LEVEL_DEBUG, "newReadinessState     = %s\n" ,amf_readinessstate_ntoa (component->newReadinessState));
>>+	log_printf (LOG_LEVEL_DEBUG, "currentHAState        = %s\n" ,amf_hastate_ntoa (component->currentHAState));
>>+	log_printf (LOG_LEVEL_DEBUG, "newHAState            = %s\n" ,amf_hastate_ntoa (component->newHAState));
>>+	log_printf (LOG_LEVEL_DEBUG, "enabledUnlockedState  = %s\n" ,amf_enabledunlockedstate_ntoa (component->enabledUnlockedState));
>>+	log_printf (LOG_LEVEL_DEBUG,"disabledUnlockedState = %s\n" ,amf_disabledunlockedstate_ntoa (component->disabledUnlockedState));
>>+	log_printf (LOG_LEVEL_DEBUG,"probableCause         = %d\n" ,component->probableCause );
>>+}
>>+
> 
> 
> ^^^ this debug output is really good work thanks for the effort here.
> 
> 
> 
>>+void amf_dump ( )
>>+{
>>+	pid_t pid;
>>+
>>+	pid = fork ();
> 
> 
> why the fork?  I'd rather not have this unless it is absolutely
> necessary..  It could create some unforseen problems or change behavior
> during debugging.

I will delete fork .

> 
> 
>>+	if (pid == 0) {
>>+		enumerate_components (amf_dump_comp, NULL);
>>+		fflush (stdout);
>>+		exit (0);
>>+	}
>>+
>>+	return;
>>+}
>>--- openais.2004-10-01.21.30/exec/amf.h	2004-10-02 13:30:03.000000000 +0900
>>+++ openais/exec/amf.h	2004-10-03 04:00:13.000000000 +0900
>>@@ -52,6 +52,7 @@
>> 	int trackActive;
>> };
>> 
>>+void amf_dump ();
>> extern struct service_handler amf_service_handler;
>> 
>> #endif /* AMF_H_DEFINED */
>>--- openais.2004-10-01.21.30/exec/main.c	2004-10-02 13:30:03.000000000 +0900
>>+++ openais/exec/main.c	2004-10-05 01:47:36.000000000 +0900
>>@@ -143,6 +143,21 @@
>> 	return (conn_info);
>> }
>> 
>>+#ifdef DEBUG
>>+static void aisexec_sigusr2 ( )
>>+{
>>+	amf_dump ();
>>+
>>+	signal (SIGUSR2 ,aisexec_sigusr2);
>>+	return;
>>+}
>>+
>>+static void aisexec_signal ( )
>>+{
>>+	signal (SIGUSR2 ,aisexec_sigusr2);
>>+	return;
>>+}
>>+#endif
>> 
> 
> 
> this signal is confusing to me..  You call aisexec_signal which sets
> signal to aisexec_sigusr2?  What is the purpose of this code?

I take abount this with you previous email .
I should implement aisexec_debugdump_fn and give me some time.


> the prototype is not right for a signal handler it should be (example
> from main.c) :
> 
> void sigintr_handler (int signum)

You are right.
I'll correct .

> 
> 
> 
>> struct sockaddr_in this_ip;
>> #define LOCALHOST_IP inet_addr("127.0.0.1")
>>@@ -873,6 +888,10 @@
>> 
>> 	aisexec_poll_handle = poll_create ();
>> 
>>+#ifdef DEBUG
>>+	aisexec_signal ( );
>>+#endif
>>+
> 
> just call:
> 	signal (SIGUSR2, sigusr2_handler);

OK .
> 
> 
> 
>> 	/*
>> 	 * if gmi_init doesn't have root priveleges, it cannot
>> 	 * bind to a specific interface.  This only matters if
> 
> 






More information about the Openais mailing list