[Openais] Re: A bug for AMF
Miyotaka Sakai
sakai.miyotaka at nttcom.co.jp
Mon Oct 4 17:42:10 PDT 2004
Steve ,
response inline.
Steven Dake wrote:
> Miyotaka-san
>
> good work I think we are alot closer... There are a few things to be
> resolved..
>
> comments in patch
>
> Thanks
>
> On Mon, 2004-10-04 at 10:21, Miyotaka Sakai wrote:
>
>>Steve,
>>
>>I made a new patch for AMF .
>>I would like you to review this patch .
>>As you said ,I'll do seperate commit bug and debug mechanism.
>>
>>And could you give me some advice abount the last my email.
>>
>>thanks
>>- Miyotaka Sakai.
>>
>>
>>Miyotaka Sakai wrote:
>>
>>>Steve,
>>>
>>>my responses inline .
>>>
>>>
>>>Steven Dake wrote:
>>>
>>>
>>>>Miyotaka-san
>>>>comments inline
>>>>
>>>>On Sat, 2004-10-02 at 13:27, Miyotaka Sakai wrote:
>>>>
>>>>
>>>>>steve,
>>>>>
>>>>>I've found a bug for AMF and made a patch .
>>>>>I would like you to review this patch .
>>>>>
>>>>>Several gmi_mcast messages couldn't be delivered .
>>>>>I changed gmi_mcast's message priorty from GMI_PRIO_HIGH to
>>>>>GMI_PRIO_RECOVERY ,then messages could be deliverd .
>>>>>
>>>>>If you have another idea, Please tel me.
>>>>>
>>>>>This patch add several mechanisms to ease to debug .
>>>>>( a mechanism is watching AmfComponent contents )
>>>>>
>>>>
>>>>The changing of priorities needs a different solution... I think I
>>>>messed this up for you when I added the recovery plug. Could you fix it
>>>>up?
>>>
>>>I am sure. I will.
>>>
>>>
>>>>The way it should work is this:
>>>>messages sent during recovery (within the amf_confchg_fn function and
>>>>all the other messages related to them) should be sent at recovery
>>>>priority. messages sent during normal operation should retain their
>>>>current priorities. Since the state machine must now (with the recovery
>>>>plug) send messages at two different priorities depending on what state
>>>>the system is in, we need to keep track of the priorities. It is
>>>>possible for the state machine to know the priority at which it should
>>>>send messages by placing the priority in the first message that starts
>>>>the state transitions. Then every message that triggers a new message
>>>>to be sent should encode the priority of the message to the next
>>>>state. If you need more details please ask.
>>>
>>>OK.
>>>I know what you mean .
>>>For exsample ,haStateSetCluster relating amf_confchg_fn uses
>>>GMI_PRIO_RECOVERY ,but others use GMI_PRIO_HIGH.
>>>It is possible ,and I 'll fix it .
>>>
>>>
>>>>It is possible that the output queues used in gmi_mcast are full. In
>>>>this case, it is necessary to use gmi_token_callback_create. Mark
>>>>Haverkamp has implemented this in the event service if you need an
>>>>example of using it. This callback will be called the next time it may
>>>>be possible to send messages. The callback should then resend the state
>>>>machine message that failed to send on an earlier try.
>>>
>>>
>>>I refered to Mark's implementation and ,I understaned this
>>>implementation. Please take some time to implement this .
>>>
>>>
>>>>Why the signal stuff in main.c?
>>>
>>>
>>>I want to konw AmfComponents cotents at the time when I want to know .
>>>( I don't know when I want to know AmfComponents contens.
>>> It is different in each debug situation. )
>>>In that case ,Most easy inmplementation is signal .
>>>If you have ,Could you tell me another idea ?
>>>
>>>
>>>>The added debug output looks fine. Please try to use log_printf instead
>>>>of printf though. Could you check in just the debug output as a
>>>>seperate commit after addressing the comment above?
>>>
>>>
>>>I'll change printf into log_print.
>>>
>>>Thanks
>>>- Miyotaka Sakai.
>>>
>>>
>>>>Thanks
>>>>-steve
>>>>
>>>>
>>>>
>>>>>If you accept this patch ,I'll check-in .
>>>>>Thanks !
>>>>>- Miyotaka Sakai.
>>>
>>______________________________________________________________________
>>--- openais.2004-10-01.21.30/exec/amf.c 2004-10-02 13:30:03.000000000 +0900
>>+++ openais/exec/amf.c 2004-10-05 01:56:48.000000000 +0900
>>@@ -478,8 +478,9 @@
>> return (0);
>> }
>>
>>-static void component_unregister (
>>- struct saAmfComponent *component)
>>+static void component_unregisterdtl (
>>+ struct saAmfComponent *component,
>>+
>
>
> why unregisterdtl? I dont understand what "dtl" means...
I'm sorry .
dtl means "detail parameter set".
I'll change this into component_unregisterpriority.
It means "unregister with priority".
> The patch is close but is missing something. Priority is encoded into
> the state machine transitions as well. The reason is because the call
> dsmDisabledUnlockedFailed() must call with the correct priority. This
> requires encoding the priority all along the call chain in component
> register and unregister error report, etc.
>
> Doing this has the side effect of requiring only one haStateSetCluster
> and readinessStateSetCluster call instead of *tdl.
>
> more comments below
>
>
>> int priority)
>> {
>> struct req_exec_amf_componentunregister req_exec_amf_componentunregister;
>> struct iovec iovecs[2];
>>@@ -509,12 +510,19 @@
>> iovecs[0].iov_base = (char *)&req_exec_amf_componentunregister;
>> iovecs[0].iov_len = sizeof (req_exec_amf_componentunregister);
>>
>>- assert (gmi_mcast (&aisexec_groupname, iovecs, 1, GMI_PRIO_MED) == 0);
>>+ assert (gmi_mcast (&aisexec_groupname, iovecs, 1, priority) == 0);
>> }
>>
>>-static void component_register (
>>+static void component_unregister (
>> struct saAmfComponent *component)
>> {
>>+ component_unregisterdtl (component,GMI_PRIO_MED);
>>+}
>>+
>>+static void component_registerdtl (
>>+ struct saAmfComponent *component,
>>+ int priority)
>>+{
>> struct req_exec_amf_componentregister req_exec_amf_componentregister;
>> struct iovec iovecs[2];
>>
>>@@ -526,7 +534,6 @@
>> }
>> log_printf (LOG_LEVEL_DEBUG, "component_register: registering component %s\n",
>> getSaNameT (&component->name));
>>- component->probableCause = SA_AMF_NOT_RESPONDING;
>
>
> ^^ this change is very suspect.. I spent alot of time getting this to
> work right when the application doesn't respond to a healthcheck (try
> ctrl-z on the active or standby application to simulate this
> condition). I guess the reason you had to remove this was because the
> if conditional in dsmDisabledUnlockedFailed (described above) failed to
> transition correctly if it was set.
Please take some time to consider it .
After that I will change this code.
But this program doesn't works with SA_AMF_NOT_RESPONDING.
This time ,please accept this way .
>> req_exec_amf_componentregister.header.size = sizeof (struct req_exec_amf_componentregister);
>> req_exec_amf_componentregister.header.id = MESSAGE_REQ_EXEC_AMF_COMPONENTREGISTER;
>>@@ -543,7 +550,7 @@
>> iovecs[0].iov_base = (char *)&req_exec_amf_componentregister;
>> iovecs[0].iov_len = sizeof (req_exec_amf_componentregister);
>>
>>- assert (gmi_mcast (&aisexec_groupname, iovecs, 1, GMI_PRIO_RECOVERY) == 0);
>>+ assert (gmi_mcast (&aisexec_groupname, iovecs, 1, priority) == 0);
>> }
>>
>> /***
>>@@ -747,9 +754,10 @@
>> }
>> #endif
>>
>>-void haStateSetCluster (
>>+static void haStateSetClusterdtl (
>> struct saAmfComponent *component,
>>- SaAmfHAStateT haState)
>>+ SaAmfHAStateT haState,
>>+ int priority)
>> {
>>
>> struct req_exec_amf_hastateset req_exec_amf_hastateset;
>>@@ -766,7 +774,14 @@
>> iovecs[0].iov_base = (char *)&req_exec_amf_hastateset;
>> iovecs[0].iov_len = sizeof (req_exec_amf_hastateset);
>>
>>- assert (gmi_mcast (&aisexec_groupname, iovecs, 1, GMI_PRIO_HIGH) == 0);
>>+ assert (gmi_mcast (&aisexec_groupname, iovecs, 1, priority) == 0);
>>+}
>>+
>>+static void haStateSetCluster (
>>+ struct saAmfComponent *component,
>>+ SaAmfHAStateT haState)
>>+{
>>+ haStateSetClusterdtl (component, haState, GMI_PRIO_HIGH);
>> }
>>
>> void readinessStateSetApi (struct saAmfComponent *component,
>>@@ -837,9 +852,10 @@
>> }
>> #endif
>>
>>-void readinessStateSetCluster (
>>+static void readinessStateSetClusterdtl(
>> struct saAmfComponent *component,
>>- SaAmfReadinessStateT readinessState)
>>+ SaAmfReadinessStateT readinessState,
>>+ int priority)
>> {
>>
>> struct req_exec_amf_readinessstateset req_exec_amf_readinessstateset;
>>@@ -857,7 +873,14 @@
>> iovecs[0].iov_base = (char *)&req_exec_amf_readinessstateset;
>> iovecs[0].iov_len = sizeof (req_exec_amf_readinessstateset);
>>
>>- assert (gmi_mcast (&aisexec_groupname, iovecs, 1, GMI_PRIO_HIGH) == 0);
>>+ assert (gmi_mcast (&aisexec_groupname, iovecs, 1, priority) == 0);
>>+}
>>+
>>+static void readinessStateSetCluster(
>>+ struct saAmfComponent *component,
>>+ SaAmfReadinessStateT readinessState)
>>+{
>>+ readinessStateSetClusterdtl (component, readinessState, GMI_PRIO_HIGH);
>> }
>>
>> #ifdef CMOPILE_OUT
>>@@ -1266,7 +1289,7 @@
>> /* noop - operational state */
>> break;
>> case AMF_ENABLED_UNLOCKED_STANDBY_REQUESTED:
>>- dsmEnabledUnlockedActiveRequested (component);
>>+ dsmEnabledUnlockedStandbyRequested (component);
>> break;
>> case AMF_ENABLED_UNLOCKED_STANDBY_COMPLETED:
>> /* noop - operational state */
>>@@ -1699,9 +1722,9 @@
>> if (component->source_addr.s_addr != source_addr->s_addr) {
>> return;
>> }
>>-
>>+
>> component->local = 1;
>>- component_unregister (component);
>>+ component_unregisterdtl (component, GMI_PRIO_RECOVERY);
>>
>> return;
>> }
>>@@ -1713,7 +1736,7 @@
>> return;
>> }
>>
>>- component_register (component);
>>+ component_registerdtl (component, GMI_PRIO_RECOVERY);
>> return;
>> }
>>
>>@@ -1724,8 +1747,8 @@
>> }
>>
>> /* dsm change must be needed */
>>- readinessStateSetCluster (component, component->currentReadinessState);
>>- haStateSetCluster (component, component->currentHAState);
>>+ readinessStateSetClusterdtl (component, component->currentReadinessState, GMI_PRIO_RECOVERY);
>>+ haStateSetClusterdtl (component, component->currentHAState ,GMI_PRIO_RECOVERY);
>>
>> return;
>> }
>>@@ -2324,7 +2347,6 @@
>> return (0);
>> }
>>
>>-
>> static int message_handler_req_amf_errorreport (struct conn_info *conn_info, void *message)
>> {
>> struct req_lib_amf_errorreport *req_lib_amf_errorreport = (struct req_lib_amf_errorreport *)message;
>>@@ -2474,3 +2496,115 @@
>> return (0);
>> }
>>
>
>
> Could you make these switches instead of if's? Or they could be a table
> such as
I see.
>
> /*
> * Make sure to sync to state machine variables (!)
> */
> static char **disabled_unlocked_state_text = {
> "AMF_DISABLED_UNLOCKED_REGISTEREDORERRORCANCEL(%d)",
> "AMF_DISABLED_UNLOCKED_FAILED(%d)",
>
> ...
> };
>
> static char *amf_disabledunlockedstate_ntoa (state) {
> static char str_disabledstate[256];
>
> sprintf (str_disabledstate, disabled_unlocked_state_text[state],
> state);
> return (str_disabledstate);
> }
>
>
>>+static char *amf_disabledunlockedstate_ntoa (state)
>>+{
>>+ static char str_disabledstate[16];
>
>
> ^^^ this variable is not big enough to store the data
I'm sorry.
I'll collect appropriate size.
>>+ if (state == AMF_DISABLED_UNLOCKED_REGISTEREDORERRORCANCEL) {
>>+ sprintf (str_disabledstate, "AMF_DISABLED_UNLOCKED_REGISTEREDORERRORCANCEL(%d)", state);
>>+ } else if (state == AMF_DISABLED_UNLOCKED_FAILED) {
>>+ sprintf (str_disabledstate, "AMF_DISABLED_UNLOCKED_FAILED(%d)", state);
>>+ } else if (state == AMF_DISABLED_UNLOCKED_QUIESCED_REQUESTED) {
>>+ sprintf (str_disabledstate, "AMF_DISABLED_UNLOCKED_QUIESCED_REQUESTED(%d)", state);
>>+ } else if (state == AMF_DISABLED_UNLOCKED_QUIESCED_COMPLETED) {
>>+ sprintf (str_disabledstate, "AMF_DISABLED_UNLOCKED_QUIESCED_COMPLETED(%d)", state);
>>+ } else if (state == AMF_DISABLED_UNLOCKED_OUT_OF_SERVICE_REQUESTED) {
>>+ sprintf (str_disabledstate, "AMF_DISABLED_UNLOCKED_OUT_OF_SERVICE_REQUESTED(%d)", state);
>>+ } else if (state == AMF_DISABLED_UNLOCKED_OUT_OF_SERVICE_COMPLETED) {
>>+ sprintf (str_disabledstate, "AMF_DISABLED_UNLOCKED_OUT_OF_SERVICE_COMPLETED(%d)", state);
>>+ } else {
>>+ sprintf (str_disabledstate, "Unkown(%d)", state);
>>+ }
>>+
>>+ return (str_disabledstate);
>>+}
>>+
>>+static char *amf_enabledunlockedstate_ntoa (state)
>>+{
>>+ static char str_enabledunlockedstate[16];
>>+ if (state == AMF_ENABLED_UNLOCKED_INITIAL) {
>>+ sprintf (str_enabledunlockedstate, "AMF_ENABLED_UNLOCKED_INITIAL(%d)", state);
>>+ } else if (state == AMF_ENABLED_UNLOCKED_IN_SERVICE_REQUESTED) {
>>+ sprintf (str_enabledunlockedstate, "AMF_ENABLED_UNLOCKED_IN_SERVICE_REQUESTED(%d)", state);
>>+ } else if (state == AMF_ENABLED_UNLOCKED_IN_SERVICE_COMPLETED) {
>>+ sprintf (str_enabledunlockedstate, "AMF_ENABLED_UNLOCKED_IN_SERVICE_COMPLETED(%d)", state);
>>+ } else if (state == AMF_ENABLED_UNLOCKED_ACTIVE_REQUESTED) {
>>+ sprintf (str_enabledunlockedstate, "AMF_ENABLED_UNLOCKED_ACTIVE_REQUESTED(%d)", state);
>>+ } else if (state == AMF_ENABLED_UNLOCKED_ACTIVE_COMPLETED) {
>>+ sprintf (str_enabledunlockedstate, "AMF_ENABLED_UNLOCKED_ACTIVE_COMPLETED(%d)", state);
>>+ } else if (state == AMF_ENABLED_UNLOCKED_STANDBY_REQUESTED) {
>>+ sprintf (str_enabledunlockedstate, "AMF_ENABLED_UNLOCKED_STANDBY_REQUESTED(%d)", state);
>>+ } else if (state == AMF_ENABLED_UNLOCKED_STANDBY_COMPLETED) {
>>+ sprintf (str_enabledunlockedstate, "AMF_ENABLED_UNLOCKED_STANDBY_COMPLETED(%d)", state);
>>+ } else {
>>+ sprintf (str_enabledunlockedstate, "Unkown(%d)", state);
>>+ }
>>+
>>+ return (str_enabledunlockedstate);
>>+}
>>+
>>+static char *amf_readinessstate_ntoa (readinessstate)
>>+{
>>+ static char str_readiness[16];
>>+ if (readinessstate == SA_AMF_OUT_OF_SERVICE) {
>>+ sprintf (str_readiness, "SA_AMF_OUT_OF_SERVICE(%d)", readinessstate);
>>+ } else if (readinessstate == SA_AMF_IN_SERVICE) {
>>+ sprintf (str_readiness, "SA_AMF_IN_SERVICE(%d)", readinessstate);
>>+ } else if (readinessstate == SA_AMF_STOPPING) {
>>+ sprintf (str_readiness, "SA_AMF_STOPPING(%d)", readinessstate);
>>+ } else {
>>+ sprintf (str_readiness, "Unkown (%d)", readinessstate);
>>+ }
>>+
>>+ return (str_readiness);
>>+}
>>+
>>+static char *amf_hastate_ntoa (hastate)
>>+{
>>+ static char str_hastate[16];
>>+ if (hastate == SA_AMF_ACTIVE) {
>>+ sprintf (str_hastate, "SA_AMF_ACTIVE(%d)", hastate);
>>+ } else if (hastate == SA_AMF_STANDBY) {
>>+ sprintf (str_hastate, "SA_AMF_STANDBY(%d)", hastate);
>>+ } else if (hastate == SA_AMF_QUIESCED) {
>>+ sprintf (str_hastate, "SA_AMF_QUIESCED(%d)", hastate);
>>+ } else {
>>+ sprintf (str_hastate, "Unkown (%d)", hastate);
>>+ }
>>+
>>+ return (str_hastate);
>>+}
>>+
>
> ^^ same comments apply from above
>
>
>
>>+static void amf_dump_comp (struct saAmfComponent *component ,void *data)
>>+{
>>+ char name[64];
>>+ data = NULL;
>>+
>>+ log_printf (LOG_LEVEL_DEBUG, "----------------\n" );
>>+ log_printf (LOG_LEVEL_DEBUG, "registered = %d\n" ,component->registered);
>>+ log_printf (LOG_LEVEL_DEBUG, "local = %d\n" ,component->local );
>>+ log_printf (LOG_LEVEL_DEBUG, "source_addr = %s\n" ,inet_ntoa (component->source_addr));
>>+ memset (name, 0 , sizeof(name));
>>+ memcpy (name, component->name.value, component->name.length);
>>+ log_printf (LOG_LEVEL_DEBUG, "name = %s\n" ,name );
>>+ log_printf (LOG_LEVEL_DEBUG, "currentReadinessState = %s\n" ,amf_readinessstate_ntoa (component->currentReadinessState));
>>+ log_printf (LOG_LEVEL_DEBUG, "newReadinessState = %s\n" ,amf_readinessstate_ntoa (component->newReadinessState));
>>+ log_printf (LOG_LEVEL_DEBUG, "currentHAState = %s\n" ,amf_hastate_ntoa (component->currentHAState));
>>+ log_printf (LOG_LEVEL_DEBUG, "newHAState = %s\n" ,amf_hastate_ntoa (component->newHAState));
>>+ log_printf (LOG_LEVEL_DEBUG, "enabledUnlockedState = %s\n" ,amf_enabledunlockedstate_ntoa (component->enabledUnlockedState));
>>+ log_printf (LOG_LEVEL_DEBUG,"disabledUnlockedState = %s\n" ,amf_disabledunlockedstate_ntoa (component->disabledUnlockedState));
>>+ log_printf (LOG_LEVEL_DEBUG,"probableCause = %d\n" ,component->probableCause );
>>+}
>>+
>
>
> ^^^ this debug output is really good work thanks for the effort here.
>
>
>
>>+void amf_dump ( )
>>+{
>>+ pid_t pid;
>>+
>>+ pid = fork ();
>
>
> why the fork? I'd rather not have this unless it is absolutely
> necessary.. It could create some unforseen problems or change behavior
> during debugging.
I will delete fork .
>
>
>>+ if (pid == 0) {
>>+ enumerate_components (amf_dump_comp, NULL);
>>+ fflush (stdout);
>>+ exit (0);
>>+ }
>>+
>>+ return;
>>+}
>>--- openais.2004-10-01.21.30/exec/amf.h 2004-10-02 13:30:03.000000000 +0900
>>+++ openais/exec/amf.h 2004-10-03 04:00:13.000000000 +0900
>>@@ -52,6 +52,7 @@
>> int trackActive;
>> };
>>
>>+void amf_dump ();
>> extern struct service_handler amf_service_handler;
>>
>> #endif /* AMF_H_DEFINED */
>>--- openais.2004-10-01.21.30/exec/main.c 2004-10-02 13:30:03.000000000 +0900
>>+++ openais/exec/main.c 2004-10-05 01:47:36.000000000 +0900
>>@@ -143,6 +143,21 @@
>> return (conn_info);
>> }
>>
>>+#ifdef DEBUG
>>+static void aisexec_sigusr2 ( )
>>+{
>>+ amf_dump ();
>>+
>>+ signal (SIGUSR2 ,aisexec_sigusr2);
>>+ return;
>>+}
>>+
>>+static void aisexec_signal ( )
>>+{
>>+ signal (SIGUSR2 ,aisexec_sigusr2);
>>+ return;
>>+}
>>+#endif
>>
>
>
> this signal is confusing to me.. You call aisexec_signal which sets
> signal to aisexec_sigusr2? What is the purpose of this code?
I take abount this with you previous email .
I should implement aisexec_debugdump_fn and give me some time.
> the prototype is not right for a signal handler it should be (example
> from main.c) :
>
> void sigintr_handler (int signum)
You are right.
I'll correct .
>
>
>
>> struct sockaddr_in this_ip;
>> #define LOCALHOST_IP inet_addr("127.0.0.1")
>>@@ -873,6 +888,10 @@
>>
>> aisexec_poll_handle = poll_create ();
>>
>>+#ifdef DEBUG
>>+ aisexec_signal ( );
>>+#endif
>>+
>
> just call:
> signal (SIGUSR2, sigusr2_handler);
OK .
>
>
>
>> /*
>> * if gmi_init doesn't have root priveleges, it cannot
>> * bind to a specific interface. This only matters if
>
>
More information about the Openais
mailing list