[Openais] Checkpoint crash in aisexec
Steven Dake
sdake at mvista.com
Tue Feb 15 14:56:03 PST 2005
Muni
The ring id will be delivered as part of the confchg_fn callback from
totemsrp.c. From the checkpoint perspective, this configuration change
will be delivered as a parameter to ckpt.c:ckpt_confchg_fn().
So we will change ckpt_confchg_fn to:
static int ckpt_confchg_fn (
struct memb_ring_id *ring_id,
enum totempg_configuration_type configuration_type,
struct in_addr *member_list, void *member_list_private,
int member_list_entries,
struct in_addr *left_list, void *left_list_private,
int left_list_entries,
struct in_addr *joined_list, void *joined_list_private,
int joined_list_entries)
To do that the following changes have to be made:
1. handlers.h has to be modified to define this new parameter
2. every service must be updated to match the new definition in handlers.h
3. all callbacks that deliver the confchg_fn must be modified to deliver
the ring id in addition. The functions are main.c:confchg_fn,
totempg.c:totempg_confchg_fn, totemsrp.c:totemsrp_confchg_fn and
totemsrp_initialize.
4. struct memb_ring_id must be moved from totemsrp.c to totemsrp.h so
others know how to read its contents.
This is a good first patch to get familiar with the flow of the
configuration change delivery.
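
A rough sketch of how the handlers.h side and the ckpt.c side could fit
together once the ring id is passed through. The layout of struct
memb_ring_id (representative address plus sequence number), the enum
values, the confchg_fn_t typedef and the ckpt_last_ring_id variable are
assumptions for illustration only; check them against the tree before
relying on any of it.

```c
#include <assert.h>
#include <string.h>
#include <netinet/in.h>

/* Assumed layout once the struct moves into totemsrp.h. */
struct memb_ring_id {
	struct in_addr rep;		/* ring representative */
	unsigned long long seq;		/* ring sequence number */
};

/* Assumed enum values, named here only so the sketch compiles. */
enum totempg_configuration_type {
	TOTEMPG_CONFIGURATION_REGULAR,
	TOTEMPG_CONFIGURATION_TRANSITIONAL
};

/* Hypothetical handlers.h typedef carrying the new first parameter. */
typedef int (*confchg_fn_t) (
	struct memb_ring_id *ring_id,
	enum totempg_configuration_type configuration_type,
	struct in_addr *member_list, void *member_list_private,
	int member_list_entries,
	struct in_addr *left_list, void *left_list_private,
	int left_list_entries,
	struct in_addr *joined_list, void *joined_list_private,
	int joined_list_entries);

/* Hypothetical ckpt.c state: ring id of the last delivered configuration. */
static struct memb_ring_id ckpt_last_ring_id;

static int ckpt_confchg_fn (
	struct memb_ring_id *ring_id,
	enum totempg_configuration_type configuration_type,
	struct in_addr *member_list, void *member_list_private,
	int member_list_entries,
	struct in_addr *left_list, void *left_list_private,
	int left_list_entries,
	struct in_addr *joined_list, void *joined_list_private,
	int joined_list_entries)
{
	assert (ring_id != NULL);

	/* Remember the ring id so later synchronization can compare it. */
	memcpy (&ckpt_last_ring_id, ring_id, sizeof (struct memb_ring_id));
	return (0);
}

/* Stand-in for the per-service entry that handlers.h would describe. */
static confchg_fn_t ckpt_confchg_handler = ckpt_confchg_fn;
```

The callers in main.c, totempg.c and totemsrp.c would then pass the ring
id pointer down unchanged as the first argument.
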
Regards
-steve
> For 5.) I see that you store it on disk.
> sprintf (filename, "/tmp/ringid_%s",inet_ntoa (my_id.sin_addr));
>
> Should I access that from ckpt.c via (memb_ring_id_create_or_load) ? I
> see that it is not defined in the header file for totemsrp.h . Is that
> not meant to be accessed ? If not what would you suggest ?
>
> Thanks
>
> Muni
>
> -----Original Message-----
> From: Bajpai, Muni [NGC:B670:EXCH]
> Sent: Tuesday, February 15, 2005 3:54 PM
> To: 'sdake at mvista.com'; Smith, Kristen [NGC:B675:EXCH]
> Cc: markh at osdl.org
> Subject: RE: [Openais] Checkpoint crash in aisexec
>
>
> Hey Steve,
>
> I work with kristen and need some more info on the checkpoint recovery
> ...
>
> 1.) So the logic for accepting a configuration change from a processor
> is :
> if ((incoming_ring_id == last_known_ring_id)
> && (source_processor != delivering_processor)) {
>
> //IGNORE Change.
> }
>
> So as per my understanding:
> 1.) (Ckpt Executive Perspective) If the change is from ME then
> always change
> 2.) if the ring_id's don't match then always change.
>
> Please confirm.
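
The rule as stated can be written as a small predicate. This is only a
sketch: the struct memb_ring_id fields are assumed, and ring_id_equal
and ckpt_sync_should_ignore are hypothetical names, not existing
functions.

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>

/* Assumed layout of the ring identifier. */
struct memb_ring_id {
	struct in_addr rep;
	unsigned long long seq;
};

/* Two ring ids match when both representative and sequence match. */
static int ring_id_equal (
	const struct memb_ring_id *a,
	const struct memb_ring_id *b)
{
	return (a->rep.s_addr == b->rep.s_addr && a->seq == b->seq);
}

/*
 * Returns 1 when a synchronization message should be ignored:
 * the sender was on the same previous ring as we were, and the
 * sender is some processor other than ourselves.
 */
static int ckpt_sync_should_ignore (
	const struct memb_ring_id *incoming_ring_id,
	const struct memb_ring_id *last_known_ring_id,
	struct in_addr source_processor,
	struct in_addr delivering_processor)
{
	assert (incoming_ring_id != NULL);
	assert (last_known_ring_id != NULL);

	return (ring_id_equal (incoming_ring_id, last_known_ring_id) &&
		source_processor.s_addr != delivering_processor.s_addr);
}
```

With this predicate a message from the local processor is never
ignored, and neither is any message whose ring id differs from the last
known one, which matches points 1.) and 2.) above.
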
>
> 2.) We must add support for the new data structure additions in the
> Ckpt Executive Open and Close handlers also.
>
> 3.) The addition as you enumerated to the checkpoint data structure,
> did you have any implementation preferences or did you want us to use
> anything appropriate (cursorily I was thinking of a list of struct
> refs)?
>
> 4.) The last_known_ring_id. What does that mean to a newly added
> processor. Explicitly ( incoming_ring_id == last_known_ring_id ) will
> always fail on a newly commissioned processor. Am I understanding that
> correctly ? Where is the last_known_ring_id stored ?
>
> 5.) Is exec/evt.c the best example for any ideas on implementation ??
>
>
> Thanks
>
> Muni
>
> -----Original Message-----
> From: Steven Dake [mailto:sdake at mvista.com]
> Sent: Tuesday, February 15, 2005 1:51 PM
> To: Smith, Kristen [NGC:B675:EXCH]
> Cc: markh at osdl.org; openais at lists.osdl.org; Bajpai, Muni
> [NGC:B670:EXCH]
> Subject: RE: [Openais] Checkpoint crash in aisexec
>
>
> On Tue, 2005-02-15 at 09:47, Kristen Smith wrote:
> > Steve,
> >
> > Thanks for the response - I hear ya loud and clear - not good without
> > recovery. So, is there something that we could do to help you with
> > this recovery coding? If you had some type of design thoughts on how
> > you wanted checkpoint recovery to occur, maybe that is something we
> > could help out with. Just throwing this out there to see what you
> > think.
> >
>
> Kristen
> You have done a lot to help us so far but more help is always
> appreciated :)
>
> If someone from your org wanted to get started writing code for
> checkpoint recovery that would be great! I spent some time in the
> drive to work this morning thinking about how checkpoint recovery
> should work:
>
> There are 3 main steps that should be done in order:
> 1. synchronize checkpoint reference counts (so retention timers work
> properly)
> 2. synchronize checkpoint metadata contents (sizes, sections, etc)
> 3. synchronize checkpoint section data contents
>
> The place to get started is on the reference count synchronization.
>
> The checkpoint must contain a list of active user's processor ids
> along with their reference count. So if processor A has checkpoint 1
> open twice, and processor B has checkpoint 1 open three times, and
> processor C has checkpoint 1 open four times each processor would
> maintain a list for the checkpoint (in the checkpoint data structure):
>
> p_A:r_2
> p_B:r_3
> p_C:r_4
>
> Then on a configuration change, the leaving processors would close
> their reference counts. So in this example, if p_B leaves then the
> processor ref count looks like:
>
> p_A:r_2
> p_C:r_4
>
> During this configuration change, a processor joins p_D. It has
> checkpoint 1 open 1 time. p_D gets a configuration change {add p_A,
> p_C} and then sends a synchronization message with its previous ring
> identifier and current list of checkpoint reference counts (after the
> above leave in the configuration change was processed). The
> representative of {p_A, p_C} also sends a synchronization message with
> the previous ring identifier and a current list of checkpoint
> reference counts. If the previous ring identifiers match and the
> sending processor is not the delivering processor then p_C should
> ignore p_A's message (ie: p_C receives p_A message, but it already
> knows about p_A's references).
>
> This requires us to add the ring identifier to the configuration
> change.
>
> So now each previous configuration is aware of the new configuration.
> The reference counts look like:
> p_A:r_2
> p_C:r_4
> p_D:r_1
>
> The above maintenance of the reference counts, or open checkpoints,
> must maintain a per-checkpoint variable which is the "reference count
> for this checkpoint". In the last case, that reference count would be
> 7.
>
> Each time a processor leaves, its reference counts are subtracted from
> this "global ref count". Each time a processor is added, its
> reference counts are added. This reference count is then what is used
> for retention duration.
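
To make the bookkeeping concrete, here is a minimal sketch of a
per-checkpoint reference count table with the global count kept in
step. All names (ckpt_refcnt_table, ckpt_refcnt_join, ckpt_refcnt_leave)
and the fixed table size are hypothetical; a real version would more
likely hang a list off the checkpoint structure.

```c
#include <assert.h>
#include <string.h>
#include <netinet/in.h>

#define CKPT_MAX_PROCESSORS 16	/* hypothetical fixed limit */

/* One entry per processor that has the checkpoint open. */
struct ckpt_refcnt {
	struct in_addr processor;
	unsigned int count;
};

struct ckpt_refcnt_table {
	struct ckpt_refcnt entries[CKPT_MAX_PROCESSORS];
	int entry_count;
	unsigned int global_count;	/* sum of all per-processor counts */
};

static void ckpt_refcnt_init (struct ckpt_refcnt_table *table)
{
	assert (table != NULL);
	memset (table, 0, sizeof (struct ckpt_refcnt_table));
}

/* Add a processor, or replace its count; returns -1 if the table is full. */
static int ckpt_refcnt_join (
	struct ckpt_refcnt_table *table,
	struct in_addr processor,
	unsigned int count)
{
	int i;

	assert (table != NULL);
	for (i = 0; i < table->entry_count; i++) {
		if (table->entries[i].processor.s_addr == processor.s_addr) {
			table->global_count -= table->entries[i].count;
			table->entries[i].count = count;
			table->global_count += count;
			return (0);
		}
	}
	if (table->entry_count == CKPT_MAX_PROCESSORS) {
		return (-1);
	}
	table->entries[table->entry_count].processor = processor;
	table->entries[table->entry_count].count = count;
	table->entry_count++;
	table->global_count += count;
	return (0);
}

/* Drop a leaving processor and subtract its references from the total. */
static void ckpt_refcnt_leave (
	struct ckpt_refcnt_table *table,
	struct in_addr processor)
{
	int i;

	assert (table != NULL);
	for (i = 0; i < table->entry_count; i++) {
		if (table->entries[i].processor.s_addr == processor.s_addr) {
			table->global_count -= table->entries[i].count;
			/* Fill the hole with the last entry; order is not kept. */
			table->entries[i] = table->entries[table->entry_count - 1];
			table->entry_count--;
			return;
		}
	}
}
```

Replaying the example above (p_A, p_B and p_C join with 2, 3 and 4
references, p_B leaves, p_D joins with 1) leaves the global count at 7.
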
>
> Any thoughts on the above approach welcome.
>
> Thanks!
> -steve
>
> > Thanks,
> > Kristen
> >
> > -----Original Message-----
> > From: Steven Dake [mailto:sdake at mvista.com]
> > Sent: Monday, February 14, 2005 2:17 PM
> > To: Smith, Kristen [NGC:B675:EXCH]; markh at osdl.org;
> > openais at lists.osdl.org
> > Cc: Bajpai, Muni [NGC:B670:EXCH]
> > Subject: RE: [Openais] Checkpoint crash in aisexec
> >
> >
> > On Sat, 2005-02-12 at 08:08, Kristen Smith wrote:
> > > Steve,
> > >
> > > Thanks for the response.
> > >
> > > For recovery - what are the ramifications if we don't have recovery
> > > working 100%? What I see now is that when a node leaves the cluster
> > > and then rejoins, it receives evt messages, but it can take anywhere
> > > from 15 seconds to minutes for evt messages sent from that node to
> > > reach the other applications. I handle this with some
> >
> > Mark have you seen this issue?
> >
> > > message retries which is ok in this startup case. However, are we in
> > > jeopardy in other cases that I am not considering? When running
> > > traffic the past few days and seeing periodic reconfigs, I don't seem
> > > to be losing messages when that occurs - I only see the lost messages
> > > when I actually kill a node and start it back up to rejoin the
> > > cluster.
> > >
> >
> > What we have today is totally unacceptable because at least for
> > checkpointing, there is no recovery. And Mark is waiting on my base
> > code for event recovery.
> >
> > Definition of 100% working means if there is a failure during
> > recovery, we are guaranteed a consistent state. I think evt is pretty
> > close to this goal, although the checkpoint replication after merge
> > has not been developed yet. I can think of a lot of easy ways to do
> > this, but handling a failure during the recovery phase makes it more
> > difficult.
> >
> > Definition of almost 100% is that recovery works properly if there are
> > no faults during recovery (ie: the merge process), but if there is a
> > fault during recovery (ie: reconfig) something could go awry.
> >
> > We want consistently replicated data (the 100% case). 100% is
> > probably past your development window; the other case is within reach.
> >
> > Regards
> > -steve
> >
> > > Thanks
> > > Kristen
> > >
> > > -----Original Message-----
> > > From: Steven Dake [mailto:sdake at mvista.com]
> > > Sent: Friday, February 11, 2005 5:30 PM
> > > To: Smith, Kristen [NGC:B675:EXCH]
> > > Subject: RE: [Openais] Checkpoint crash in aisexec
> > >
> > >
> > > Ok well I doubt with 200 byte checkpoints there is a buffer overflow.
> > > :)
> > >
> > > Recovery will come after 188 is wrapped up. I think your two weeks
> > > window looks good for alpha-level recovery (ie: works most of the
> > > time). High quality production recovery will not hit your window for
> > > development (ie: works 100% of the time no matter what happens).
> > >
> > > Thanks
> > > -steve
> > >
> > > On Fri, 2005-02-11 at 15:56, Kristen Smith wrote:
> > > > Steve,
> > > >
> > > > The size of the checkpoints are ~200 bytes.
> > > >
> > > > I agree, valgrind is an excellent tool. We will run it through and
> > > > see if that shows anything.
> > > >
> > > > I have tried this scenario maybe 30 times today (for various other
> > > > testing) and it happened maybe 10 times. For a while I could
> > > > reproduce with a given test about 5 times and then it hasn't happened
> > > > again.
> > > >
> > > > Sounds like defect-188 fixing is going well. May I ask how the
> > > > recovery work is going as well? (Don't mean to be pushy on that
> > > > front - we have 2 more weeks of coding for our application left and I
> > > > am really hoping that we are able to put the new recovery code in
> > > > during that time).
> > > >
> > > > Thanks a bunch,
> > > > Kristen
> > > >
> > > > -----Original Message-----
> > > > From: Steven Dake [mailto:sdake at mvista.com]
> > > > Sent: Friday, February 11, 2005 4:37 PM
> > > > To: Smith, Kristen [NGC:B675:EXCH]
> > > > Subject: Re: [Openais] Checkpoint crash in aisexec
> > > >
> > > >
> > > > how large are the read or write requests?
> > > > just a thought there could be some buffer overrun with larger
> > > > requests.
> > > >
> > > > On Fri, 2005-02-11 at 14:55, Kristen Smith wrote:
> > > > > Steve,
> > > > >
> > > > > We are periodically seeing aisexec crash with the following trace:
> > > > >
> > > > > (gdb) bt
> > > > > #0 message_handler_req_lib_ckpt_checkpointclose
> > > > > (conn_info=0x0, message=0xb73fc008) at ckpt.c:1552
> > > > > #1 0x080494c2 in poll_handler_libais_deliver (handle=0, fd=3,
> > > > > revent=134633824, data=0x89c2ad8,
> > > > > prio=0x89b2784) at main.c:578
> > > > > #2 0x08056e62 in poll_run (handle=0) at aispoll.c:386
> > > > > #3 0x080499ac in main (argc=1, argv=0xbfffcb64) at main.c:1003
> > > > >
> > > > > We have looked through the code but can't seem to figure out how
> > > > > conn_info is getting set to 0. Do you have any idea under what
> > > > > circumstances conn_info could be null when this function is called?
> > > > >
> > > > > This is happening when we have multiple nodes up and we kill one of
> > > > > the active nodes. The standby node (which was reading checkpoints)
> > > > > must now become a writer, so it closes the checkpoint and this
> > > > > happens. Unfortunately, I can't reproduce this consistently - I
> > > > > finally got a core dump today. I don't recall ever seeing this with
> > > > > the old code.
> > > > >
> > > > > Thanks,
> > > > > Kristen
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> ______________________________________________________________________
> > > > > _______________________________________________
> > > > > Openais mailing list
> > > > > Openais at lists.osdl.org
> > > > http://lists.osdl.org/mailman/listinfo/openais
> > > >
> > > >
> > >
> > >
> >
> >
>
>