[Openais] CKPT: bug, global_ckpt_id not synced

Steven Dake sdake at redhat.com
Thu Sep 7 08:41:40 PDT 2006


Hans,

I am also working on this particular problem, although "max" is not
sufficient since it is possible for checkpoint ids to wrap.  Ideally the
checkpoint ids would wrap with a proper action.

I don't know if it is realistic for a checkpoint id to wrap, but I'd
like it to work if this were to happen after 1-2 years of runtime in
heavy checkpoint create/unlink environments.
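
One possible way to make the comparison survive a wrap is serial-number
style arithmetic on the 32-bit id, sketched below. This is only an
illustration under stated assumptions, not the fix that went into
openais: ckpt_id_t, ckpt_id_newer() and ckpt_id_sync() are hypothetical
names, and the sketch assumes that live ids never span more than half of
the 32-bit range at any one time.

```c
#include <stdint.h>

/* Hypothetical stand-in for the mar_uint32_t type used in ckpt.c. */
typedef uint32_t ckpt_id_t;

/*
 * Nonzero if id a is "newer" than id b, allowing for wrap-around.
 * a counts as newer when the forward distance from b to a, taken
 * modulo 2^32, lies in the range 1 .. 2^31 - 1.
 */
static int ckpt_id_newer(ckpt_id_t a, ckpt_id_t b)
{
	ckpt_id_t dist = (ckpt_id_t)(a - b);

	return dist != 0 && dist < UINT32_C(0x80000000);
}

/*
 * Return the next id to hand out locally after seeing the id "seen"
 * on another node. local_next is the local counter (the next id that
 * would be assigned); the result is always past "seen", even if the
 * counter has wrapped.
 */
static ckpt_id_t ckpt_id_sync(ckpt_id_t local_next, ckpt_id_t seen)
{
	if (seen == local_next || ckpt_id_newer(seen, local_next)) {
		return (ckpt_id_t)(seen + 1);
	}
	return local_next;
}
```

With this comparison, an id of 2 seen while the local counter sits at
0xFFFFFFF0 is treated as newer, so the counter moves to 3 instead of
staying behind. It does not by itself cover the case where an id is
still in use by a long-lived checkpoint after a full wrap.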

Regards
-steve

On Thu, 2006-09-07 at 09:01 +0200, Hans Feldt wrote:
> Steven/Muni, could you please comment on this issue?
> 
> I believe it could be the root of much evil. I would like to get this 
> committed asap since it is a stopping issue for our testing.
> 
> Regards,
> Hans
> 
> Hans Feldt wrote:
> > Test case:
> > - start first node
> > - create (with data) checkpoint 1 on first node
> > - create (with data) checkpoint 2 on first node
> > - start 2nd node
> > - create (with data) checkpoint 3 on 2nd node
> > - read checkpoint 3 on first node (fails without patch)
> > 
> > There seem to be more errors related to the ckpt_id, which was
> > introduced in r1139. Stay tuned or help us out.
> > 
> > Regards,
> > Hans
> > 
> > 
> > ------------------------------------------------------------------------
> > 
> > Index: ckpt.c
> > ===================================================================
> > --- ckpt.c	(revision 1238)
> > +++ ckpt.c	(working copy)
> > @@ -345,6 +345,7 @@
> >  
> >  DECLARE_LIST_INIT(checkpoint_recovery_list_head);
> >  
> > +/* cluster wide synchronized checkpoint ID */
> >  static mar_uint32_t global_ckpt_id = 0;
> >  
> >  struct checkpoint_cleanup {
> > @@ -2105,6 +2106,11 @@
> >  		log_printf (LOG_LEVEL_DEBUG, "recovery CHECKPOINT reopened is %p\n", checkpoint);
> >  	}
> >  
> > +	/* synchronize global_ckpt_id to max(ckpt_id,global_ckpt_id)+1 */
> > +	if (ckpt_id > global_ckpt_id) {
> > +		global_ckpt_id = ckpt_id + 1;
> > +	}
> > +
> >  	/*CHECK to see if there are any existing ckpts*/
> >  	if ((checkpoint->ckpt_refcnt) &&  (ckpt_refcnt_total(checkpoint->ckpt_refcnt) > 0)) {
> >  		log_printf (LOG_LEVEL_DEBUG,"calling merge_ckpt_refcnts\n");
> > 
> > 
> > ------------------------------------------------------------------------
> > 
> > _______________________________________________
> > Openais mailing list
> > Openais at lists.osdl.org
> > https://lists.osdl.org/mailman/listinfo/openais
> 



