[Openais] checkpoint disappears after node reset

Steven Dake sdake at redhat.com
Tue Sep 25 15:29:39 PDT 2007


Henry

I'll give it a rerun tonight.

Can you tell me your checkpoint creation parameters?  Are you using
sections?  Can you give me your timeouts on the expiration of the
checkpoints?

Regards
-steve
On Tue, 2007-09-25 at 15:22 -0700, Henry Fung wrote:
> Steve,
> I used 0.80.2, and moving to 0.80.3 does not help.
> There is no core dump. The problem always happens when
> the first member of the ring drops dead (thru a system
> init 6, e.g.). I tried various hacks to no avail,
> including:
> 1. open the checkpoint on the standby node early
> 2. use SA_TIME_END
> Reading of the checkpoint on the standby may succeed
> for a short while until the writing node is completely
> gone. After that, further reads fail with error code 2
> (or 6). Therefore, I am sure things are working
> properly prior to the node reset.
> 
> My guess is that it has something to do with the
> selection of the ring representative, which prefers
> the lowest node id: either the standby node does not
> become the rep soon enough after the rep drops dead,
> or the new rep somehow deletes the checkpoint during
> some sync stage.
> 
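
For reference while reading the report above: assuming the codes 2 and 6
are SaAisErrorT values, they correspond to SA_AIS_ERR_LIBRARY and
SA_AIS_ERR_TRY_AGAIN in the SA Forum AIS specification. The sketch below
is only an illustration of telling a transient failure from a persistent
one. The enum is a local stand-in for the constants in saAis.h, and
ais_err_name, read_with_retry and fake_read are hypothetical helpers, not
openais functions. Retrying would not bring back a checkpoint that has
been deleted; it only shows whether the error clears once the ring has
settled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-ins for a few SaAisErrorT values (assumption: real code
 * would include saAis.h instead of redefining them). */
enum {
    AIS_OK            = 1,
    AIS_ERR_LIBRARY   = 2,
    AIS_ERR_TRY_AGAIN = 6
};

/* Hypothetical helper: map the codes seen in the report to their names. */
const char *ais_err_name(int err)
{
    switch (err) {
    case AIS_OK:            return "SA_AIS_OK";
    case AIS_ERR_LIBRARY:   return "SA_AIS_ERR_LIBRARY";
    case AIS_ERR_TRY_AGAIN: return "SA_AIS_ERR_TRY_AGAIN";
    default:                return "other";
    }
}

/* Callback type standing in for one checkpoint read attempt. */
typedef int (*ckpt_read_fn)(void *ctx);

/* Hypothetical helper: call read_once until it returns something other
 * than TRY_AGAIN or max_attempts is reached. Returns the last error code
 * and stores the number of attempts made in *attempts_used (if non-NULL).
 * Real code would sleep briefly between attempts. */
int read_with_retry(ckpt_read_fn read_once, void *ctx,
                    int max_attempts, int *attempts_used)
{
    int err = AIS_ERR_TRY_AGAIN;
    int n = 0;

    assert(read_once != NULL);
    while (n < max_attempts) {
        err = read_once(ctx);
        n++;
        if (err != AIS_ERR_TRY_AGAIN)
            break;
    }
    if (attempts_used != NULL)
        *attempts_used = n;
    return err;
}

/* Fake read used only for demonstration: returns TRY_AGAIN until the
 * counter pointed to by ctx reaches zero, then returns OK. */
int fake_read(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return AIS_ERR_TRY_AGAIN;
    }
    return AIS_OK;
}
```

In a real client the callback would wrap the actual checkpoint read call.
A read that keeps returning code 2 after the retries points at something
other than a busy ring.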


