[Openais] checkpoint disappears after node reset
Steven Dake
sdake at redhat.com
Tue Sep 25 15:29:39 PDT 2007
Henry,
I'll give it a rerun tonight.
Can you tell me your checkpoint creation parameters? Are you using
sections? Can you give me your timeouts on the expiration of the
checkpoints?
Regards
-steve
On Tue, 2007-09-25 at 15:22 -0700, Henry Fung wrote:
> Steve,
> I used 0.80.2, and moving to 0.80.3 does not help.
> There is no core dump. The problem always happens when
> the first member of the ring drops dead (e.g., through
> a system init 6). I tried various hacks, to no avail,
> including:
> 1. open the checkpoint on the standby node early
> 2. use SA_TIME_END
> Reading the checkpoint on the standby may succeed
> for a short while, until the writing node is completely
> gone. After that, further reads fail with error code 2
> (or 6). Therefore, I am sure things are working
> properly prior to the node reset.
>
> My guess is that it has something to do with ring
> representative selection preferring the lowest node id:
> either the standby node does not become the rep soon
> enough after the rep drops dead, or the new rep somehow
> deletes the checkpoint during some sync stage.
>
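For anyone decoding the numeric codes quoted above: as far as I recall from saAis.h, 2 is SA_AIS_ERR_LIBRARY and 6 is SA_AIS_ERR_TRY_AGAIN, but please check against your own header. The sketch below mirrors a few of those values locally so it compiles without the openais headers, and shows one way a standby reader might retry on TRY_AGAIN while the ring reforms. The names ais_error_name, read_with_retry and fake_read are hypothetical helpers for illustration, not part of the AIS API, and this is not a fix for the problem reported here.

```c
#include <stddef.h>

/* Local mirror of a few SaAisErrorT values (assumed from saAis.h;
 * verify against the header shipped with your openais version). */
enum ais_error {
    AIS_OK            = 1,
    AIS_ERR_LIBRARY   = 2,
    AIS_ERR_TIMEOUT   = 5,
    AIS_ERR_TRY_AGAIN = 6,
    AIS_ERR_NOT_EXIST = 12
};

/* Hypothetical helper: map a numeric code to its AIS name. */
const char *ais_error_name(int code)
{
    switch (code) {
    case AIS_OK:            return "SA_AIS_OK";
    case AIS_ERR_LIBRARY:   return "SA_AIS_ERR_LIBRARY";
    case AIS_ERR_TIMEOUT:   return "SA_AIS_ERR_TIMEOUT";
    case AIS_ERR_TRY_AGAIN: return "SA_AIS_ERR_TRY_AGAIN";
    case AIS_ERR_NOT_EXIST: return "SA_AIS_ERR_NOT_EXIST";
    default:                return "unknown (not in this sketch)";
    }
}

/* Hypothetical callback type standing in for a checkpoint read;
 * it returns one of the codes above. */
typedef int (*read_fn)(void *ctx);

/* Call fn up to max_attempts times, retrying only on TRY_AGAIN.
 * Any other code (success or hard error) is returned at once.
 * A real caller would sleep between attempts. */
int read_with_retry(read_fn fn, void *ctx, int max_attempts)
{
    int rc = AIS_ERR_TRY_AGAIN;
    for (int i = 0; i < max_attempts; i++) {
        rc = fn(ctx);
        if (rc != AIS_ERR_TRY_AGAIN)
            return rc;
    }
    return rc;
}

/* Stand-in for a real read: ctx points to an int holding how many
 * TRY_AGAIN results to produce before reporting success. */
int fake_read(void *ctx)
{
    int *remaining = (int *)ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return AIS_ERR_TRY_AGAIN;
    }
    return AIS_OK;
}
```

In a real reader the callback would wrap the actual checkpoint read call. A code 2 is returned to the caller immediately rather than retried, since a library error usually means the handle has to be re-established rather than the call repeated.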