[Openais] [Pacemaker] Pacemaker on OpenAIS, RRP, and link failure

Thu Jun 4 09:32:21 PDT 2009

On Thu, 2009-06-04 at 18:30 +0200, Lars Marowsky-Bree wrote:
> On 2009-06-04T09:23:04, Steven Dake <sdake at redhat.com> wrote:
> 
> > The problem with checking the link status with the current code is that
> > the protocol blocks I/O waiting for a response from the failed ring.
> > This could of course be modified to behave differently.
> 
> Right, so the rechecking could possibly be a separate thread, sending an
> occasional liveness packet on the failed ring and trigger the RRP
> recovery after it has heard from other nodes on it?

Well, I prefer that totem remain nonthreaded except for encrypted xmit
operations, but in general, that is the basic idea.
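
To illustrate, here is a minimal single-threaded sketch of that idea. It
is written in Python for brevity and is not totem code: the names
RingProbe, on_timer, and on_probe_reply are hypothetical, and the rule
"recover only after every other node has answered" is an assumption about
what "heard from other nodes" should mean. The probe is driven by a timer
callback in the main loop, so no extra thread is involved.

```python
class RingProbe:
    """Tracks liveness replies on a failed ring (hypothetical, not totem API)."""

    def __init__(self, peers, send_probe):
        self.peers = set(peers)        # other nodes expected on this ring
        self.send_probe = send_probe   # callback that transmits one probe packet
        self.heard = set()             # peers that answered in the current round
        self.failed = True             # the ring starts out marked as failed

    def on_timer(self):
        """Called from the main loop's timer; no extra thread is needed."""
        if self.failed:
            self.heard.clear()         # each round must hear from everyone afresh
            self.send_probe()

    def on_probe_reply(self, peer):
        """Called when a probe reply arrives; returns True when the ring recovers."""
        if not self.failed or peer not in self.peers:
            return False
        self.heard.add(peer)
        if self.heard == self.peers:   # assumption: all peers must answer
            self.failed = False        # re-enable the ring
            return True
        return False
```

The caller would arm on_timer with a fairly long interval and kick off
RRP recovery when on_probe_reply returns True.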

> Some smarts would be needed of course to not constantly retrigger
> partially active rings (which would fail again immediately).
> 
> > So the act of failing a link is expensive and we don't want to retest
> > that it is valid very often.
> 
> Does "expensive" mean that it'll actually slow down the healthy
> ring(s)?
> 
At the moment, the protocol blocks until the problem counter reaches the
threshold, at which point the ring is declared failed and normal
communication resumes.
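
As a rough model of that behaviour, and of why a retest should be rare,
here is a small Python sketch. It is not the totem implementation:
ProblemCounter and retest_delay are hypothetical names, and the
exponential backoff is only one possible way to avoid constantly
retriggering a ring that would fail again immediately.

```python
class ProblemCounter:
    """Counts missed responses on one ring (hypothetical, not totem API)."""

    def __init__(self, threshold):
        self.threshold = threshold  # misses tolerated before declaring failure
        self.count = 0
        self.failed = False

    def on_missed_response(self):
        """Record one missed response; returns True once the ring is failed."""
        if not self.failed:
            self.count += 1
            if self.count >= self.threshold:
                self.failed = True  # from here on the ring is no longer used
        return self.failed


def retest_delay(base_seconds, consecutive_failures, cap_seconds):
    """Exponential backoff: wait longer after each retest that fails again."""
    return min(base_seconds * (2 ** consecutive_failures), cap_seconds)
```

Every retest of a still-broken ring pays the full cost of counting up to
the threshold again, which is why the delay grows with each consecutive
failure and is capped rather than left fixed.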
> 
> Regards,
>     Lars
>