[Openais] whitetank cluster not reforming after 'if down'

Andrew Beekhof andrew at beekhof.net
Thu Jul 30 05:54:06 PDT 2009


Steve, I've been able to reproduce this reliably _without_ Pacemaker  
being involved.

Attached are the two openais log files.

Scenario:

t0: hikari and hikari2 are up and can see each other
t1: Powercycle hikari2
t2: hikari2 comes up
t3: ping confirms that hikari2 can contact hikari
     I modified the openais init script to run: ping -c 10 hikari > afile
t4: hikari2 starts openais
t5: hikari starts producing membership events every 3s but does not  
form a membership with hikari2
t6: hikari2 forms a membership by itself
t7: (about 1 or 2 minutes after t6, it varies) hikari and hikari2 form  
a combined membership

The strangest part of this is that hikari2 must reboot in order to
trigger this behavior.
Stopping or killing aisexec and then starting it again is not  
sufficient.

Do you want to continue the discussion here or move it to Bugzilla?

-------------- next part --------------
Name: hikari2.txt
Url: http://lists.linux-foundation.org/pipermail/openais/attachments/20090730/e74ed3d9/attachment-0002.txt 
-------------- next part --------------
Name: hikari.txt
Url: http://lists.linux-foundation.org/pipermail/openais/attachments/20090730/e74ed3d9/attachment-0003.txt 
-------------- next part --------------


On Jul 21, 2009, at 1:49 PM, Lars Marowsky-Bree wrote:

> On 2009-06-30T12:27:33, Andrew Beekhof <andrew at beekhof.net> wrote:
>
>> I'm working with a cluster that's having trouble reforming.
>> Before I explain, here is the totem section (which is the same on
>> both nodes, except for the nodeid).
>
> Hi all, Steven,
>
> this problem persists. After a reboot, we sometimes see memberships not
> reforming - for example, A B C D E, C & D reboot, we end up with A-B-E
> and C-D or C / D by themselves or some other really weird membership.
>
> The problem persists with the latest whitetank. Occasionally one of the
> dlm_controld processes seems to be hogging IPC (which appears to affect
> the rest of the system considerably), but this isn't always the case.
>
> It is not always reproducible, and the symptoms are, well, weird.
>
> Has anyone else ever seen this?
>
>
>
> Regards,
>    Lars
>
> -- 
> Architect Storage/HA, OPS Engineering, Novell, Inc.
> SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>

-- Andrew




