[Openais] [corosync] [patch] - Fix problems with long token timeout and cpg

Thu Jul 2 04:13:27 PDT 2009

Steve,
I've tested:
2x trunk + patch
1x whitetank (RHEL 5.3)

It looks like communication works without major problems. Of course,
whitetank is not able to use the new type of messages, so the bug is still
there, but the new nodes work correctly.

Steven Dake wrote:
> Does this cross communicate with whitetank properly?
> 
> Regards
> -steve
> 
> On Wed, 2009-07-01 at 18:21 +0200, Jan Friesse wrote:
>> The included patch should fix
>> https://bugzilla.redhat.com/show_bug.cgi?id=506255 .
>>
>> David, I hope it will fix the problem for you.
>>
>> It's based on the simple idea of adding a node startup timestamp at the end
>> of cpg_join (and joinlist) calls. If the timestamp is larger than the old
>> timestamp we know, the node was restarted and we didn't notice -> deliver a
>> leave event and then a join event. If the timestamp is the same (or, in
>> special cases, lower) -> a new cpg app joined -> send only a join event.
>>
>> Of course, the patch isn't that simple. Cpg_join messages are always sent as
>> larger messages with a timestamp (btw. the timestamp is a 64-bit value,
>> because I expect l(o^64)ng life of corosync ;) ). On delivery, we test whether
>> the message is larger than a standard message. If it is -> we have a ts -> use it.
>>
>> A bigger problem was joinlist, because it's an array, ... you will see in
>> the source. The solution is to send a special entry with pid 0 (a process
>> should never have pid 0) and the timestamp encoded in the name
>> (ugly, but it seems to work).
>>
>> Please comment, if you can.
>>
>> Regards,
>>   Honza
>
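
For readers following the thread, the scheme described in the quoted patch summary can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: every struct, enum and function name here is hypothetical, the message layouts are invented, and the hex text encoding of the timestamp in the name field is an assumption (the mail only says the timestamp is "encoded in name"). The "have_known" flag for a node seen for the first time is also an assumption.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical event mask, not corosync's real types. */
enum { EV_NONE = 0, EV_LEAVE = 1, EV_JOIN = 2 };

/* Hypothetical wire layouts: an old-style join and the larger one with a ts. */
struct join_msg    { uint32_t pid; char name[128]; };
struct join_msg_ts { struct join_msg base; uint64_t ts; };

/* Hypothetical joinlist array element. */
struct jl_entry    { uint32_t pid; char name[128]; };

/* Larger timestamp than the one we know: the node restarted unnoticed, so
 * deliver leave + join. Same (or lower) timestamp: a new cpg app joined, so
 * deliver join only. have_known == 0 (assumption) means no timestamp is
 * stored yet for this node. */
static int events_for_join(int have_known, uint64_t known_ts, uint64_t msg_ts)
{
    if (have_known && msg_ts > known_ts)
        return EV_LEAVE | EV_JOIN;
    return EV_JOIN;
}

/* On delivery: a message larger than the standard one carries a timestamp.
 * Returns 1 and fills *ts if present, 0 for an old-style message. */
static int extract_ts(const void *buf, size_t len, uint64_t *ts)
{
    assert(buf != NULL && ts != NULL);
    if (len < sizeof(struct join_msg_ts))
        return 0;
    memcpy(ts, (const char *)buf + offsetof(struct join_msg_ts, ts), sizeof *ts);
    return 1;
}

/* Joinlist: special entry with pid 0 and the timestamp stored in the name
 * (hex text here, which is an assumption). */
static void encode_ts_entry(struct jl_entry *e, uint64_t ts)
{
    assert(e != NULL);
    e->pid = 0;
    snprintf(e->name, sizeof e->name, "%016" PRIx64, ts);
}

/* Returns 1 and fills *ts if e is the special pid 0 entry, 0 otherwise. */
static int decode_ts_entry(const struct jl_entry *e, uint64_t *ts)
{
    assert(e != NULL && ts != NULL);
    if (e->pid != 0)
        return 0;
    return sscanf(e->name, "%" SCNx64, ts) == 1;
}
```

A receiver that does not know about the larger message simply never reads past the standard size, which matches the observation above that whitetank keeps working but cannot benefit from the fix.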