[Openais] whitetank/flatiron node id mismatch on big-endian platforms
Dejan Muhamedagic
dejan at suse.de
Tue Mar 9 12:17:22 PST 2010
Hello,
An upgrade of one node from whitetank (openais) to flatiron
(corosync) left that node unable to rejoin the cluster.
During the upgrade the other nodes were running. The issue turned
out to be that the two releases calculate node ids differently
on big-endian platforms (this was on s390x). openais/corosync
itself didn't treat this as a problem, but pacemaker couldn't
match the old node id to the new one.
Reverting the following patch fixed the issue:
Index: branches/flatiron/exec/totemip.c
===================================================================
--- branches/flatiron/exec/totemip.c (revision 2428)
+++ branches/flatiron/exec/totemip.c (revision 2429)
@@ -376,6 +376,9 @@
*/
totemip_sockaddr_to_totemip_convert((struct sockaddr_storage *)sockaddr_in, boundto);
boundto->nodeid = sockaddr_in->sin_addr.s_addr;
+#if __BYTE_ORDER == __BIG_ENDIAN
+ boundto->nodeid = swab32 (boundto->nodeid);
+#endif
if (ioctl(id_fd, SIOCGLIFFLAGS, &lifreq[i]) < 0) {
printf ("couldn't do ioctl\n");
@@ -614,6 +617,9 @@
if (ipaddr.family == AF_INET && ipaddr.nodeid == 0) {
unsigned int nodeid = 0;
memcpy (&nodeid, ipaddr.addr, sizeof (int));
+#if __BYTE_ORDER == __BIG_ENDIAN
+ nodeid = swab32 (nodeid);
+#endif
if (mask_high_bit) {
nodeid &= 0x7FFFFFFF;
}
With flatiron, node ids now come out the same on big- and
little-endian platforms, but this regression prevents rolling
upgrades one node at a time. Also, the bytes of each id are in
reversed order; for instance, 192.168.100.13 gets the id
224700608 (hex 0D64A8C0).
There is some discussion at the Novell bugzilla:
https://bugzilla.novell.com/show_bug.cgi?id=584976
Thanks,
Dejan