[Openais] whitetank/flatiron node id mismatch on big-endian platforms

Dejan Muhamedagic dejan at suse.de
Tue Mar 9 12:17:22 PST 2010


Hello,

Upgrading one node from whitetank (openais) to flatiron
(corosync) left that node unable to rejoin the cluster. The
other nodes kept running during the upgrade. The cause turned
out to be that the two releases calculate node ids differently
on big-endian platforms (this was on s390x). openais/corosync
itself did not treat this as a problem, but pacemaker could not
match the old node id to the new one.

Reverting the following patch fixed the issue:

Index: branches/flatiron/exec/totemip.c
===================================================================
--- branches/flatiron/exec/totemip.c    (revision 2428)
+++ branches/flatiron/exec/totemip.c    (revision 2429)
@@ -376,6 +376,9 @@
              */
             totemip_sockaddr_to_totemip_convert((struct sockaddr_storage *)sockaddr_in, boundto);
             boundto->nodeid = sockaddr_in->sin_addr.s_addr;
+#if __BYTE_ORDER == __BIG_ENDIAN
+            boundto->nodeid = swab32 (boundto->nodeid);
+#endif

             if (ioctl(id_fd, SIOCGLIFFLAGS, &lifreq[i]) < 0) {
                 printf ("couldn't do ioctl\n");
@@ -614,6 +617,9 @@
     if (ipaddr.family == AF_INET && ipaddr.nodeid == 0) {
                 unsigned int nodeid = 0;
                 memcpy (&nodeid, ipaddr.addr, sizeof (int));
+#if __BYTE_ORDER == __BIG_ENDIAN
+        nodeid = swab32 (nodeid);
+#endif
         if (mask_high_bit) {
                         nodeid &= 0x7FFFFFFF;
         }

With flatiron the nodeids are now the same on both big- and
little-endian platforms, but this regression prevents rolling
upgrades of single nodes. Also, the ids are in reversed byte
order: for instance, 192.168.100.13 gets the id 224700608
(hex 0D64A8C0).
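
To make the arithmetic concrete, here is a minimal sketch of how the
four address bytes turn into a node id under each byte order. The
function names are made up for illustration and are not taken from
totemip.c; swap32 stands in for the swab32 call in the patch. The
helpers build the value with shifts, so they give the same result on
any host. The mask_high_bit step is left out.

```c
#include <stdint.h>

/* Hypothetical stand-in for swab32: reverse the four bytes of v. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* What memcpy of the network-order address bytes into an unsigned int
 * yields on a little-endian host: the first byte is least significant. */
static uint32_t nodeid_as_little_endian(const unsigned char addr[4])
{
    return  (uint32_t)addr[0]        |
           ((uint32_t)addr[1] << 8)  |
           ((uint32_t)addr[2] << 16) |
           ((uint32_t)addr[3] << 24);
}

/* What the same memcpy yields on a big-endian host without the swap:
 * the first byte is most significant. */
static uint32_t nodeid_as_big_endian(const unsigned char addr[4])
{
    return ((uint32_t)addr[0] << 24) |
           ((uint32_t)addr[1] << 16) |
           ((uint32_t)addr[2] << 8)  |
           (uint32_t)addr[3];
}
```

For 192.168.100.13 (bytes C0 A8 64 0D), nodeid_as_little_endian
gives 0x0D64A8C0 = 224700608, the value quoted above, and
nodeid_as_big_endian gives 0xC0A8640D = 3232261133. Applying
swap32 to the big-endian value yields the little-endian one, which
is the effect of the patch on s390x. Assuming whitetank did the
plain memcpy with no swap, the upgraded node's id would change from
the second form to the first.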

There is some discussion in the Novell bugzilla:
https://bugzilla.novell.com/show_bug.cgi?id=584976

Thanks,

Dejan

