[Bugme-new] [Bug 12570] New: Bonding does not work over e1000e.

bugme-daemon at bugzilla.kernel.org
Thu Jan 29 03:12:01 PST 2009


http://bugzilla.kernel.org/show_bug.cgi?id=12570

           Summary: Bonding does not work over e1000e.
           Product: Drivers
           Version: 2.5
     KernelVersion: 2.6.29-rc1
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Network
        AssignedTo: jgarzik at pobox.com
        ReportedBy: khorenko at parallels.com


Checked (failing) kernel: 2.6.29-rc1
Latest working kernel version: unknown
Earliest failing kernel version: not checked, but probably any; RHEL5 kernels
are also affected.

Distribution: Enterprise Linux Enterprise Linux Server release 5.1 (Carthage)

Hardware Environment:
lspci:
15:00.0 Ethernet controller: Intel Corporation 82571EB Quad Port Gigabit
Mezzanine Adapter (rev 06)
15:00.1 Ethernet controller: Intel Corporation 82571EB Quad Port Gigabit
Mezzanine Adapter (rev 06)

15:00.0 0200: 8086:10da (rev 06)
        Subsystem: 103c:1717
        Flags: bus master, fast devsel, latency 0, IRQ 154
        Memory at fdde0000 (32-bit, non-prefetchable) [size=128K]
        Memory at fdd00000 (32-bit, non-prefetchable) [size=512K]
        I/O ports at 6000 [size=32]
        [virtual] Expansion ROM at d1300000 [disabled] [size=512K]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable+
        Capabilities: [e0] Express Endpoint IRQ 0
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 24-d1-78-ff-ff-78-1b-00

15:00.1 0200: 8086:10da (rev 06)
        Subsystem: 103c:1717
        Flags: bus master, fast devsel, latency 0, IRQ 162
        Memory at fdce0000 (32-bit, non-prefetchable) [size=128K]
        Memory at fdc00000 (32-bit, non-prefetchable) [size=512K]
        I/O ports at 6020 [size=32]
        [virtual] Expansion ROM at d1380000 [disabled] [size=512K]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable+
        Capabilities: [e0] Express Endpoint IRQ 0
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 24-d1-78-ff-ff-78-1b-00

Problem Description: Bonding does not work over NICs driven by e1000e: if you
break and restore the physical links of the bonding slaves one by one, the
network stops working.

Steps to reproduce (a sketch of an equivalent bond setup follows the list):
Two NICs supported by e1000e are enslaved to a bond device (Bonding Mode:
fault-tolerance (active-backup)).
* ping to an outside node is ok
* physically break the link of the active bond slave (1)
* the bond detects the failure and makes the other slave (2) active
* ping works fine
* restore the connection of (1)
* ping works fine
* break the link of (2)
* the bond detects it and reports that it makes (1) active, but
* ping _does not_ work anymore
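
For reference, a minimal sketch of how such a bond could be built by hand. The
reporter's actual network scripts are not included in the report, so this is an
assumption: the interface and bond names are taken from the logs, and the IP
address is a placeholder.

  # assumes the bonding and e1000e modules are available; max_bonds=2 so that
  # bond1 exists, matching the device name used in the report
  modprobe bonding mode=active-backup miimon=100 max_bonds=2
  ifconfig bond1 192.168.1.10 netmask 255.255.255.0 up   # placeholder address
  ifenslave bond1 eth2 eth3
  ping <outside node>

The parameters match what /proc/net/bonding/bond1 shows below: active-backup
mode, MII polling every 100 ms, no up/down delay.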

Logs:
/var/log/messages:
Jan 27 11:53:29 host kernel: 0000:15:00.0: eth2: Link is Down
Jan 27 11:53:29 host kernel: bonding: bond1: link status definitely down for
interface eth2, disabling it
Jan 27 11:53:29 host kernel: bonding: bond1: making interface eth3 the new
active one.
Jan 27 11:56:37 host kernel: 0000:15:00.0: eth2: Link is Up 1000 Mbps Full
Duplex, Flow Control: RX/TX
Jan 27 11:56:37 host kernel: bonding: bond1: link status definitely up for
interface eth2.
Jan 27 11:57:39 host kernel: 0000:15:00.1: eth3: Link is Down
Jan 27 11:57:39 host kernel: bonding: bond1: link status definitely down for
interface eth3, disabling it
Jan 27 11:57:39 host kernel: bonding: bond1: making interface eth2 the new
active one.

What was done + dumps of /proc/net/bonding/bond1:
## 11:52:42
##cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.3.0 (June 10, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth2
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:17:a4:77:00:1c

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:17:a4:77:00:1e

## 11:53:05 shutdown eth2 uplink on the virtual connect bay5
##cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.3.0 (June 10, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth3
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:17:a4:77:00:1c

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:17:a4:77:00:1e

## 11:56:01 turn on eth2 uplink on the virtual connect bay5
##cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.3.0 (June 10, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth3
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:17:a4:77:00:1c

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:17:a4:77:00:1e

## 11:57:22 turn off eth3 uplink on the virtual connect bay5
##cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.3.0 (June 10, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth2
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:17:a4:77:00:1c

Slave Interface: eth3
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:17:a4:77:00:1e
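
Not part of the original report, but a few standard checks would show whether
the driver and the bonding layer disagree about the link state after the last
failover (the names bond1/eth2 are taken from the logs above):

  ethtool eth2                                    # link state as reported by e1000e
  cat /sys/class/net/eth2/carrier                 # 1 = carrier up, 0 = carrier down
  cat /sys/class/net/bond1/bonding/active_slave   # slave bonding currently transmits on
  tcpdump -n -i eth2 icmp                         # do the pings actually leave the slave?

If these outputs agree with the /proc dump above while tcpdump shows no traffic,
that would point at the driver side rather than at the bonding failover logic.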

