[Bridge] Incoming packets not always traversing the bridge
richardvoigt at gmail.com
Wed Jan 20 14:03:29 PST 2010
On Wed, Jan 20, 2010 at 3:45 PM, Robert LeBlanc <robert at leblancnet.us> wrote:
> Another quick fix is to remove all physical NICs except for one from the
> Virtual switch(s) that are being bridged. You lose redundancy, but that will
> get them up fast until a fix is found.
Wouldn't spanning tree do the same thing, but with automated recovery
in the case of a failure of the live link?
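For reference, a minimal sketch of turning spanning tree on, assuming it
would run on the Linux bridge br0 from the original post rather than on the
ESX virtual switch (the thread does not say which, and whether this cures
the lost arp replies is untested). RUN and enable_stp are illustrative names
only. By default the script just prints the brctl commands; setting RUN to
an empty string executes them, which needs root and bridge-utils.

```shell
#!/bin/sh
# Sketch only: enable STP on a Linux bridge with bridge-utils.
# BRIDGE, RUN and enable_stp are hypothetical names for illustration.
BRIDGE="${BRIDGE:-br0}"   # bridge name taken from the original post
RUN="${RUN-echo}"         # default: print commands; RUN="" executes them (root)

enable_stp() {
    # Turn spanning tree on for the given bridge.
    $RUN brctl stp "$1" on
    # Show the bridge and per-port STP state (e.g. blocking, forwarding).
    $RUN brctl showstp "$1"
}

enable_stp "$BRIDGE"
```

With the default forward delay of 15 seconds, ports spend about 30 seconds
in the listening and learning states before they forward, so expect a short
outage when STP is first switched on.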
> Robert LeBlanc
> Life Sciences & Undergraduate Education Computer Support
> Brigham Young University
> On Wed, Jan 20, 2010 at 2:29 PM, Brad Hudson <hudson at pythian.com> wrote:
>> I looked over your site and could not find the document you reference in
>> your links. Can you provide the url to get to it?
>> I'll be happy to pass along anything I find and would appreciate it if
>> you would do the same. As the client having the issue is using it for
>> production we may need to move to a non-bridged variety of transparent
>> firewall with proxy_arp to get them back up quickly. Ideally I would
>> like to avoid that, but it's prod and needs to work.
>> Robert LeBlanc wrote:
>> > On Wed, Jan 20, 2010 at 1:04 PM, Brad Hudson <hudson at pythian.com> wrote:
>> > Hi all;
>> > I have an odd problem that I have been dealing with for a week. I was
>> > hoping someone could help, or point me in the right direction for clues.
>> > I have a standard bridge setup. br0 is composed of eth0 and eth1.
>> > # brctl show br0
>> > bridge name     bridge id               STP enabled     interfaces
>> > br0             8000.000c292280b9       no              eth0
>> >                                                         eth1
>> > Both eth0 and eth1 have 0.0.0.0 (no address) assigned and are up. br0
>> > is assigned the proper IP and the routing table is correct. STP is off.
>> > I have been losing connectivity to hosts inside the local segment of the
>> > bridge. Some investigation has revealed that the problem is related to
>> > arp not working correctly. Arp packets going this way
>> > eth1->br0->eth0->network/internet
>> > have no problems at all. The replies coming back the other way all get
>> > to br0, but only 33% (approx, it varies) make it to the eth1 side of the
>> > bridge. I have verified this traffic pattern by tcpdump of arp packets
>> > through each of these devices while doing an nmap -sP of the /24 network
>> > to generate both arp and icmp. We are not able to arp any host outside
>> > our local segment, including the default gateway (which is owned by the
>> > co-lo). nmapping from the bridging server itself from interface br0
>> > gets the correct number of arp replies.
>> > ebtables and arp_tables are not running, and adding them in has made no
>> > difference. There was a server with 2 NICs, each with an IP on the same
>> > subnet, that was causing some MAC flapping, but that has been fixed with
>> > no change to the described behaviour. All items in
>> > /proc/sys/net/bridge are set to '1', but setting them to '0' has no
>> > effect. The server hosting the bridge has been rebooted several times
>> > with no effect. proxy_arp does not help at all. I also tried
>> > parprouted with no success.
>> > A couple other notes.
>> > - This behaviour suddenly appeared about a week ago. I think this is
>> > probably related to an increase in network traffic, but it's hard to say
>> > and the client does not buy into that statement. If it were a matter of
>> > nothing working or everything working then there are places to look for
>> > that, but in this case the problem is intermittent and the lost arp
>> > replies are not the same every time.
>> > - In another test we found that if we ping the inside server from the
>> > firewall and also from an external machine the connectivity to the
>> > inside server dies. Once the pings are stopped, the connectivity
>> > eventually returns. If I ping out from the inside server while doing
>> > that test, the session keeps going through without hanging.
>> > - The firewall is a VM running under ESX. The vmxnet driver has been
>> > reinstalled and the pcnet32 driver is not loaded. Both NICs are virtual
>> > so there is no chance of failed hardware, though I suppose the problem
>> > could be on the ESX layer. I have made some attempt to diagnose the ESX
>> > layer but nothing jumps out at me.
>> > I have been watching tcpdumps and do not see any sign of frags, dupes,
>> > or anything that would cause lost packets. I have combed the
>> > newsgroups, google and even irc looking for clues or similar situations,
>> > but nothing I have found fits the profile.
>> > The workaround we currently have in place is to make a static arp entry
>> > for the gateway on all servers on the inside. This is not ideal because
>> > the co-lo controls the router and it could fail over to another device
>> > which would kill our route again.
>> > Can anyone suggest anyplace I can look for clues, settings I should
>> > check or other? I am out of ideas at this point.
>> > Your help is very much appreciated.
>> > Regards;
>> > Brad
>> > --
>> > Brad Hudson
>> > SA Team Lead
>> > The Pythian Group - love your data
>> > Desk: 613-565-8696 x202
>> > IM: pythianhudson
>> > I assume you have multiple physical NICs connected to your virtual
>> > switch. If so, I've posted my findings on my web page
>> > http://robert.leblancnet.us and I posted a message to this forum two
>> > days ago entitled "Need help writing ebtables rules". I'm not sure my
>> > messages are getting through, as I've sent a few messages with no one
>> > responding. If we can work together to solve the problem, we can both
>> > benefit.
>> > Thanks,
>> > Robert LeBlanc
>> > Life Sciences & Undergraduate Education Computer Support
>> > Brigham Young University
>> Brad Hudson
>> SA Team Lead
>> The Pythian Group - love your data
>> Desk: 613-565-8696 x202
>> IM: pythianhudson