[Bridge] Incoming packets not always traversing the bridge

richardvoigt at gmail.com richardvoigt at gmail.com
Wed Jan 20 14:03:29 PST 2010


On Wed, Jan 20, 2010 at 3:45 PM, Robert LeBlanc <robert at leblancnet.us> wrote:
> Another quick fix is to remove all but one physical NIC from the
> virtual switch(es) that are being bridged. You lose redundancy, but that will
> get them up fast until a fix is found.

Wouldn't spanning tree do the same thing, but with automated recovery
in the case of a failure of the live link?
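
For the Linux bridge in the original post, turning STP on would look
roughly like the sketch below. It assumes the bridge-utils brctl tool and
the bridge name br0 from Brad's output; the enable_stp helper and the
DRY_RUN variable are names made up here for illustration, and none of
this touches the ESX virtual switch configuration.

```shell
#!/bin/sh
# Sketch only: enable spanning tree on a Linux bridge with brctl.
# enable_stp and DRY_RUN are hypothetical names, not part of bridge-utils.
# Set DRY_RUN=1 to print the commands instead of running them (no root needed).
enable_stp() {
    bridge="${1:-br0}"        # bridge name; defaults to br0 as in the post
    run=${DRY_RUN:+echo}      # expands to "echo" only when DRY_RUN is set
    # Turn STP on, then show per-port STP state to confirm it took effect.
    $run brctl stp "$bridge" on && $run brctl showstp "$bridge"
}
```

Run it as root with "enable_stp br0", or preview the commands first with
"DRY_RUN=1 enable_stp br0". With STP on, ports normally wait out the
forward delay before they start forwarding again, so expect a short
outage when it is switched on.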

>
> Robert LeBlanc
> Life Sciences & Undergraduate Education Computer Support
> Brigham Young University
>
>
> On Wed, Jan 20, 2010 at 2:29 PM, Brad Hudson <hudson at pythian.com> wrote:
>>
>> Robert;
>>
>> I looked over your site and could not find the document you reference in
>> your links.  Can you provide the url to get to it?
>>
>> I'll be happy to pass along anything I find and would appreciate it if
>> you would do the same.  As the client having the issue is using it for
>> production we may need to move to a non-bridged variety of transparent
>> firewall with proxy_arp to get them back up quickly.  Ideally I would
>> like to avoid that, but it's prod and needs to work.
>>
>> Regards;
>>
>> Brad
>>
>> Robert LeBlanc wrote:
>> > On Wed, Jan 20, 2010 at 1:04 PM, Brad Hudson <hudson at pythian.com> wrote:
>> >
>> >     Hi all;
>> >
>> >     I have an odd problem that I have been dealing with for a week.
>> >     I was hoping someone could help, or point me in the right
>> >     direction for clues.
>> >
>> >     I have a standard bridge setup.  br0 is composed of eth0 and eth1.
>> >
>> >     # brctl show br0
>> >     bridge name     bridge id               STP enabled     interfaces
>> >     br0             8000.000c292280b9       no              eth0
>> >                                                            eth1
>> >
>> >     Eth0 and eth1 both have 0.0.0.0 (no) address assigned and are up.
>> >     br0 is assigned the proper IP and the routing table is correct.
>> >     STP is off.
>> >
>> >     I have been losing connectivity to hosts inside the local segment
>> >     of the bridge.  Some investigation has revealed that the problem
>> >     is related to arp not working correctly.  Arp packets going this
>> >     way
>> >
>> >     eth1->br0->eth0->network/internet
>> >
>> >     have no problems at all.  The replies coming back the other way
>> >     all get to br0, but only 33% (approx, it varies) make it to the
>> >     eth1 side of the bridge.  I have verified this traffic pattern by
>> >     tcpdump of arp packets through each of these devices while doing
>> >     an nmap -sP of the /24 network to generate both arp and icmp.  We
>> >     are not able to arp any host outside our local segment, including
>> >     the default gateway (which is owned by the co-lo).  nmapping from
>> >     the bridging server itself from interface br0 gets the correct
>> >     number of arp replies.
>> >
>> >     ebtables and arp_tables are not running, and adding them in has
>> >     not changed the result.  There was a server with 2 NICs, each
>> >     with an IP on the same subnet, that was causing some MAC flapping,
>> >     but that has been fixed with no change to the described
>> >     behaviour.  All items in /proc/sys/net/bridge are set to '1', but
>> >     setting them to '0' has no effect.  The server hosting the bridge
>> >     has been rebooted several times with no effect.  proxy_arp does
>> >     not help at all.  I also tried parprouted with no success.
>> >
>> >     A couple other notes.
>> >
>> >     - This behaviour suddenly appeared about a week ago.  I think this
>> >     is probably related to an increase in network traffic but it's
>> >     hard to say, and the client does not buy into that statement.  If
>> >     it were a matter of nothing working or everything working then
>> >     there are places to look for that, but in this case the problem
>> >     is intermittent and the lost arp replies are not the same every
>> >     time.
>> >     - In another test we found that if we ping the inside server from
>> >     the firewall and also from an external machine the connectivity
>> >     to the inside server dies.  Once the pings are stopped, the
>> >     connectivity eventually returns.  If I ping out from the inside
>> >     server while doing that test, the session keeps going through
>> >     without hanging.
>> >     - The firewall is a VM running under ESX.  The vmxnet driver has
>> >     been reinstalled and the pcnet32 driver is not loaded.  Both NICs
>> >     are virtual so there is no chance of failed hardware, though I
>> >     suppose the problem could be on the ESX layer.  I have made some
>> >     attempt to diagnose the ESX layer but nothing jumps out at me.
>> >
>> >     I have been watching tcpdumps and do not see any sign of frags,
>> >     dupes, or anything that would cause lost packets.  I have combed
>> >     the newsgroups, google and even irc looking for clues or similar
>> >     situations, but nothing I have found fits the profile.
>> >
>> >     The workaround we currently have in place is to make a static arp
>> >     entry for the gateway on all servers on the inside.  This is not
>> >     ideal because the co-lo controls the router and it could fail
>> >     over to another device which would kill our route again.
>> >
>> >     Can anyone suggest anyplace I can look for clues, settings I should
>> >     check or other?  I am out of ideas at this point.
>> >
>> >     Your help is very much appreciated.
>> >
>> >     Regards;
>> >
>> >     Brad
>> >
>> >
>> >
>> >     --
>> >     Brad Hudson
>> >     SA Team Lead
>> >     The Pythian Group - love your data
>> >     Desk: 613-565-8696 x202
>> >     IM: pythianhudson
>> >
>> >
>> > I assume you have multiple physical NICs connected to your virtual
>> > switch. If so, I've posted my findings on my web page
>> > http://robert.leblancnet.us and I posted a message to this forum two
>> > days ago entitled "Need help writing ebtables rules". I'm not sure my
>> > messages are getting through as I've sent a few messages with no one
>> > responding. If we can work together to solve the problem, we can both
>> > benefit.
>> >
>> > Thanks,
>> >
>> > Robert LeBlanc
>> > Life Sciences & Undergraduate Education Computer Support
>> > Brigham Young University
>> >
>>
>>
>>
>> --
>> Brad Hudson
>> SA Team Lead
>> The Pythian Group - love your data
>> Desk: 613-565-8696 x202
>> IM: pythianhudson

