[Patch net-next] net: make neigh tables per netns

Stéphane Graber stgraber at ubuntu.com
Tue Nov 4 15:49:58 UTC 2014


On Mon, Jun 30, 2014 at 08:54:34PM +0200, Hannes Frederic Sowa wrote:
> Hi,
> 
> On Mon, Jun 30, 2014, at 20:15, Jesper Dangaard Brouer wrote:
> > 
> > On Fri, 27 Jun 2014 22:12:52 -0700 ebiederm at xmission.com (Eric W.
> > Biederman) wrote:
> > > Cong Wang <xiyou.wangcong at gmail.com> writes:
> > > > On Thu, Jun 26, 2014 at 3:44 PM, David Miller <davem at davemloft.net> wrote:
> > > >>
> > [...]
> > > >
> > > > Hmm, I did overlook the potential DOS problem. But hold on, isn't
> > > > IP fragments have the same problem? The fragment queues are per
> > > > netns, and the thresh is per netns as well, we will eventually have
> > > > memory pressure as well.
> > > 
> > > Interesting.  It does look like ip fragments are susceptible that way.
> > 
> > For IP fragments we have per netns mem-limit and LRU-list, but all
> > netns share the same hash table, which have its own DoS potential.
> > 
> > And argh! - we have a hardcoded INETFRAGS_MAXDEPTH=128, which can be
> > used for (slow) DoS of IP frags if enough netns are created.
> > 
> > https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/tree/net/ipv4/inet_fragment.c#n344
> > 
> > Introduced by commit 5a3da1fe9 ("inet: limit length of fragment queue
> > hash table bucket lists").
> 
> Sure, but we need that, otherwise even a single netns can get exploited
> up to a remotely triggered lockup of the box - e.g.
> https://gist.github.com/hannes/5116331 - on some smaller machines.
> INETFRAGS_MAXDEPTH is a property of the hashtable and walking a chain
> with more than 128 elements is just crazy.
> 
> Also, for me making this user configurable doesn't seem to provide a
> benefit.
> 
> Sure, it does introduce some kind of unfairness between the namespaces,
> but so does all code which overcommits shared resources.
> 
> Bye,
> Hannes

Hello,

As a way to test this issue and show how easy it is to DoS a machine by
filling the IPv6 neighborhood table, I've written this small example:
https://dl.stgraber.org/ipv6-dos.c

This can be run as a nobody user on any kernel with user namespaces enabled.
What it does is unshare a new user namespace and then a new network
namespace inside it. It then creates a veth pair, assigns 4000 IPv6
addresses on the first interface of the pair, then forks, unshares
another network namespace, moves the second interface of the pair in
there and assigns another 4000 IPv6 addresses.

At that point, you have two interfaces, one in the first network
namespace the second in the other network namespace, each with 4000 IPv6
addresses. This tool will then start a simple TCP server in one of the
namespace and in the other, open 4000 connections, each using a
different source and destination address.

The result is 4000 open connections, in theory requiring 8000 IPv6
neighborhood table entries.


Once the tool is done attempting to open that many connections, any
attempt to connect to a host in a directly connected IPv6 subnet
(so requiring a new neighborhood table entry) will fail with EINVAL.



While the global limit can indeed be bumped, so can the number of
connections established by this tool. I don't believe a global limit
influence by the number of namespaces would help here either since
whatever the resulting global limit ends up being, the tool can be
changed to establish $global_limit+1 connections.

I'm mostly a userspace guy and don't really know the details of the
kernel implementation, but considering that device creation and adding
addresses is now possible by any unprivileged user, having the limit of
neighborhood entries be per-interface rather than global would make
sense to me.


Hopefully this helped clarifiy the problem we've been seeing lately.

-- 
Stéphane Graber
Ubuntu developer
http://www.canonical.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: Digital signature
URL: <http://lists.linuxfoundation.org/pipermail/containers/attachments/20141104/274e7fe6/attachment-0001.sig>


More information about the Containers mailing list