L3 network isolation

Daniel Lezcano dlezcano at fr.ibm.com
Wed Dec 6 15:25:45 PST 2006


Hi all,

Dmitry and I, we thought about a possible implementation allowing the 
l2/l3 to coexists.

The idea is assuming the l3 network namespaces are the leaf in the l2 
namespace hierarchy tree. By default, init process is l2 namespace. From 
a layer 3, it is impossible to do a new network namespace unshare.

All the configuration is done into the l2 namespace. When a l3 is 
created a new IP address should be created into the l2 namespace and 
"pushed" into the l3. When the l3 dies, the IP is pulled to its parent, 
aka the l2. In order to ensure security into the l3, the NET_ADMIN 
capability is lost when doing unsharing for l3.
There is no extra code for socket virtualization. It is a common part.

How to setup a l3 namespace ?
-----------------------------

  1 - setup a new IP address in l2 namespace
  2 - create a l3 namespace
  3 - specific socket ioctl to "push" the IP address from the l2 
namespace to the newly created l3 namespace

The l2 lose visibility on the IP address and l3 gains visibility on the 
IP address. A ifconfig or a ip command shows only the IP address 
assigned to the namespace. Loopback address is always visible.

How to handle outgoing traffic ?
--------------------------------

The bind must be checked with the IP addresses belonging to the l3 
namespace and with all the derivative addresses (multicast, broadcast, 
zero net, loopback, ...).

The IP addresses will rely on aliased IP address. The source address 
must be filled with the IP address belonging the l3 namespace when not 
set. This is a trivial operation, because we know which IP addresses are 
assigned to the l3 namespace.

When the route are resolved, the l3 namespace switch the its parent, 
that is to say the l2 namespace, and the virtualization follows its 
normal path.

How to handle incoming traffic ?
--------------------------------

Because we can have several sockets listening on the same 
INADDR_ANY:port, we must find the network namespace associated with the 
destination IP address.
For unicast, this is a trivial operation, because that can be checked 
with the assigned IP address again. For broadcast and multicast, some 
extra work should be done in order to store the namespaces which are 
listening on a broadcast address. As soon as the namespace is found, we 
switch to it. This can be done with netfilters.

Routes and co.
--------------

  - Routes: they are not isolated, each l3 namespace can see all the 
routes from the other namespaces. That allows the routing engine to see 
all the routes and choose the loopback when two network namespaces in 
the same host try to communicate.

  - Cache: the routing cache must be isolated, otherwise the socket 
isolation will not work. The l3 namespace code does not impact the l2 
namespace code and route cache isolation is a common part if the l3 
namespace switching is done in the right place.


Dmitry has posted the l2 namespace relying on the net namespace empty 
framework, I will post the l3 namespace relying on the l2 namespace 
today or tomorrow.

   -- Daniel




More information about the Containers mailing list