Pavel Emelyanov xemul at
Fri Aug 10 14:55:36 UTC 2012

Hi, Eric!

There's an issue with how setns interacts with the unshare syscall which I
consider to be worth looking at. When you open some task's namespace file,
e.g. /proc/<pid>/ns/net, the net namespace is cached on the proc inode.

If the task with the pid <pid> later unshares the namespace in question
(in this case -- net ns), subsequent opens of this task's proc ns file
will still return the old namespace, and the setns call will not work as
expected. Here's a simple proggie which demonstrates this:

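#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	int pid, fd;
	char path[64];

	pid = fork();
	if (!pid) {
		/* open and close our own ns file first, so the old net ns
		 * gets cached on the proc inode, then unshare */
		fd = open("/proc/self/ns/net", O_RDONLY);
		close(fd);
		unshare(CLONE_NEWNET);
		printf("New net:\n");
		system("ip l");
		sleep(5);	/* stay alive while the parent opens our ns file */
	} else {
		printf("Old net:\n");
		system("ip l");
		sleep(1);	/* crude sync: let the child unshare first */
		sprintf(path, "/proc/%d/ns/net", pid);
		fd = open(path, O_RDONLY);
		setns(fd, CLONE_NEWNET);	/* glibc wrapper for the setns syscall */
		printf("New net 2:\n");
		system("ip l");
	}

	return 0;
}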

The "else" branch expects the net namespace after the setns call to be the
new one (containing only a lo device), but it's not so -- after the setns
syscall the net namespace isn't changed! If you comment out the "if" branch's
open and close calls (thus avoiding the ns caching), setns works as expected.

I assume you're aware of this problem, so do you have plans to fix this?


