`/sys/fs/cgroup/<subsystem>/tasks` file is empty when setns to another cgroup namespace
Yuanhong Peng
pengyuanhong at huawei.com
Wed Feb 8 06:21:52 UTC 2017
Hi,
I am working on adding support for cgroup namespace on docker recently,
since `setns()` for cgroup namespace no longer requires the process
to first move under the target cgroupns-root, I accidentally found that
if two processes share cgroup namespace (the first process’s cgroup-root
is `/p1` while the second’s is `../p2`) but join different pid
namespaces, then the second process’s `/sys/fs/cgroup/<subsystem>/tasks`
file would be empty.
Here is an example:
In session 1:
# mkdir -p /sys/fs/cgroup/freezer/p1
# echo $$
110413
# echo 110413 > /sys/fs/cgroup/freezer/p1/tasks
# cat /proc/self/cgroup | grep freezer
7:freezer:/p1
Next, we use `unshare` to create a process running a new shell
in new cgroup, pid and mount namespaces:
# unshare –C –m –p bash
# cat /proc/self/cgroup | grep freezer
7:freezer:/
# cat /proc/self/mountinfo | grep freezer
308 301 0:31 /.. /sys/fs/cgroup/freezer rw,relatime - cgroup
cgroup rw,freezer
Now, we remount the freezer cgroup filesystem inside this
cgroup namespace:
# mount --make-rslave /
# umount /sys/fs/cgroup/freezer
# mount -t cgroup -o freezer freezer /sys/fs/cgroup/freezer
# cat /proc/self/mountinfo | grep freezer
308 301 0:31 / /sys/fs/cgroup/freezer rw,relatime - cgroup
freezer rw,freezer
# cat /sys/fs/cgroup/freezer/tasks
1
371
In session 2:
# mkdir -p /sys/fs/cgroup/freezer/p2
# echo $$
110613
# echo 110613 > /sys/fs/cgroup/freezer/p2/tasks
# cat /proc/self/cgroup | grep freezer
7:freezer:/p2
Next, we run the program shown below, using it to execute a
shell in new pid and mount namespaces, but shares cgroup
namespace with the above new bash progress:
# ./test
# cat /proc/self/cgroup | grep freezer
7:freezer:/../p2
# cat /proc/self/mountinfo | grep freezer
360 353 0:31 /.. /sys/fs/cgroup/freezer rw,relatime - cgroup
cgroup rw,freezer
Also, we remount the freezer cgroup filesystem inside this
cgroup namespace:
# mount --make-rslave /
# umount /sys/fs/cgroup/freezer
# mount -t cgroup -o freezer freezer /sys/fs/cgroup/freezer
# cat /proc/self/mountinfo | grep freezer
360 353 0:31 / /sys/fs/cgroup/freezer rw,relatime - cgroup
freezer rw,freezer
# ls /sys/fs/cgroup/freezer/
cgroup.clone_children cgroup.procs freezer.parent_freezing
freezer.self_freezing freezer.state notify_on_release tasks
# cat /sys/fs/cgroup/freezer/tasks
# (nothing)
I have also tried to let the two processes share pid namespace, then the
second new bash process’s `/sys/fs/cgroup/freezer/tasks` file would be
the same as the first one. Moreover, if I move the second process under
the first process’s cgroupns-root(i.e `/p1/p2`), then its `tasks` file
will contain expected pids:
# mkdir -p /sys/fs/cgroup/freezer/p1/p2
# echo $$
110766
# echo 110766 > /sys/fs/cgroup/freezer/p1/p2/tasks
# cat /proc/self/cgroup
7:freezer:/p1/p2
# ./test
# cat /proc/self/cgroup | grep freezer
7:freezer:/p2
# cat /proc/self/mountinfo | grep freezer
360 353 0:31 /.. /sys/fs/cgroup/freezer rw,relatime - cgroup
cgroup rw,freezer
# mount --make-rslave /
# umount /sys/fs/cgroup/freezer
# mount -t cgroup -o freezer freezer /sys/fs/cgroup/freezer
# ls /sys/fs/cgroup/freezer/
cgroup.clone_children cgroup.procs freezer.parent_freezing
freezer.self_freezing freezer.state notify_on_release p2
tasks
# cat /sys/fs/cgroup/freezer/p2/tasks
1
274
In a word, the conclusion is that if a process uses `setns()` to join
another process’s cgroup namespace without being moved to the target
cgroupns-root, then after we remount cgroupfs inside the cgroup
namespace, the shared process’s `/sys/fs/cgroup/<subsystem>/tasks`
file would be the same as the target process’s if the two process are in
the same pid namespace, while this file would be empty if the two
processes are in different pid namespaces (probably it’s because that
the shared process cannot see the pid of the target process).
So, is it an intended behavior or a bug? Or, if there is anything wrong
with my operations above? Then, how can I mount cgroupfs correctly
inside a shared cgroup namespace?
Thanks in advance if anyone could help : )
Program source:
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/types.h>
#include <linux/sched.h>
#include <sys/mman.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <sys/wait.h>
#include <fcntl.h>
#define STACK_SIZE 1024*1024*8 //8M
int thread_func(void *lparam)
{
//120719 is the new bash process’s pid in session 1
int fd = open("/proc/120719/ns/cgroup", O_RDONLY);
if (fd == -1)
return 1;
if (setns(fd, CLONE_NEWCGROUP) == -1)
return 1;
execl("/bin/bash", "bash", NULL);
return 0;
}
int main(int argc, char **argv)
{
void *pstack = (void *)mmap(NULL,
STACK_SIZE,
PROT_READ | PROT_WRITE ,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_ANON ,
-1,
0);
if (MAP_FAILED != pstack)
{
int ret;
ret = clone(thread_func,
(void *)((unsigned char *)pstack + STACK_SIZE),
CLONE_NEWNS | CLONE_NEWPID,
(void *)NULL);
if (-1 != ret)
{
pid_t pid = 0;
sleep(5);
pid = waitpid(-1, NULL, __WCLONE | __WALL);
printf("child : %d exit %s\n", pid,strerror(errno));
}
else
{
printf("clone failed %s\n", strerror(errno) );
}
}
else
{
printf("mmap() failed %s\n", strerror(errno));
}
return 0;
}
---
Yuanhong Peng
More information about the Containers
mailing list