RT Scheduler and Network Namespaces - possible issue

Florin Medrea (Gmail) florinmedrea at gmail.com
Tue Oct 28 17:18:27 UTC 2014

2014-10-28 17:42 GMT+01:00 Peter Zijlstra <peterz at infradead.org>:

> On Tue, Oct 28, 2014 at 04:13:42PM +0100, Florin Medrea (Gmail) wrote:
> > Hello all,
> >
> > I have a question about an issue I encounter on my embedded Linux
> > kernel.
> >
> > *Use case*: Use Network Namespaces on RT Processes
> > *Environment*: 2.6.35 Linux Kernel + some patches for netns (
> > https://github.com/unicell/redpatch/commits/rhel-2.6.32-358.6.2.ns.el6)
> > *Configuration*: the *CONFIG_RT_GROUP_SCHED* option is activated
>
> At this point I was about ready to ignore the rest, 2.6.35 is just too
> ancient to spend more time on.

I agree with you. However, I have some custom patches applied to this
kernel version, which I cannot yet upgrade due to HW

> > In my user space application (RT priority) I attempt to *unshare* to a new
> > Network Namespace. This fails with *EINVAL*. By debugging with printks in
> > the kernel scheduler, I found that the *unshare* request is refused at this
> > point: http://lxr.free-electrons.com/source/kernel/sched.c?v=2.6.35#L8392
> > (because *rt_bandwidth.rt_runtime* is 0).
>
> Have you co-mounted stuff? Why would network namespaces have anything to
> do with cpu cgroups?

I use the unshare system call, which is linked to cgroups:

> > Digging more in the trace calls, I see that the bandwidth is initialised
> > here: http://lxr.free-electrons.com/source/kernel/sched.c?v=2.6.35#L7932
> > and remains set to 0 during the *can_attach* check. Can someone explain why
> > the bandwidth is initialised to 0 runtime, whilst initialised to
> > *global_rt_runtime()* at other places in *sched.c* (
> > http://lxr.free-electrons.com/source/kernel/sched.c?v=2.6.35#L7533)?
>
> Sure, the rule is that the sum of all child cgroups' utilization must
> not be more than that of the parent cgroup, and the root cgroup must
> not have more than is maximally available -- the second line you cite is
> the root cgroup, it's special as per the above.
> [where the utilization is runtime/period]
>
> This means that there is no possible right value to initialize child
> cgroups to, any value !0 might be more than the parent cgroup has,
> also, any !0 value will lead to having to fail creating more cgroups
> because the sum of children will exceed their parent's.
>
> Furthermore any random !0 value will be wrong for your workload, only
> the administrator that knows the workload can set a meaningful period and
> runtime. The kernel cannot possibly know this, and therefore doesn't
> attempt.
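
To check my understanding of the rule quoted above, here is a minimal
Python sketch of it. The function names are mine, not the kernel's, and
the 1000000 us period is an assumption; the sketch only models the
arithmetic (utilization = runtime/period, children must not sum to more
than the parent).

```python
from fractions import Fraction


def utilization(runtime_us, period_us):
    """Utilization as defined in the quoted text: runtime / period."""
    return Fraction(runtime_us, period_us)


def children_fit(parent, children):
    """Return True if the children's summed utilization fits in the parent's.

    `parent` and every entry of `children` are (runtime_us, period_us) tuples.
    These names are hypothetical; this is a model, not kernel code.
    """
    total = sum((utilization(r, p) for r, p in children), Fraction(0))
    return total <= utilization(*parent)


# Assumed root group: 950000 us of runtime per 1000000 us period.
root = (950000, 1000000)

# Children initialised to 0 runtime always fit, however many are created.
print(children_fit(root, [(0, 1000000)] * 3))

# Two children with a non-zero default of 500000 us each would already
# exceed the root, so creating the second one would have to fail.
print(children_fit(root, [(500000, 1000000), (500000, 1000000)]))
```

With these numbers the first call prints True and the second prints
False, which is the reason given above for defaulting children to 0.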

I see the point and agree with it. In my case, the root cgroup's runtime
is set to 950000 (according to /cgroup/cpu.rt_runtime_us) and the
child's is set to -1 (infinite, i.e. limited only by its parent?).
Initialising this value to 0 prevents the task from unsharing into a new
netns (the check happens immediately after initialisation). Shouldn't the
kernel be more permissive when checking the runtime here (
http://lxr.free-electrons.com/source/kernel/sched.c?v=2.6.35#L8395) then?
If not, is there a way to specify the workload (period and runtime) for
the child group?
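
For reference, this is how I read the check at the line linked above,
written as a small Python model rather than the actual kernel C. The
function names are hypothetical and the logic is my interpretation of
what I see with printks: an RT task is refused only when the target
group's RT runtime is exactly 0.

```python
import errno

# How an unlimited RT runtime shows up in cpu.rt_runtime_us.
RUNTIME_INF = -1


def rt_can_attach(task_is_rt, group_rt_runtime_us):
    """Model of the check: refuse an RT task if the group has no RT runtime."""
    return not (task_is_rt and group_rt_runtime_us == 0)


def attach_result(task_is_rt, group_rt_runtime_us):
    """Return 0 on success or errno.EINVAL, mirroring what unshare reports."""
    if rt_can_attach(task_is_rt, group_rt_runtime_us):
        return 0
    return errno.EINVAL


# RT task, freshly initialised group (runtime 0): refused with EINVAL.
print(attach_result(True, 0) == errno.EINVAL)

# Same task, group with some runtime or with an unlimited runtime: accepted.
print(attach_result(True, 950000), attach_result(True, RUNTIME_INF))

# Non-RT task: accepted even when the group's runtime is 0.
print(attach_result(False, 0))
```

In this model only the combination "RT task and runtime 0" fails, which
matches the EINVAL I get from *unshare*.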

Thanks for your reply!
