RT Scheduler and Network Namespaces - possible issue

Peter Zijlstra peterz at infradead.org
Tue Oct 28 16:42:38 UTC 2014


On Tue, Oct 28, 2014 at 04:13:42PM +0100, Florin Medrea (Gmail) wrote:
> Hello all,
> 
> I have a question about an issue I am encountering on my embedded Linux
> kernel.
> 
> *Use case*: Use Network Namespaces on RT Processes
> *Environment*: 2.6.35 Linux Kernel + some patches for netns (
> https://github.com/unicell/redpatch/commits/rhel-2.6.32-358.6.2.ns.el6)
> *Configuration*: the *CONFIG_RT_GROUP_SCHED* option is activated

At this point I was about ready to ignore the rest, 2.6.35 is just too
ancient to spend more time on.

> In my user space application (RT priority) I attempt to *unshare* into a new
> Network Namespace. This fails with *EINVAL*. By debugging with printks in
> the kernel scheduler, I found that the *unshare* request is refused at this
> point: http://lxr.free-electrons.com/source/kernel/sched.c?v=2.6.35#L8392
> (because *rt_bandwidth.rt_runtime* is 0).

Have you co-mounted cgroup controllers? Why would network namespaces have
anything to do with cpu cgroups?
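
As a rough illustration of the refusal described in the quoted report, the Python sketch below models the check under one stated assumption taken from the report: attaching a real-time task to a task group whose rt_runtime is 0 is rejected with EINVAL. The names `TaskGroup`, `Task` and `can_attach` are hypothetical stand-ins, not the kernel's actual structures or functions.

```python
import errno
from dataclasses import dataclass


@dataclass
class TaskGroup:
    # Hypothetical model of a cpu cgroup's RT bandwidth, in microseconds.
    rt_runtime_us: int
    rt_period_us: int = 1_000_000


@dataclass
class Task:
    # True for SCHED_FIFO / SCHED_RR tasks in this model.
    is_rt: bool


def can_attach(group: TaskGroup, task: Task) -> int:
    """Return 0 if the task may join the group, else -EINVAL.

    Models the behaviour from the report: an RT task is not allowed into
    a group with zero RT runtime, because that group could never run it.
    """
    if task.is_rt and group.rt_runtime_us == 0:
        return -errno.EINVAL
    return 0
```

For example, `can_attach(TaskGroup(rt_runtime_us=0), Task(is_rt=True))` yields `-errno.EINVAL`, while a non-RT task, or an RT task moving into a group that has been given some runtime, is accepted.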

> Digging further into the call trace, I see that the bandwidth is initialised
> here: http://lxr.free-electrons.com/source/kernel/sched.c?v=2.6.35#L7932
> and remains set to 0 during the *can_attach* check. Can someone explain why
> the bandwidth is initialised with a runtime of 0 here, whilst it is
> initialised to *global_rt_runtime()* elsewhere in *sched.c* (
> http://lxr.free-electrons.com/source/kernel/sched.c?v=2.6.35#L7533)?

Sure. The rule is that the sum of all child cgroups' utilization must
not exceed that of the parent cgroup, and the root cgroup must not
have more than is maximally available. The second line you cite is
the root cgroup, which is special as per the above.

[where the utilization is runtime/period]
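
The rule above can be sketched as a simple admission check. This is a hypothetical Python model, not the kernel's code: each group is a `(runtime_us, period_us)` pair, and exact fractions are used where the kernel works with fixed-point ratios. The 950000/1000000 root values in the usage note are an assumed example.

```python
from fractions import Fraction


def utilization(runtime_us: int, period_us: int) -> Fraction:
    # Utilization as defined in the mail: runtime / period.
    return Fraction(runtime_us, period_us)


def children_fit(parent: tuple[int, int], children: list[tuple[int, int]]) -> bool:
    """Return True if the children's total utilization does not exceed the parent's.

    Each group is a hypothetical (runtime_us, period_us) pair.
    """
    total = sum((utilization(r, p) for r, p in children), Fraction(0))
    return total <= utilization(*parent)
```

With a root of `(950_000, 1_000_000)`, children at 0.4 and 0.5 fit (0.9 total), whereas two children at 0.5 each (1.0 total) do not. Children may use different periods, since only the ratios are summed.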

This means that there is no right value to initialize child cgroups
to: any non-zero value might be more than the parent cgroup has, and
any non-zero value will eventually cause the creation of further
cgroups to fail, because the sum of the children will exceed their
parent's.
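
To make this argument concrete, here is a hypothetical Python model, not kernel code: every new child cgroup receives the same default utilization, and a creation fails once the children's total would exceed the parent's. The function name and the example utilizations are illustrative assumptions.

```python
from fractions import Fraction


def create_children(parent_util: Fraction, default_util: Fraction, attempts: int) -> int:
    """Try to create `attempts` children, each given `default_util`.

    Returns how many creations succeed before one would push the
    children's total utilization above the parent's (hypothetical model).
    """
    total = Fraction(0)
    created = 0
    for _ in range(attempts):
        if total + default_util > parent_util:
            break  # this creation would overcommit the parent, so it fails
        total += default_util
        created += 1
    return created
```

With a parent at 0.95 and a default of 0.1, only 9 children can be created before the tenth is refused; with a default of 0, every creation succeeds. A parent that itself has 0 utilization refuses even the first child under any non-zero default.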

Furthermore, any arbitrary non-zero value will be wrong for your
workload; only the administrator, who knows the workload, can set a
meaningful period and runtime. The kernel cannot possibly know this,
and therefore does not try.




More information about the Containers mailing list