Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]

Frederic Weisbecker fweisbec at gmail.com
Wed Apr 30 13:28:49 UTC 2014


On Wed, Apr 30, 2014 at 09:12:20AM -0400, Daniel J Walsh wrote:
> 
> On 04/29/2014 05:44 PM, Frederic Weisbecker wrote:
> > On Tue, Apr 29, 2014 at 09:59:30AM -0700, Tim Hockin wrote:
> >> Here's the reason it doesn't work for us: It doesn't work.  It was
> >> something like 2 YEARS since we first wanted this, and it STILL does
> >> not work.
> > When I was working on the task counter cgroup subsystem 2 years
> > ago, the patches were actually pushed back by Google people, in favour
> > of the task stack kmem cgroup subsystem.
> >
> > The reason was that expressing the forkbomb issue in terms of
> > number of tasks as a resource is awkward, and that the real resource
> > in play is kernel memory, which gets exhausted by task stacks being
> > allocated over and over, swap ping-pong and so on...
> >
> > And that was a pretty good argument. I still agree with it, especially
> > since that could solve other people's issues at the same time. The kmem
> > cgroup has quite a large domain of application.
> >
> >> You're postponing a pretty simple request indefinitely in
> >> favor of a much more complex feature, which still doesn't really give
> >> me what I want.  What I want is an API that works like rlimit but
> >> per-cgroup, rather than per-UID.
> > The request is simple, but I don't think that adding the task counter
> > cgroup subsystem is simpler than extending the kmem code to apply limits
> > to task stacks only, especially in terms of maintenance.
> >
> > Also you guys have very good mm kernel developers who are already
> > familiar with this.
> I would look at this from a usability point of view.  It is a lot easier
> to understand a number of processes than the amount of kmem those processes
> will need.  Setting something like
> ProcessLimit=1000 in a systemd unit file is easy to explain.
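For comparison, the rlimit-style, per-cgroup process limit discussed in this thread can be modelled in a few lines. This is a minimal userspace sketch under stated assumptions, not the task counter patches themselves: `struct task_counter`, `task_counter_charge()` and `task_counter_uncharge()` are hypothetical names. Each fork charges the cgroup and all of its ancestors and fails with -EAGAIN (the error fork() reports when RLIMIT_NPROC is hit) if any level would exceed its limit; each exit uncharges the same path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-cgroup task counter (userspace model only). */
struct task_counter {
    struct task_counter *parent; /* NULL for the root cgroup */
    long count;                  /* tasks currently charged at this level */
    long limit;                  /* max tasks allowed; negative = unlimited */
};

/* Charge one new task (fork) against cg and every ancestor.
 * Returns 0 on success, or -EAGAIN if any level would exceed its
 * limit; in that case the levels already charged are rolled back. */
int task_counter_charge(struct task_counter *cg)
{
    struct task_counter *c;

    for (c = cg; c != NULL; c = c->parent) {
        if (c->limit >= 0 && c->count + 1 > c->limit) {
            struct task_counter *u;

            /* Undo the charges made below the failing level. */
            for (u = cg; u != c; u = u->parent)
                u->count--;
            return -EAGAIN;
        }
        c->count++;
    }
    return 0;
}

/* Uncharge one exiting task from cg and every ancestor. */
void task_counter_uncharge(struct task_counter *cg)
{
    struct task_counter *c;

    for (c = cg; c != NULL; c = c->parent) {
        assert(c->count > 0); /* every uncharge must match a charge */
        c->count--;
    }
}
```

In this model, a unit setting like ProcessLimit=1000 would simply set `limit` on the unit's counter; the hierarchical walk means a parent's limit also caps every child beneath it.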

Yeah that's a fair point.

> Now if systemd has the ability to translate this into something that makes
> sense in terms of kmem cgroup, then my argument goes away.

Yeah, if we keep the kmem direction, this can be the place where we do the mapping.
Now I just hope the amount of stack memory allocated per task doesn't differ too much between archs.
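If a tool like systemd did the mapping, the arithmetic itself is a multiplication: number of tasks times per-task kernel memory. The sketch below assumes that per-task cost is the kernel stack plus an allowance for other per-task allocations; both are inputs rather than constants, since the stack size depends on the architecture and kernel configuration, which is exactly the concern above. `kmem_budget_for_tasks()` is a hypothetical helper; its result would be the value written to a kmem limit such as memcg's `memory.kmem.limit_in_bytes`.

```c
#include <stdint.h>

/* Hypothetical helper: translate a process limit (e.g. ProcessLimit=1000)
 * into a kernel-memory byte budget.
 *   stack_bytes: per-task kernel stack size (arch/config dependent)
 *   extra_bytes: assumed allowance for other per-task kernel objects
 * Returns 0 if the computation would overflow uint64_t. */
uint64_t kmem_budget_for_tasks(uint64_t max_tasks, uint64_t stack_bytes,
                               uint64_t extra_bytes)
{
    uint64_t per_task;

    if (extra_bytes > UINT64_MAX - stack_bytes)
        return 0; /* per-task cost overflows */
    per_task = stack_bytes + extra_bytes;
    if (per_task != 0 && max_tasks > UINT64_MAX / per_task)
        return 0; /* total budget overflows */
    return max_tasks * per_task;
}
```

For example, 1000 tasks with an assumed 8 KiB stack and no extra allowance gives 8,192,000 bytes, while an assumed 16 KiB stack doubles that to 16,384,000 bytes for the same ProcessLimit=1000, so a fixed mapping would need per-arch inputs.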

