[RFC] [PATCH] Cgroup based OOM killer controller

David Rientjes rientjes at google.com
Tue Jan 27 12:37:21 PST 2009


On Tue, 27 Jan 2009, Evgeniy Polyakov wrote:

> > There is no additional oom killer limitation imposed here, nor can the oom 
> > killer kill a task hung in D state any better than userspace.
> 
> Well, the oom-killer can, since it drops the unkillable state from the
> process mask; that may not be enough, but it tries harder than userspace.
> 

The only thing it does is send a SIGKILL and give the thread access to 
memory reserves with TIF_MEMDIE; it doesn't drop any unkillable state.  If 
its victim is hung in D state and the memory reserves do not allow it to 
become runnable again, the task will not die and the oom killer will 
livelock unless it is given another target.
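To make the livelock argument concrete, here is a toy model in C. It is a hypothetical simulation, not kernel code: `struct toy_task`, `toy_select_victim()` and `toy_oom_run()` are invented names, `memdie` only loosely stands in for TIF_MEMDIE, and `allow_retarget` models "given another target". A victim stuck in D state never acts on its SIGKILL, so unless the selector may skip it, every pass picks the same task and no memory is freed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of oom victim selection; not kernel code. */
struct toy_task {
    const char *name;
    long rss_pages;   /* pages freed if the task exits */
    bool stuck_in_d;  /* uninterruptible sleep: SIGKILL is never acted on */
    bool memdie;      /* loosely models TIF_MEMDIE: already chosen as victim */
    bool exited;
};

/* Pick the largest live task. If an earlier victim is still alive and
 * retargeting is not allowed, keep waiting on that same victim. */
static int toy_select_victim(struct toy_task *t, size_t n, bool allow_retarget)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (t[i].exited)
            continue;
        if (t[i].memdie) {
            if (!allow_retarget)
                return (int)i;  /* wait on the existing victim */
            continue;           /* skip it and look for another target */
        }
        if (best < 0 || t[i].rss_pages > t[best].rss_pages)
            best = (int)i;
    }
    return best;
}

/* Run up to max_rounds oom passes; return the pages freed (0 = no progress). */
static long toy_oom_run(struct toy_task *t, size_t n, bool allow_retarget,
                        int max_rounds)
{
    for (int round = 0; round < max_rounds; round++) {
        int v = toy_select_victim(t, n, allow_retarget);
        if (v < 0)
            return 0;
        t[v].memdie = true;          /* SIGKILL plus access to reserves */
        if (!t[v].stuck_in_d) {      /* only a runnable task can exit */
            t[v].exited = true;
            return t[v].rss_pages;
        }
    }
    return 0;                        /* livelock: no memory was freed */
}
```

With a stuck 1000-page task and a runnable 400-page task, `toy_oom_run(tasks, 2, false, 10)` frees nothing, while passing `true` frees the 400 pages of the second task.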

> My main point was to have a way to monitor memory usage so that any
> process could tune its own behaviour according to that information, which
> is not related to the system oom-killer at all. Thus /dev/mem_notify is of
> interest first (and only first) as a memory usage notification interface,
> not as a way to invoke any kind of 'soft' oom-killer.

It's a way to avoid invoking the kernel oom killer by notifying userspace 
of events where methods such as dropping caches, elevating limits, adding 
nodes, sending signals, etc. can prevent such a problem.  When the system 
(or cgroup) is completely oom, userspace can also issue SIGKILLs that will 
free some memory and preempt the oom killer.
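A minimal sketch of two of those userspace responses, assuming a handler that runs as root. `write_knob()` and `respond_low_memory()` are hypothetical helpers; `/proc/sys/vm/drop_caches` is a real sysctl, but the cgroup limit file path and the new limit value are assumptions that depend on where the memory controller is mounted and how the group is named.

```c
#include <stdio.h>

/* Write one value to a sysctl or cgroup control file.
 * Returns 0 on success, -1 on failure. */
static int write_knob(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(value, f) >= 0;
    ok = (fclose(f) == 0) && ok;  /* the write may only fail on flush */
    return ok ? 0 : -1;
}

/* Hypothetical response to a low-memory notification.
 * Returns the number of actions that failed. */
static int respond_low_memory(const char *limit_file, const char *new_limit)
{
    int failures = 0;

    /* "3" drops the page cache plus reclaimable dentries and inodes. */
    if (write_knob("/proc/sys/vm/drop_caches", "3\n") != 0)
        failures++;

    /* Elevate the cgroup's limit; limit_file is site-specific, e.g. an
     * assumed "/dev/cgroup/mygroup/memory.limit_in_bytes". */
    if (write_knob(limit_file, new_limit) != 0)
        failures++;

    return failures;
}
```

Each action is tried independently, so a failure to drop caches (for example when not running as root) does not stop the handler from raising the limit.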

I think there might be some confusion about my proposal for extending 
/dev/mem_notify.  Not only should it notify of certain low memory events, 
but it should also notify userspace of oom events, just as the cgroup oom 
notifier patch did.  Instead of attaching a task to a cgroup file in that 
case, however, handling would simply be the responsibility of a task that 
has set up a poll() on the cgroup's mem_notify file.  A configurable delay 
could be imposed so that page allocation attempts simply loop while the 
userspace handler responds, invoking the oom killer only when absolutely 
necessary.
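A rough sketch of the userspace side of that proposal: a handler blocks in poll() on a notification descriptor and responds to each event. The per-cgroup mem_notify file is only a proposal here, so this assumes it behaves like any pollable descriptor that becomes readable (POLLIN) on an event and yields a token on read(); `wait_mem_event()` and `handler_loop()` are hypothetical names, and the response callback is left to the caller.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms (-1 = forever) for an event on fd.
 * Returns 1 if an event is pending, 0 on timeout, -1 on error or hangup. */
static int wait_mem_event(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int r;

    do {
        r = poll(&pfd, 1, timeout_ms);
    } while (r < 0 && errno == EINTR);  /* restart if a signal interrupted us */

    if (r < 0)
        return -1;
    if (r == 0)
        return 0;                       /* nothing within the timeout */
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Respond to events until the descriptor fails. The kernel side would be
 * looping in the allocator during its (assumed) configurable delay. */
static void handler_loop(int fd, void (*respond)(void))
{
    char buf[64];

    for (;;) {
        int r = wait_mem_event(fd, -1);
        if (r < 0)
            break;
        if (read(fd, buf, sizeof buf) < 0)  /* consume the notification */
            break;
        respond();  /* e.g. drop caches, raise a limit, or SIGKILL a task */
    }
}
```

Keeping the wait and the response separate lets the same loop serve both low-memory and oom events; only `respond` changes.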

> Application can do whatever it wants of course including killing itself
> or the neighbours, but this should not be forced as a usage policy.
> 

If preference killing is your goal, then userspace can do it with the 
/dev/mem_notify functionality.
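As an illustration of preference killing from userspace, the sketch below picks the most-preferred task from a handler-maintained list and sends it SIGKILL before the kernel oom killer would act. `struct candidate`, `pick_victim()` and `kill_preferred()` are hypothetical; how preferences are assigned is entirely up to the userspace policy and is not part of any kernel interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical policy record: higher preference = kill first. */
struct candidate {
    pid_t pid;
    int preference;
};

/* Index of the most-preferred candidate, or -1 if the list is empty. */
static int pick_victim(const struct candidate *c, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++)
        if (best < 0 || c[i].preference > c[best].preference)
            best = (int)i;
    return best;
}

/* SIGKILL the preferred task to free memory ahead of the oom killer.
 * Returns the pid killed, or -1 if there was no candidate or kill() failed. */
static pid_t kill_preferred(const struct candidate *c, size_t n)
{
    int v = pick_victim(c, n);
    if (v < 0)
        return -1;
    if (kill(c[v].pid, SIGKILL) != 0)
        return -1;
    return c[v].pid;
}
```

On a tie the earliest entry wins, which keeps the choice deterministic for a given list order.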


More information about the Containers mailing list