[RFC] [PATCH] Cgroup based OOM killer controller

Evgeniy Polyakov zbr at ioremap.net
Tue Jan 27 13:51:18 PST 2009


On Tue, Jan 27, 2009 at 12:37:21PM -0800, David Rientjes (rientjes at google.com) wrote:
> > Well, the oom-killer can, since it drops the unkillable state from the process
> > mask; that may not be enough, though, but it tries harder than userspace.
> > 
> 
> The only thing it does is send a SIGKILL and gives the thread access to 
> memory reserves with TIF_MEMDIE, it doesn't drop any unkillable state.  If 

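There is a small difference between force_sig_info() and the usual
send_signal() used by kill: force_sig_info() resets a blocked or ignored
signal to its default action and clears SIGNAL_UNKILLABLE, while
send_signal() does not.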
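A simplified userspace model of that difference follows. The names (`struct task_model`, `model_force_sig_info()`) are hypothetical and this is not kernel code; it only mirrors the state changes force_sig_info() makes in kernel/signal.c as of roughly 2.6.28. The ordinary send_signal() path leaves all three modeled fields untouched. Since userspace cannot block or ignore SIGKILL, for the oom-killer's SIGKILL the part that matters is clearing SIGNAL_UNKILLABLE.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of per-task state for one signal.
 * It mirrors only what force_sig_info() changes; not kernel code. */
enum handler { HANDLER_DFL, HANDLER_IGN, HANDLER_CUSTOM };

struct task_model {
	enum handler handler;   /* disposition of the signal being sent */
	bool blocked;           /* signal is in the task's blocked mask */
	bool unkillable;        /* models SIGNAL_UNKILLABLE in signal->flags */
};

/* Forced delivery: a blocked or ignored signal is reset to the default
 * action and unblocked; if the resulting action is the default one,
 * the unkillable flag is cleared too. */
void model_force_sig_info(struct task_model *t)
{
	if (t->blocked || t->handler == HANDLER_IGN) {
		t->handler = HANDLER_DFL;
		t->blocked = false;
	}
	if (t->handler == HANDLER_DFL)
		t->unkillable = false;
}
```

A task with a custom, unblocked handler keeps both its handler and its unkillable flag, because neither condition above applies.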

> its victim is hung in D state and the memory reserves do not allow it to 
> return to being runnable, this task will not die and the oom killer would 
> livelock unless given another target.

Not all D states are alike. The current tree even has
lock_page_killable(), so it depends on where the task is sleeping.

> > My main point was to have a way to monitor memory usage, so that any
> > process could tune its own behaviour according to that information, which is
> > not related to the system oom-killer at all. Thus /dev/mem_notify is of
> > interest first (and only first) as a memory usage notification
> > interface, not as a way to invoke any kind of 'soft' oom-killer.
> 
> It's a way to prevent invoking the kernel oom killer by allowing userspace 
> notification of events where methods such as dropping caches, elevating 
> limits, adding nodes, sending signals, etc, can prevent such a problem.  
> When the system (or cgroup) is completely oom, it can also issue SIGKILLs 
> that will free some memory and preempt the oom killer from acting.
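A minimal sketch of that userspace step, assuming the handler has already chosen a victim: `kill_and_reap()` is a hypothetical helper that sends SIGKILL and, when the victim is the handler's own child, reaps it with waitpid(). For an unrelated process only the kill() step applies, and the memory is released when the victim actually exits.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper for a userspace OOM handler: SIGKILL the chosen
 * victim and reap it. waitpid() only works if the victim is a child of
 * the caller. Returns 0 if the victim died from SIGKILL, -1 otherwise. */
int kill_and_reap(pid_t victim)
{
	int status;

	if (kill(victim, SIGKILL) != 0)
		return -1;              /* no such process or no permission */
	if (waitpid(victim, &status, 0) != victim)
		return -1;              /* not our child, or wait failed */
	return (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) ? 0 : -1;
}
```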
> 
> I think there might be some confusion about my proposal for extending 
> /dev/mem_notify.  Not only should it notify of certain low memory events, 
> but it should also allow userspace notification of oom events, just like 
> the cgroup oom notifier patch allowed.  Instead of attaching a task to a 
> cgroup file in that case, however, this would simply be the responsibility 
> of a task that has set up a poll() on the cgroup's mem_notify file.  A 
> configurable delay could be imposed so page allocation attempts simply 
> loop while the userspace handler responds and then only invoke the oom 
> killer when absolutely necessary.
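The poll() side of such a handler could look like the sketch below. It assumes a notification file descriptor that becomes readable (POLLIN) when an event is pending; the path and read semantics of a per-cgroup mem_notify file are not defined in this thread, so `wait_for_mem_event()` and those semantics are assumptions.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical wait step of a userspace memory-event handler.
 * Returns 1 if an event is ready, 0 on timeout, -1 on error. */
int wait_for_mem_event(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	do {
		/* Retry if interrupted by a signal (restarts the full timeout). */
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -1;
	if (ret == 0)
		return 0;               /* no event within the timeout */
	return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A handler would loop on this and, on a return of 1, try the remedies listed above (dropping caches, raising limits, sending signals) before the configured delay runs out.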

I really have no objections to this, nor to extending the oom-killer to
wait a bit in the allocation path while userspace makes some progress.
But do not drop the existing oom-killer (i.e. its ability to kill
processes) in favour of this new feature. Let's have both, and if the
extension fails for some reason, the old oom-killer will do the job.
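The combination argued for here, a grace period for userspace followed by the existing oom-killer as the fallback, can be modeled as a small decision function. This is a hypothetical userspace model, not kernel code: `oom_decide()`, its parameters and `delay_ms` (standing in for the configurable delay) are illustrative names only.

```c
#include <stdbool.h>

/* Hypothetical decision model for an allocation that found the system
 * (or cgroup) out of memory. */
enum oom_action {
	OOM_RETRY_ALLOC,     /* memory was freed: retry the allocation */
	OOM_WAIT_USERSPACE,  /* inside the grace period: keep looping */
	OOM_KERNEL_KILL,     /* grace period over: invoke the oom killer */
};

enum oom_action oom_decide(bool handler_registered, bool memory_freed,
			   unsigned int waited_ms, unsigned int delay_ms)
{
	if (memory_freed)
		return OOM_RETRY_ALLOC;
	if (handler_registered && waited_ms < delay_ms)
		return OOM_WAIT_USERSPACE;
	/* No handler, or it made no progress in time: the existing
	 * oom killer remains the last resort. */
	return OOM_KERNEL_KILL;
}
```

Keeping the kernel kill as the unconditional final branch is what guarantees progress even if the userspace handler hangs or is never started.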

-- 
	Evgeniy Polyakov

