[Libcg-devel] Control groups and Resource Management notes (part II)
vgoyal at redhat.com
Tue Aug 5 06:30:07 PDT 2008
On Tue, Aug 05, 2008 at 04:45:30PM +0900, KOSAKI Motohiro wrote:
> Hi balbir-san,
> Thank you for the nice minutes.
> They are very helpful for people who were not invited (including me).
> > 10. Freezer subsystem - The freezer system was discussed briefly. Serge
> > mentioned the patches and wanted to collect feedback (if any) on them.
> Who uses it?
> AFAIK the freezer is used mainly by HPC people,
> but they think MPI processes must be frozen.
> Unfortunately, open source MPI implementations use various inter-process
> communication methods (e.g. SYSV IPC, sockets, ptrace),
> so a general freezer implementation is very difficult.
> > 11. OOM Handler - The OOM handler was discussed in detail. Balbir mentioned
> > certain shortcomings of the OOM handler
> > a. Logic - it is based on total_vm, is that the correct metric for
> > OOMing?
> > b. Concurrency - it kills several tasks at once
> > There was a discussion on moving the policy for OOM handling to user space. Paul
> > described how the OOM handler has been modified at google to notify user space
> > when a CPUSet runs out of memory. Balbir asked if OOMing on reaching limits is a
> > good idea, it was generally discussed that it might not be such a good idea.
> CPUSET-based limitation is slightly hard to use.
> memcgroup-based limitation is better.
> In addition, notification on reaching a limit can be very generic.
> Various limits (e.g. cpu time, memory usage), various notifications
> (e.g. kill process, send signal, inotify), and various targets
> (each process in the cgroup, or a manager process) can be considered.
> > Control group library
> > =====================
> > Dhaval and Balbir introduced libcgroups and the purpose of the library and the
> > goals. Balbir described on paper what the current design looks like, it consists of
> > 1. API
> > 2. Test framework
> > 3. A configuration subsystem
> > Dhaval discussed configuration syntax of XML versus home made. The issue of
> > classification of tasks was brought up. The reason that we want to classify
> > tasks is that we want them to move at fork/exec time to the correct cgroup so that
> I don't recommend XML, because XML is a tree-based syntax but we want more flexible
> classification. I also guess XML would reduce human readability.
> > 1. They don't consume resources in the parents group
> > 2. The movement is automatic
> > It was generally agreed upon that the classification should take place in user
> > space. Eric and others suggested having a wrapper to start the application in
> > the correct cgroup (wrapper around fork/exec). Dave suggested that one might
> > even go to the extent of hacking, such that a process is ptraced after fork/exec,
> > moved to the correct group and resumed. Using SELinux contexts was also recommended.
> > Vivek brought up using PAM plugins to do classifications, this suggestion was
> > nicely received. The decision was to do classification in user space and then
> > think of kernel space if it cannot be done in user space. Denis suggested that
> > classification is useful. In OpenVZ they classify all apache children to a
> > different group. Balbir asked Denis to post their classification infrastructure
> > as RFC.
> I'm not sure about this issue,
> but I like the PAM approach.
Thanks, Balbir, for the nice summary.
Well, it was Rik van Riel's idea to use PAM plugins so that processes
are put into the right user cgroups upon login.
Is PAM-based classification alone sufficient? I noticed a couple of
cases that bypass PAM. For example:
- If one starts apache with "service httpd start", the httpd threads change
their uid/gid to "apache/apache". But these threads will continue to
run in the cgroup belonging to root and will not move into the apache cgroup.
- apache also offers the "suexec" tool, which execs a CGI script under a
different user than the one who launched the web server. I quickly
grepped the source code of suexec and it does not seem to use
PAM. That means CGI scripts running under some other user name will
continue to run in the cgroup where apache is running.
I am not sure how many more such corner cases there are. These cases can
be covered by modifying the application, by using some kind of
wrapper around the application, or by writing a classification daemon.
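For the classification daemon option, one possible building block is picking
the target cgroup from a task's current uid, which would catch the httpd case
above where the uid changes after startup. A minimal sketch in Python, assuming
the daemon reads /proc/<pid>/status; the rule table, the uid 48 for apache, and
the function names are all hypothetical:

```python
def effective_uid(status_text):
    """Return the effective uid from the text of /proc/<pid>/status.

    The "Uid:" line lists real, effective, saved and filesystem uids,
    in that order, so the effective uid is the second number.
    """
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            fields = line.split()
            return int(fields[2])
    raise ValueError("no Uid: line found")


def target_cgroup(status_text, rules, default="/"):
    """Look up the cgroup for a task; 'rules' maps uid -> cgroup name.

    Tasks whose uid has no rule stay in the default (root) group.
    """
    return rules.get(effective_uid(status_text), default)


# Hypothetical example: httpd after it has switched to uid 48.
sample_status = "Name:\thttpd\nUid:\t48\t48\t48\t48\nGid:\t48\t48\t48\t48\n"
rules = {48: "apache"}
chosen = target_cgroup(sample_status, rules)
```

How the daemon learns that a uid changed (polling /proc or some kernel
notification) is left open here.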
Do we really need a classification daemon to cover such cases, or is the wrapper
approach sufficient? I remember somebody at the minisummit mentioning
that it should work without any apache modifications.
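For comparison, the wrapper approach can be very small. A minimal sketch in
Python, assuming a cgroup hierarchy mounted at /cgroup (the mount point is
site-specific) and that writing a PID into a group's "tasks" file moves that
task into the group; the function names are hypothetical:

```python
import os

CGROUP_ROOT = "/cgroup"  # assumption: where the cgroup hierarchy is mounted


def tasks_path(cgroup, root=CGROUP_ROOT):
    """Path of the 'tasks' file for the given cgroup name."""
    return os.path.join(root, cgroup.strip("/"), "tasks")


def move_self_to_cgroup(cgroup, root=CGROUP_ROOT):
    """Move the calling process into 'cgroup' by writing its PID."""
    with open(tasks_path(cgroup, root), "w") as f:
        f.write(str(os.getpid()))


def exec_in_cgroup(cgroup, argv, root=CGROUP_ROOT):
    """Join 'cgroup', then replace this process with the target program.

    Moving before exec means the program never runs in the parent's group.
    """
    move_self_to_cgroup(cgroup, root)
    os.execvp(argv[0], argv)
```

Something like exec_in_cgroup("apache", ["httpd"]) would put httpd and the
children it forks later into the apache group without modifying apache.
CGI scripts run through suexec would still land in that same group, so the
wrapper alone does not cover the second case above.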