[PATCH cgroup/for-3.11 1/3] cgroup: mark "tasks" cgroup file as insane

Vivek Goyal vgoyal at redhat.com
Tue Jun 4 15:12:36 UTC 2013


On Tue, Jun 04, 2013 at 03:50:08PM +0100, Daniel P. Berrange wrote:
> On Tue, Jun 04, 2013 at 10:34:44AM -0400, Vivek Goyal wrote:
> > On Tue, Jun 04, 2013 at 12:15:56PM +0100, Daniel P. Berrange wrote:
> > > On Mon, Jun 03, 2013 at 07:13:02PM -0700, Tejun Heo wrote:
> > > > Some resources controlled by cgroup aren't per-task and cgroup core
> > > > allowing threads of a single thread_group to be in different cgroups
> > > > forced memcg to explicitly find the group leader and use it.  This is
> > > > gonna be nasty when transitioning to unified hierarchy and in general
> > > > we don't want and won't support granularity finer than processes.
> > > 
> > > With libvirt and KVM we require the ability to put different threads
> > > in different cgroups for the "cpu", "cpuset" & "cpuacct" controllers.
> > > This is to allow us to control scheduler tunables / placement for
> > > QEMU vCPU threads, independently of limits for QEMU I/O threads. So
> > > requiring all threads of a process to be in the same cgroup isn't
> > > sufficiently flexible for our needs.
> > 
> > For placement of vCPU threads, can we set per-thread CPU affinity
> > (sched_setaffinity()) instead of using cgroups for that purpose?
> 
> sched_setaffinity can't override affinity already set in the
> cgroup. So this won't allow for disjoint affinity sets between
> threads. i.e. if you use cgroups to bind the process to pCPU 1
> (to apply to all possible non-vCPU threads) and then want to bind
> vCPU threads to pCPU 2, you can't do it.
> 

I thought we don't have to override the affinity set in the cgroup.
Instead, we subdivide it among the cgroup's tasks as needed.

So in the above example, we would allow the cgroup to have both pCPU 1
and pCPU 2, and then set affinity for the vCPU threads as well as the
non-vCPU threads.
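
A minimal sketch of that subdivision in Python, assuming a Linux host and
the example above (the cgroup allows pCPU 1 and 2, with one vCPU thread).
split_cpus() and pin_current_thread() are hypothetical helper names, not
libvirt or kernel APIs. os.sched_setaffinity() is the standard wrapper for
the syscall, and the set passed to it still has to stay inside what the
cgroup's cpuset allows.

```python
import os
import threading


def split_cpus(allowed, n_vcpus):
    """Split the cgroup's allowed CPUs between vCPU and non-vCPU threads.

    Hypothetical policy: each vCPU thread gets one dedicated CPU taken from
    the top of the allowed set; everything left goes to the other threads.
    """
    cpus = sorted(allowed)
    if n_vcpus < 0 or n_vcpus >= len(cpus):
        raise ValueError("need at least one CPU left for non-vCPU threads")
    k = len(cpus) - n_vcpus
    vcpu_sets = [{cpu} for cpu in cpus[k:]]
    other_set = set(cpus[:k])
    return vcpu_sets, other_set


def pin_current_thread(cpus):
    # On Linux the affinity mask is per-thread, so passing the native
    # thread ID pins only the calling thread, not the whole process.
    os.sched_setaffinity(threading.get_native_id(), cpus)


# The example from this thread: cgroup allows pCPU 1 and 2, one vCPU.
vcpu_sets, other_set = split_cpus({1, 2}, 1)
```

Each vCPU thread would then call pin_current_thread(vcpu_sets[i]) and every
other thread pin_current_thread(other_set). The call fails if the requested
CPUs are not available to the task.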

> e.g. for cpu/cpuacct/cpuset controllers we have a setup
> 
>  <domain cgroup> 0 threads
>    |
>    +- vcpu0     1 thread
>    +- vcpu1     1 thread
>    +- emulator  n threads
> 
> and want complete independence in settings for each of these child
> cgroups.

I guess this will not work with a single hierarchy, as controllers like
blkio don't support putting the threads of a process in separate groups.
All threads of a process share an iocontext, and an iocontext is
associated with one cgroup.

> 
> > Apart from CPU affinity, which scheduling parameters do we want to
> > differ between threads?
> 
> Placement isn't the big deal - it is really the cpu.cfs_period_us,
> cpu.cfs_quota_us and cpu.shares settings that are the important ones,
> along with cpuacct.{stat,usage,usage_percpu} to track utilization
> across multiple threads.

Yes, capping CPU usage at the thread level will become unavailable
if we make this change. I guess customers don't care, but libvirt might
internally want to cap the CPU usage of a group of threads. I don't
know why, though. And we don't have this feature available per thread
outside of cgroups.

I am hoping there is a way to set priority per thread, and that it
would be able to emulate cpu.shares at the thread level.
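
A rough sketch of that idea in Python, assuming Linux, where the nice value
is a per-thread attribute. nice_for_shares() and renice_current_thread()
are hypothetical helpers. The mapping assumes that nice 0 carries the same
weight as the default cpu.shares (1024) and that each nice step changes the
weight by a factor of about 1.25, which is only an approximation of the
scheduler's weight table. Nice values weigh a thread against its siblings
in the same group, so this does not reproduce group-level shares, and it
gives no equivalent of cpu.cfs_quota_us.

```python
import math
import os
import threading

DEFAULT_SHARES = 1024  # default cpu.shares; assumed equal to nice-0 weight
STEP = 1.25            # assumed weight ratio between adjacent nice levels


def nice_for_shares(shares):
    """Approximate the nice value whose weight is closest to `shares`."""
    if shares <= 0:
        raise ValueError("shares must be positive")
    nice = round(math.log(DEFAULT_SHARES / shares, STEP))
    # Clamp to the valid nice range.
    return max(-20, min(19, nice))


def renice_current_thread(nice):
    # With PRIO_PROCESS and a native thread ID, Linux changes the nice
    # value of that one thread. Lowering the nice value (raising the
    # priority) normally requires CAP_SYS_NICE.
    os.setpriority(os.PRIO_PROCESS, threading.get_native_id(), nice)
```

For example, a thread meant to get roughly half the default weight would
call renice_current_thread(nice_for_shares(512)).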

> 
> For cpuacct, if we only had 1 cgroup for all threads, we'd have to
> read the process's overall usage and then subtract usage of individual
> threads. This would really be a step backwards, throwing away the
> benefits that cgroups brought in allowing setup of arbitrary groupings of
> tasks :-(
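
The subtraction described above could look roughly like this in Python.
This is a sketch under assumptions: it takes the numbers from
/proc/<pid>/stat and /proc/<pid>/task/<tid>/stat rather than from cpuacct
(the thread does not say which source would be used), and cpu_ticks() and
emulator_ticks() are hypothetical helper names. Values are in clock ticks.

```python
def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc stat line."""
    # The comm field (field 2) may contain spaces or ')', so split on the
    # last ')' and count fields from there.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def emulator_ticks(process_line, vcpu_lines):
    """Process-wide usage minus the usage of the listed vCPU threads."""
    return cpu_ticks(process_line) - sum(cpu_ticks(l) for l in vcpu_lines)
```

The caller would read the process-wide line once and one line per vCPU
thread, then attribute the remainder to the emulator threads.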

So these per-thread utilization stats are exported to the user. I'm
curious: in general, how is this per-thread (or per-group-of-threads)
data useful?

Thanks
Vivek

