[PATCH cgroup/for-3.11 1/3] cgroup: mark "tasks" cgroup file as insane

Daniel P. Berrange berrange at redhat.com
Thu Jun 6 09:20:55 UTC 2013


On Tue, Jun 04, 2013 at 01:19:47PM -0700, Tejun Heo wrote:
> Hey, Daniel.
> 
> On Tue, Jun 04, 2013 at 12:15:56PM +0100, Daniel P. Berrange wrote:
> > With libvirt and KVM we require the ability to put different threads
> 
> I really don't think cgroup has ever been intended (if there were ever
> any such overall intending) or is suited for something as fine grained
> as in-process resource management.  There already are existing
> per-thread interfaces for that.  Please use them instead.  cgroup
> simply doesn't fit.

Unless I'm mistaken, there is no alternative that can work. With QEMU
we need to apply scheduling controls to:

  1. Individual vCPU threads
  2. All non-vCPU threads (i.e. QEMU's I/O threads)

We can use per-thread APIs for 1, but for 2 we require something that
applies to the group of threads as a whole, without also impacting the
controls set for the vCPU threads. AFAIK, nothing except cgroups as
we use them today can satisfy that requirement. Am I wrong? Is there
something else that can achieve this same setup?
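
To make the gap concrete, here is a minimal sketch of what per-thread
control looks like from userspace. It is illustrative only: the helper
name pin_thread is made up, Python's os.sched_setaffinity is used merely
as a stand-in for the per-thread interfaces, and CPU affinity is just
one example of a scheduling control.

```python
import os
import threading


def pin_thread(tid, cpus):
    """Restrict one thread to the given CPUs and return the resulting mask.

    On Linux the underlying sched_setaffinity(2) call acts on a single
    thread, so passing a thread ID here affects only that thread.
    """
    os.sched_setaffinity(tid, cpus)
    return os.sched_getaffinity(tid)


if __name__ == "__main__":
    # Demo on the calling thread: re-apply its current mask, a no-op.
    tid = threading.get_native_id()
    print(pin_thread(tid, os.sched_getaffinity(tid)))
```

That covers case 1, one call per vCPU thread. For case 2 the call has
to be repeated for each non-vCPU thread, and as far as I can see it
still only sets per-thread values rather than one limit shared by the
whole set.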


> > in different cgroups for the "cpu", "cpuset" & "cpuacct" controllers.
> 
> cpu and cpuacct are in the process of being merged.  The scheduler
> people hate the duplicate accounting the separation causes and cpuacct
> is generally considered a mistake that we shouldn't repeat.  So, umm,
> you're really depending on a lot of things which are considered big
> mistakes in cgroup.

Merging cpu + cpuacct together is not a problem - they're already
co-mounted by systemd. What I'm saying is that for cpu, cpuset
and cpuacct we create

  /some/path/
    |
    +- domain-cgroup
        |
        +- vcpu0 - thread for cpu 0
        +- vcpu1 - thread for cpu 1
        +- emulator - all other non-vCPU threads

We can't leave the non-vcpu threads at the higher level, because
then any limits meant for them would have to be applied at the
'domain-cgroup' level, where they would also constrain the vcpu
threads.

For all other controllers (memory, blkio, etc.) we create

  /some/path
    |
    +- domain-cgroup  -  all threads

The directory structure is the same in all controllers, except that
with the cpu, cpuset + cpuacct controllers, we create 2 further leaf
nodes.
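
As a rough sketch of the two layouts above (not libvirt's actual code;
the function names plan_layout and apply_layout and the
THREAD_SPLIT_CONTROLLERS set are invented for illustration, and it
assumes a cgroup "tasks" file that takes one thread ID per write):

```python
import os

# Controllers that get per-thread leaf nodes, as listed in this mail.
THREAD_SPLIT_CONTROLLERS = {"cpu", "cpuset", "cpuacct"}


def plan_layout(controller, base, vcpu_tids, emulator_tids):
    """Map cgroup directory -> thread IDs to place there, for one controller.

    `base` is the domain cgroup directory inside that controller's mount.
    """
    if controller in THREAD_SPLIT_CONTROLLERS:
        plan = {os.path.join(base, "vcpu%d" % i): [tid]
                for i, tid in enumerate(vcpu_tids)}
        plan[os.path.join(base, "emulator")] = list(emulator_tids)
        return plan
    # Every other controller: all threads stay in the domain cgroup itself.
    return {base: list(vcpu_tids) + list(emulator_tids)}


def apply_layout(plan):
    """Create each directory and write the thread IDs into its "tasks" file."""
    for path, tids in plan.items():
        os.makedirs(path, exist_ok=True)
        for tid in tids:
            # One ID per write, as the cgroup "tasks" file expects.
            with open(os.path.join(path, "tasks"), "a") as f:
                f.write("%d\n" % tid)
```

Keeping the plan separate from the writes means the layout can be
checked without touching a real cgroup mount.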

I understand that having wildly distinct hierarchies across different
controllers causes a lot of pain for the kernel. Libvirt doesn't
actually require that full level of flexibility though. Our needs
are much simpler. We're happy with the same core hierarchy
across all controllers. We just want to be able to create an extra
leaf node in some controllers to move threads about.

It would be fine with us if the kernel required that the same directory
hierarchy exists in all controllers, and mandated that threads can only
be moved to a directory immediately below where the process is initially
placed.
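
To spell out the restriction I have in mind, here is a small sketch of
the check. It is purely hypothetical: thread_move_allowed is an
invented name, and it only models the rule as a path comparison, not
anything the kernel does today.

```python
from pathlib import PurePosixPath


def thread_move_allowed(process_cgroup, target_cgroup):
    """True if target is a direct child directory of the process's cgroup."""
    process = PurePosixPath(process_cgroup)
    target = PurePosixPath(target_cgroup)
    # The second test rejects the root, whose parent is itself.
    return target.parent == process and target != process
```

With the layout above, moving a thread from /some/path/domain-cgroup
into /some/path/domain-cgroup/vcpu0 would pass, while any deeper or
unrelated directory would be refused.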

> > This is to allow us to control scheduler tunables / placement for
> > QEMU vCPU threads, independently of limits for QEMU I/O threads. So
> > requiring all threads of a process to be in the same cgroup isn't
> > sufficiently flexible for our needs.
> 
> It was never suited to that level of flexibility and it will never be
> and things like that will be clearly forbidden rather than being left
> in the current "not fully supported but kinda works" state.  The
> existing stuff won't break but new things won't keep the support.  If
> you're fine with staying with the old interface, which will be around
> for the foreseeable future, that's fine too, but if you intend to move
> onto the new interface when it finally becomes ready, whenever that
> is, please move on.

You say the old interface will be around for the foreseeable future, but
if systemd starts applying a different setup to comply with your new
scheme, then libvirt doesn't get any option to continue using the
old scheme. So even if you leave old interfaces around, we're going to
be forced to change. That's not really a back-compatibility story that
works for applications.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

