RFC: Attaching threads to cgroups is OK?

Fernando Luis Vázquez Cao fernando at oss.ntt.co.jp
Wed Aug 20 22:25:06 PDT 2008


Hi Balbir,

On Thu, 2008-08-21 at 09:02 +0530, Balbir Singh wrote:
> Fernando Luis Vázquez Cao wrote:
> > On Wed, 2008-08-20 at 20:48 +0900, Hirokazu Takahashi wrote:
> >> Hi,
> >>
> >>>> Tsuruta-san, how about your bio-cgroup's tracking concerning this?
> >>>> If we want to use your tracking functions for each thread separately,
> >>>> there seems to be a problem.
> >>>> ===cf. mm_get_bio_cgroup()===================
> >>>>            owner
> >>>> mm_struct ----> task_struct ----> bio_cgroup
> >>>> =============================================
> >>>> In my understanding, the mm_struct of a thread is same as its parent's.
> >>>> So, even if we attach the TIDs of some threads to different cgroups the 
> >>>> tracking always returns the same bio_cgroup -- its parent's group.
> >>>> Do you have some policy about in which case we can use your tracking?
> >>>>
> >>> It will be a restriction when the io-controller reuses information of the
> >>> owner of memory. But if it's very clear who issues I/O (by tracking
> >>> read/write syscalls), we may have a chance to record the issuer of I/O in
> >>> the page_cgroup struct.
> >> This might be slightly different topic though,
> >> I've been thinking where we should add hooks to track I/O requests.
> >> I think the following set of hooks is enough whether we are going to
> >> support thread based cgroup or not.
> >>
> >>   Hook-1: called when allocating a page, where the memory controller
> >> 	  already has a hook.
> >>   Hook-2: called when making a page in page-cache dirty.
> >>
> >> For anonymous pages, Hook-1 is enough to track any type of I/O request.
> >> For pages in page-cache, Hook-1 is also enough for read I/O because
> >> the I/O is issued just once right after allocating the page.
> >> For write I/O requests to pages in page-cache, Hook-1 will be okay
> >> in most cases but sometimes a process in another cgroup may write
> >> the pages. In this case, Hook-2 is needed to track I/O requests
> >> accurately.
> > 
> > This relative simplicity is what prompted me to say that we probably
> > should try to disentangle the io tracking functionality from the memory
> > controller a bit more (of course we still should reuse as much as we can
> > from it). The rationale for this is that the existing I/O scheduler
> > would benefit from proper io tracking capabilities too, so it'd be nice
> > if we could have them even in non-cgroup-capable kernels.
> > 
> 
> Hook 2 referred to in the mail above exists today in the form of task IO accounting.
Yup.

> > As an aside, when the IO context of a certain IO operation is known
> > (synchronous IO comes to mind) I think it should be cached in the
> > resulting bio so that we can do without the expensive accesses to
> > bio_cgroup once it enters the block layer.
> 
> Will this give you everything you need for accounting and control (from the
> block layer?)

Well, it depends on what you are trying to achieve.

Current IO schedulers such as CFQ only care about the io_context when
scheduling requests. When a new request comes in, CFQ assumes that it
originated in the context of the current task, which obviously does not
hold true for buffered IO and aio. This problem could be solved by using
bio-cgroup for IO tracking, but accessing the io context information is
somewhat expensive, since it means following a chain of pointers:

page->page_cgroup->bio_cgroup->io_context.

If at the time of building a bio we know its io context (i.e. the
context of the task or cgroup that generated that bio) I think we should
store it in the bio itself, too. With this scheme, whenever the kernel
needs to know the io_context of a particular block IO operation the
kernel would first try to retrieve its io_context directly from the bio,
and, if not available there, would resort to the slow path (accessing it
through bio_cgroup). My gut feeling is that elevator-based IO resource
controllers would benefit from such an approach, too.
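
To make the idea a bit more concrete, here is a rough userspace sketch of
the lookup order I have in mind. Everything in it is an assumption made for
illustration: the structs are simplified stand-ins rather than the kernel's
definitions, the "ioc" field in struct bio does not exist today, and
bio_io_context(), page_io_context() and bio_set_io_context() are
hypothetical names.

```c
#include <stddef.h>

/* Simplified stand-ins for illustration only, not the kernel's types. */
struct io_context  { int id; };
struct bio_cgroup  { struct io_context *ioc; };
struct page_cgroup { struct bio_cgroup *biocg; };
struct page        { struct page_cgroup *pc; };

struct bio {
	struct page *page;       /* page the bio carries (one, for brevity) */
	struct io_context *ioc;  /* hypothetical cached context, NULL if unknown */
};

/* Slow path: page -> page_cgroup -> bio_cgroup -> io_context. */
static struct io_context *page_io_context(const struct page *page)
{
	if (!page || !page->pc || !page->pc->biocg)
		return NULL;
	return page->pc->biocg->ioc;
}

/* Record the issuer when it is known while building the bio. */
static void bio_set_io_context(struct bio *bio, struct io_context *ioc)
{
	bio->ioc = ioc;
}

/* Try the context cached in the bio first, then fall back to the slow path. */
static struct io_context *bio_io_context(const struct bio *bio)
{
	if (!bio)
		return NULL;
	if (bio->ioc)
		return bio->ioc;
	return page_io_context(bio->page);
}
```

For synchronous IO the submitter would call bio_set_io_context() with its
own context, so the scheduler never has to touch page_cgroup; for buffered
writes the field would stay NULL and the lookup would go through bio_cgroup.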

- Fernando


