RFC: Attaching threads to cgroups is OK?

Balbir Singh balbir at linux.vnet.ibm.com
Thu Aug 21 03:28:55 PDT 2008

Fernando Luis Vázquez Cao wrote:
> Hi Balbir,
> On Thu, 2008-08-21 at 09:02 +0530, Balbir Singh wrote:
>> Fernando Luis Vázquez Cao wrote:
>>> On Wed, 2008-08-20 at 20:48 +0900, Hirokazu Takahashi wrote:
>>>> Hi,
>>>>>> Tsuruta-san, how about your bio-cgroup's tracking concerning this?
>>>>>> If we want to use your tracking functions for each thread separately,
>>>>>> there seems to be a problem.
>>>>>> ===cf. mm_get_bio_cgroup()===================
>>>>>>            owner
>>>>>> mm_struct ----> task_struct ----> bio_cgroup
>>>>>> =============================================
>>>>>> In my understanding, the mm_struct of a thread is the same as its parent's.
>>>>>> So, even if we attach the TIDs of some threads to different cgroups the 
>>>>>> tracking always returns the same bio_cgroup -- its parent's group.
>>>>>> Do you have a policy on the cases in which we can use your tracking?
>>>>> That will be a restriction when the io-controller reuses information about
>>>>> the owner of the memory. But if it's very clear who issues the I/O (by
>>>>> tracking read/write syscalls), we may have a chance to record the issuer of
>>>>> the I/O in the page_cgroup struct.
>>>> This might be a slightly different topic, though.
>>>> I've been thinking about where we should add hooks to track I/O requests.
>>>> I think the following set of hooks is enough whether we are going to
>>>> support thread-based cgroups or not.
>>>>   Hook-1: called when allocating a page, where the memory controller
>>>> 	  already has a hook.
>>>>   Hook-2: called when making a page in page-cache dirty.
>>>> For anonymous pages, Hook-1 is enough to track any type of I/O request.
>>>> For pages in page-cache, Hook-1 is also enough for read I/O because
>>>> the I/O is issued just once, right after allocating the page.
>>>> For write I/O requests to pages in page-cache, Hook-1 will be okay
>>>> in most cases, but sometimes a process in another cgroup may write
>>>> to the pages. In this case, Hook-2 is needed to track I/O requests
>>>> accurately.
>>> This relative simplicity is what prompted me to say that we probably
>>> should try to disentangle the io tracking functionality from the memory
>>> controller a bit more (of course we still should reuse as much as we can
>>> from it). The rationale for this is that the existing I/O scheduler
>>> would benefit from proper io tracking capabilities too, so it'd be nice
>>> if we could have them even in non-cgroup-capable kernels.
>> Hook 2 referred to in the mail above exists today in the form of task IO accounting.
> Yup.
>>> As an aside, when the IO context of a certain IO operation is known
>>> (synchronous IO comes to mind) I think it should be cached in the
>>> resulting bio so that we can do without the expensive accesses to
>>> bio_cgroup once it enters the block layer.
>> Will this give you everything you need for accounting and control (from the
>> block layer?)
> Well, it depends on what you are trying to achieve.
> Current IO schedulers such as CFQ only care about the io_context when
> scheduling requests. When a new request comes in, CFQ assumes that it
> originated in the context of the current task, which obviously does not
> hold true for buffered IO and aio. This problem could be solved by using
> bio-cgroup for IO tracking, but accessing the io context information is
> somewhat expensive: 
> page->page_cgroup->bio_cgroup->io_context.
> If at the time of building a bio we know its io context (i.e. the
> context of the task or cgroup that generated that bio) I think we should
> store it in the bio itself, too. With this scheme, whenever the kernel
> needs to know the io_context of a particular block IO operation the
> kernel would first try to retrieve its io_context directly from the bio,
> and, if not available there, would resort to the slow path (accessing it
> through bio_cgroup). My gut feeling is that elevator-based IO resource
> controllers would benefit from such an approach, too.
> - Fernando

OK, that seems to make sense. Thanks for explaining.
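
For reference, the lookup scheme Fernando describes (try the io_context
stored in the bio first, and fall back to the
page->page_cgroup->bio_cgroup->io_context chain only when nothing is stored)
could be sketched roughly as below. This is a userspace model, not kernel
code: the struct layouts, the bio->ioc field, and the helper names
bio_init_tracked(), page_to_io_context() and bio_io_context() are all
assumptions made up for illustration and do not correspond to any posted
patch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel structures named in the thread.
 * Field names and layouts are assumptions made for illustration only. */
struct io_context  { int id; };
struct bio_cgroup  { struct io_context *ioc; };
struct page_cgroup { struct bio_cgroup *biocg; };
struct page        { struct page_cgroup *pc; };

struct bio {
	struct page *page;       /* page the I/O targets */
	struct io_context *ioc;  /* hypothetical cache slot, NULL if unknown */
};

/* Slow path: page -> page_cgroup -> bio_cgroup -> io_context. */
static struct io_context *page_to_io_context(const struct page *page)
{
	if (!page || !page->pc || !page->pc->biocg)
		return NULL;
	return page->pc->biocg->ioc;
}

/* Build a bio. 'known' is the issuer's io_context when it is known at
 * build time (e.g. synchronous I/O), or NULL when it is not. */
static void bio_init_tracked(struct bio *bio, struct page *page,
			     struct io_context *known)
{
	bio->page = page;
	bio->ioc = known;
}

/* Lookup: use the context stored in the bio if there is one,
 * otherwise resort to the slow path through the page. */
static struct io_context *bio_io_context(const struct bio *bio)
{
	assert(bio != NULL);
	if (bio->ioc)
		return bio->ioc;
	return page_to_io_context(bio->page);
}
```

In this model a bio built with a known issuer returns that issuer's context
without touching the page at all, while a bio built with NULL resolves to
whatever context was recorded for the page's owner. Whether the stored
pointer would need reference counting in a real kernel is left open here.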

