[PATCH 21/25] io-controller: Per cgroup request descriptor support

Tue Jul 21 10:57:09 PDT 2009

On Tue, Jul 21, 2009 at 7:01 AM, Vivek Goyal<vgoyal at redhat.com> wrote:
> On Mon, Jul 20, 2009 at 10:55:31PM -0700, Nauman Rafique wrote:
>> On Mon, Jul 20, 2009 at 10:37 PM, Gui
>> Jianfeng<guijianfeng at cn.fujitsu.com> wrote:
>> > Vivek Goyal wrote:
>> >> o Currently a request queue has got fixed number of request descriptors for
>> >>   sync and async requests. Once the request descriptors are consumed, new
>> >>   processes are put to sleep and they effectively become serialized. Because
>> >>   sync and async queues are separate, async requests don't impact sync ones
>> >>   but if one is looking for fairness between async requests, that is not
>> >>   achievable if request queue descriptors become bottleneck.
>> >>
>> >> o Make request descriptor's per io group so that if there is lots of IO
>> >>   going on in one cgroup, it does not impact the IO of other group.
>> >>
>> >> o This is just one relatively simple way of doing things. This patch will
>> >>   probably change after the feedback. Folks have raised concerns that in
>> >>   hierchical setup, child's request descriptors should be capped by parent's
>> >>   request descriptors. May be we need to have per cgroup per device files
>> >>   in cgroups where one can specify the upper limit of request descriptors
>> >>   and whenever a cgroup is created one needs to assign request descritor
>> >>   limit making sure total sum of child's request descriptor is not more than
>> >>   of parent.
>> >>
>> >>   I guess something like memory controller. Anyway, that would be the next
>> >>   step. For the time being, we have implemented something simpler as follows.
>> >>
>> >> o This patch implements the per cgroup request descriptors. request pool per
>> >>   queue is still common but every group will have its own wait list and its
>> >>   own count of request descriptors allocated to that group for sync and async
>> >>   queues. So effectively request_list becomes per io group property and not a
>> >>   global request queue feature.
>> >>
>> >> o Currently one can define q->nr_requests to limit request descriptors
>> >>   allocated for the queue. Now there is another tunable q->nr_group_requests
>> >>   which controls the requests descriptr limit per group. q->nr_requests
>> >>   supercedes q->nr_group_requests to make sure if there are lots of groups
>> >>   present, we don't end up allocating too many request descriptors on the
>> >>   queue.
>> >>
>> >
>> >  Hi Vivek,
>> >
>> >  In order to prevent q->nr_requests from becoming the bottle-neck of allocating
>> >  requests, whether we can update nr_requests accordingly when allocating or removing
>> >  a cgroup?
>>
>> Vivek,
>> I agree with Gui here. In fact, it does not make much sense to keep
>> the nr_requests limit if we already have per cgroup limit in place.
>> This change also simplifies code quite a bit, as we can get rid of all
>> that sleep_on_global logic.
>>
>
> Hi Nauman, Gui,
>
> There were few reasons to keep a total limit on number of request
> descriptors (q->nr_requests) apart from per group limit.
>
> - We have this notion of queue being congested or not depending on out of
>  q->nr_requests how many are currently being used. Writeback threads,
>  some filesystems and other places make use of this information to either
>  not to block or to avoid pushing too much of data on device if queue is
>  congested.
>
>  With q->nr_requests removed, how do you define queue full and congested
>  semantics?

We can still keep q->nr_requests around, but don't use that number to
deny request descriptor allocation; only use it for defining queue
full and congested semantics.

>
> - I think slee_on_global logic makes sense even without q->nr_requests.
>  Assume that a group allows request descriptor allocation but due to lack
>  of memory, allocation fails. Where do you make this process wait to
>  attempt next time? Making all such failed processes on gloabl list on
>  queue instead of per group list makes more sense to me for following
>  reasons.
>
>        - If this is the first request allocation from the group and we
>          make the process sleep on group list, it will never be woken up
>          as no request from that group will complete.
>
>        - If there are many processes who failed request descriptor
>          allocation, when some request completes, I think it is more
>          fair to wake these up in FIFO manner to try out allocation again
>          instead of waiting for request to complete from the group
>          process belongs to. The reason being that io controller did not
>          fail the request descriptor allocation.
>
>  So even if you get rid of q->nr_requests, you still shall have to have
>  some logic of global wait list where failed allocations can wait.
>
> - It is backward compatible and there are less chances of higher layers
>  being broken due to this.
>
>
> Gui, I think automatic updation of q->nr_requests is probably not a very
> good thing. It is user defined tunable and user does not expect this to
> change automatically.
>
> At this point of time I really can't think of simpler and cleaner way.
> Ideas are welcome.
>
> Thanks
> Vivek
>