[PATCH 21/25] io-controller: Per cgroup request descriptor support
nauman at google.com
Tue Jul 21 10:57:09 PDT 2009
On Tue, Jul 21, 2009 at 7:01 AM, Vivek Goyal<vgoyal at redhat.com> wrote:
> On Mon, Jul 20, 2009 at 10:55:31PM -0700, Nauman Rafique wrote:
>> On Mon, Jul 20, 2009 at 10:37 PM, Gui Jianfeng<guijianfeng at cn.fujitsu.com> wrote:
>> > Vivek Goyal wrote:
>> >> o Currently a request queue has a fixed number of request descriptors for
>> >> sync and async requests. Once the request descriptors are consumed, new
>> >> processes are put to sleep and they effectively become serialized. Because
>> >> sync and async queues are separate, async requests don't impact sync ones
>> >> but if one is looking for fairness between async requests, that is not
>> >> achievable if request queue descriptors become the bottleneck.
>> >> o Make request descriptors per io group so that if there is lots of IO
>> >> going on in one cgroup, it does not impact the IO of other groups.
>> >> o This is just one relatively simple way of doing things. This patch will
>> >> probably change after the feedback. Folks have raised concerns that in
>> >> hierarchical setup, child's request descriptors should be capped by parent's
>> >> request descriptors. Maybe we need to have per cgroup per device files
>> >> in cgroups where one can specify the upper limit of request descriptors
>> >> and whenever a cgroup is created one needs to assign a request descriptor
>> >> limit, making sure the total sum of the children's request descriptors is not
>> >> more than that of the parent.
>> >> I guess something like memory controller. Anyway, that would be the next
>> >> step. For the time being, we have implemented something simpler as follows.
>> >> o This patch implements the per cgroup request descriptors. request pool per
>> >> queue is still common but every group will have its own wait list and its
>> >> own count of request descriptors allocated to that group for sync and async
>> >> queues. So effectively request_list becomes per io group property and not a
>> >> global request queue feature.
>> >> o Currently one can define q->nr_requests to limit request descriptors
>> >> allocated for the queue. Now there is another tunable q->nr_group_requests
>> >> which controls the request descriptor limit per group. q->nr_requests
>> >> supersedes q->nr_group_requests to make sure if there are lots of groups
>> >> present, we don't end up allocating too many request descriptors on the
>> >> queue.
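To make the accounting described above concrete, here is a minimal user-space sketch. It is not code from the patch: the struct and function names are hypothetical, sync and async are collapsed into a single counter for brevity, and checking the queue-wide limit before the per-group limit is an assumption about what "supersedes" means here.

```c
#include <assert.h>

/* Hypothetical model; field names only mirror the tunables in the mail. */
struct queue_model {
    int nr_requests;        /* queue-wide limit (q->nr_requests) */
    int nr_group_requests;  /* per-group limit (q->nr_group_requests) */
    int allocated;          /* descriptors in use across all groups */
};

struct group_model {
    int allocated;          /* descriptors in use by this group */
};

enum alloc_result { ALLOC_OK, WAIT_ON_GROUP, WAIT_ON_QUEUE };

/* Try to take one descriptor for group g on queue q. */
static enum alloc_result try_alloc(struct queue_model *q, struct group_model *g)
{
    /* Assumption: the queue-wide limit is checked first, so it always wins. */
    if (q->allocated >= q->nr_requests)
        return WAIT_ON_QUEUE;
    /* The group has used up its own share; only this group has to wait. */
    if (g->allocated >= q->nr_group_requests)
        return WAIT_ON_GROUP;
    q->allocated++;
    g->allocated++;
    return ALLOC_OK;
}

/* Give one descriptor back when a request from group g completes. */
static void release(struct queue_model *q, struct group_model *g)
{
    assert(q->allocated > 0 && g->allocated > 0);
    q->allocated--;
    g->allocated--;
}
```

With nr_requests = 3 and nr_group_requests = 2, a busy group is stopped at two descriptors while a second group can still take the third; after that, everyone waits on the queue-wide limit.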
>> > Hi Vivek,
>> > In order to prevent q->nr_requests from becoming the bottleneck when allocating
>> > requests, could we update nr_requests accordingly when creating or removing
>> > a cgroup?
>> I agree with Gui here. In fact, it does not make much sense to keep
>> the nr_requests limit if we already have per cgroup limit in place.
>> This change also simplifies code quite a bit, as we can get rid of all
>> that sleep_on_global logic.
> Hi Nauman, Gui,
> There were a few reasons to keep a total limit on number of request
> descriptors (q->nr_requests) apart from per group limit.
> - We have this notion of queue being congested or not depending on how
> many of the q->nr_requests descriptors are currently in use. Writeback threads,
> some filesystems and other places make use of this information to either
> not block or to avoid pushing too much data to the device if the queue is
> congested.
> With q->nr_requests removed, how do you define queue full and congested
> semantics?
We can still keep q->nr_requests around, but don't use that number to
deny request descriptor allocation; only use it for defining queue
full and congested semantics.
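Roughly what I have in mind, as a user-space sketch rather than a patch. All names here are made up for illustration, and the 7/8 congestion threshold is only a placeholder, not necessarily what the block layer uses. The point is that allocation looks at the per-group limit alone, while q->nr_requests only feeds the full and congested hints.

```c
#include <stdbool.h>

/* Hypothetical model of the proposal; not taken from the patch. */
struct rq_queue {
    int nr_requests;        /* kept, but only for full/congested hints */
    int nr_group_requests;  /* the only limit that can deny allocation */
    int in_use;             /* descriptors in use across all groups */
};

struct rq_group {
    int in_use;             /* descriptors in use by this group */
};

/* Allocation is gated by the per-group limit only. */
static bool alloc_rq(struct rq_queue *q, struct rq_group *g)
{
    if (g->in_use >= q->nr_group_requests)
        return false;       /* caller would sleep on the group's wait list */
    q->in_use++;
    g->in_use++;
    return true;
}

/* "Full" is advisory here: it never blocks alloc_rq(). */
static bool queue_full(const struct rq_queue *q)
{
    return q->in_use >= q->nr_requests;
}

/* Placeholder threshold (7/8 of nr_requests), chosen for illustration. */
static bool queue_congested(const struct rq_queue *q)
{
    return q->in_use >= q->nr_requests - q->nr_requests / 8;
}
```

The obvious cost is that the number of descriptors in flight can exceed q->nr_requests when many groups are active, so writeback and filesystems would see "full" as a hint rather than a hard bound.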
> - I think sleep_on_global logic makes sense even without q->nr_requests.
> Assume that a group allows request descriptor allocation but due to lack
> of memory, allocation fails. Where do you make this process wait to
> attempt next time? Making all such failed processes wait on a global list
> on the queue instead of a per group list makes more sense to me for the
> following reasons:
> - If this is the first request allocation from the group and we
> make the process sleep on group list, it will never be woken up
> as no request from that group will complete.
> - If there are many processes that failed request descriptor
> allocation, when some request completes, I think it is more
> fair to wake these up in FIFO manner to try out allocation again
> instead of waiting for a request to complete from the group the
> process belongs to. The reason being that the io controller did not
> fail the request descriptor allocation.
> So even if you get rid of q->nr_requests, you will still need
> some global wait list logic where failed allocations can wait.
> - It is backward compatible and there are fewer chances of higher layers
> being broken due to this.
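For reference, the global wait list behaviour described in the second point can be modelled in user space as below. This is a sketch under assumptions, not the patch's sleep_on_global code: the names, the fixed-size ring and the pid/group fields are all hypothetical. A process whose allocation failed for lack of memory is queued on a single per-queue list, and the completion of any request, from any group, wakes the oldest waiter.

```c
#include <stddef.h>

#define MAX_WAITERS 16      /* arbitrary bound for the sketch */

struct waiter {
    int pid;                /* hypothetical id of the sleeping process */
    int group_id;           /* recorded, but not used to pick who wakes */
};

struct global_wait_list {
    struct waiter slots[MAX_WAITERS];
    size_t head;            /* index of the oldest waiter */
    size_t count;           /* number of waiters queued */
};

/* Queue a process whose allocation failed; returns 0, or -1 if full. */
static int wait_on_queue(struct global_wait_list *wl, int pid, int group_id)
{
    if (wl->count == MAX_WAITERS)
        return -1;
    size_t tail = (wl->head + wl->count) % MAX_WAITERS;
    wl->slots[tail].pid = pid;
    wl->slots[tail].group_id = group_id;
    wl->count++;
    return 0;
}

/* Called when any request completes; returns the pid to wake, or -1. */
static int wake_oldest(struct global_wait_list *wl)
{
    if (wl->count == 0)
        return -1;
    int pid = wl->slots[wl->head].pid;
    wl->head = (wl->head + 1) % MAX_WAITERS;
    wl->count--;
    return pid;
}
```

Because the wake-up does not depend on group_id, a process from a group with no requests in flight is still woken by a completion elsewhere, which covers the "first allocation from the group" case.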
> Gui, I think automatically updating q->nr_requests is probably not a very
> good thing. It is a user defined tunable and the user does not expect it to
> change automatically.
> At this point in time I really can't think of a simpler and cleaner way.
> Ideas are welcome.