[RFC] writeback and cgroup

Vivek Goyal vgoyal at redhat.com
Thu Apr 5 17:09:56 UTC 2012


On Thu, Apr 05, 2012 at 09:31:13AM -0700, Tejun Heo wrote:
> Hey, Vivek.
> 
> On Wed, Apr 04, 2012 at 04:18:16PM -0400, Vivek Goyal wrote:
> > Hey, how about reconsidering my other proposal, for which I had posted
> > the patches: keep throttling at the device level, so that reads and
> > direct IO get throttled asynchronously but buffered writes get
> > throttled synchronously.
> > 
> > Advantages of this scheme.
> > 
> > - There are no separate knobs.
> > 
> > - All the IO (read, direct IO and buffered write) is controlled using
> >   the same set of knobs and goes into the queue of the same cgroup.
> > 
> > - Writeback logic has no knowledge of throttling. It just invokes a 
> >   hook into throttling logic of device queue.
> > 
> > I guess this is a hybrid of active writeback throttling and back pressure
> > mechanism.
> > 
> > But it still does not solve the NFS issue, and for direct IO,
> > filesystems can still get serialized, so the metadata issue still
> > needs to be resolved. So one can argue: why not go for the full "back
> > pressure" method, despite it being more complex.
> > 
> > Here is the link, just to refresh the memory. Something to keep in mind
> > while assessing alternatives.
> > 
> > https://lkml.org/lkml/2011/6/28/243
> 
> Hmmm... so, this only works for blk-throttle and not with the weight.
> How do you manage interaction between buffered writes and direct
> writes for the same cgroup?
> 

Yes, it is only for blk-throttle. We just account for buffered writes
in balance_dirty_pages() instead of when they are actually submitted to
the device by the flusher thread.

IIRC, I just had two queues: one held bios, and the other held tasks
along with how much memory each one was dirtying. Dispatch then
round-robined between the two queues according to the throttling rate:
I would dispatch a bio from the direct IO queue, then look at the other
queue, see how much IO the waiting task wanted to do, and once enough
time had passed at the throttling rate, remove that task from the wait
queue and wake it up.
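
Very roughly, the dispatch side looked something like the userspace
model below. This is only a sketch to illustrate the idea; none of the
names (tg_model, dispatch_one, serve_dirtier, etc.) come from the
actual patches, which operated on real bios and a wait queue inside
blk-throttle.

#include <stdint.h>
#include <stdbool.h>

/* One queue holds bios (reads + direct IO), the other holds tasks that
 * are dirtying memory, along with how much each has dirtied. */
struct bio_entry     { uint64_t bytes; };
struct dirtier_entry { uint64_t bytes; };

struct tg_model {
        uint64_t bps_limit;          /* cgroup throttle limit, bytes/sec */
        uint64_t next_dispatch_us;   /* earliest time the next unit may go */
        bool     serve_dirtier;      /* round-robin pointer between queues */
};

/*
 * Round-robin dispatch: alternate between the bio queue and the dirtier
 * queue.  Each dispatch pushes next_dispatch_us forward by bytes/rate,
 * so both kinds of IO are charged against the same single limit.
 */
static void dispatch_one(struct tg_model *tg, uint64_t now_us,
                         struct bio_entry *bio, struct dirtier_entry *dirtier)
{
        uint64_t bytes;

        if (now_us < tg->next_dispatch_us)
                return;                         /* rate limit not yet met */

        if (tg->serve_dirtier && dirtier)
                bytes = dirtier->bytes;         /* wake the waiting dirtier */
        else if (bio)
                bytes = bio->bytes;             /* submit the bio downstream */
        else if (dirtier)
                bytes = dirtier->bytes;         /* only dirtiers are waiting */
        else
                return;                         /* nothing pending */

        tg->next_dispatch_us = now_us + (bytes * 1000000) / tg->bps_limit;
        tg->serve_dirtier = !tg->serve_dirtier;
}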

That way it becomes equivalent to two IO paths (direct IO + buffered
write) doing IO to a single pipe with one throttling limit. Both kinds
of IO are subjected to the same common limit (no split); we just round
robin between the two types of IO and try to divide the available
bandwidth equally (this could of course be made tunable).
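
For example (numbers purely illustrative): with a 10 MB/s limit on the
cgroup and both queues continuously backlogged, the alternation above
gives roughly 5 MB/s each to the direct IO stream and to the
buffered-write dirtiers; if only one side has IO pending, it gets the
full 10 MB/s.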

Thanks
Vivek

