Too many I/O controller patches

Andrea Righi righi.andrea at
Tue Aug 5 02:27:52 PDT 2008

Paul Menage wrote:
> On Mon, Aug 4, 2008 at 1:44 PM, Andrea Righi <righi.andrea at> wrote:
>> A safer approach IMHO is to force the tasks to wait synchronously on
>> each operation that directly or indirectly generates i/o.
>> In particular the solution used by the io-throttle controller to limit
>> the dirty-ratio in memory is to impose a sleep via
>> schedule_timeout_killable() in balance_dirty_pages() when a generic
>> process exceeds the limits defined for the belonging cgroup.
>> Limiting read operations is a lot more easy, because they're always
>> synchronized with i/o requests.
> I think that you're conflating two issues:
> - controlling how much dirty memory a cgroup can have at any given
> time (since dirty memory is much harder/slower to reclaim than clean
> memory)
> - controlling how much effect a cgroup can have on a given I/O device.
> By controlling the rate at which a task can generate dirty pages,
> you're not really limiting either of these. I think you'd have to set
> your I/O limits artificially low to prevent a case of a process
> writing a large data file and then doing fsync() on it, which would
> then hit the disk with the entire file at once, and blow away any QoS
> guarantees for other groups.

Anyway, the dirty-page ratio is directly proportional to the IO that
will be performed on the real device, isn't it? This wouldn't prevent IO
bursts, as you correctly say, but IMHO it is a simple and quite
effective way to measure the IO write activity of each cgroup on each
affected device.

To prevent the IO peaks I usually reduce the vm_dirty_ratio, but, ok,
this is a workaround, not the solution to the problem either.

IMHO, based on the dirty-page rate measurement, we should apply both
limiting methods: throttle the dirty-page ratio to prevent too many
dirty pages in the system (hard to reclaim and causing
unpredictable/unpleasant/unresponsive behaviour), and throttle the
dispatching of IO requests at the device-mapper/IO-scheduler layer to
smooth the IO peaks/bursts generated by fsync() and similar scenarios.
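
To make the first half of this concrete, here is a rough user-space
model of the sleep-based throttle. It is only a sketch: DirtyThrottle,
limit_pages_per_sec and account() are hypothetical names, not code from
the io-throttle patch. The returned delay stands in for the sleep that
the kernel would impose via schedule_timeout_killable() in
balance_dirty_pages().

```python
from dataclasses import dataclass


@dataclass
class DirtyThrottle:
    """Toy per-cgroup dirty-page rate limiter (all names are hypothetical)."""
    limit_pages_per_sec: float   # assumed per-cgroup limit
    window_start: float = 0.0    # start of the accounting window, in seconds
    dirtied: int = 0             # pages dirtied since window_start

    def account(self, now: float, pages: int) -> float:
        """Charge `pages` dirtied at time `now`; return seconds to sleep."""
        self.dirtied += pages
        # Earliest time at which this many pages would be within the limit.
        earliest_allowed = (self.window_start
                            + self.dirtied / self.limit_pages_per_sec)
        # Sleep only if the cgroup is ahead of its allowed rate.
        return max(0.0, earliest_allowed - now)
```

With a limit of 100 pages/s, a cgroup that has dirtied 150 pages by
t = 0.5 s would be told to sleep for 1 s. As Paul points out, this
bounds the rate of dirtying only, not the burst that a later fsync()
sends to the device.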

Another approach could be to implement the measurement in the elevator,
looking at the time elapsed between when the IO request is issued to
the drive and when the request is served. So, look at the start time
T1 and the completion time T2, take the difference (T2 - T1) and say:
cgroup C1 consumed an amount of IO of (T2 - T1). Then use a
token-bucket policy to fill/reduce the "credits" of each IO cgroup in
terms of IO time slots. This would be a more precise measurement than
trying to predict how expensive the IO operation will be by looking
only at the dirty-page ratio. Then throttle both the dirty-page ratio
*and* the dispatching of the IO requests submitted by the cgroup that
exceeds the limits.
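
A minimal user-space sketch of this accounting, assuming credits are
expressed in seconds of device time. IoTimeBucket, rate, capacity,
charge() and may_dispatch() are hypothetical names chosen for
illustration; nothing here is existing elevator code.

```python
from dataclasses import dataclass


@dataclass
class IoTimeBucket:
    """Token bucket whose tokens are seconds of device time (hypothetical)."""
    rate: float               # device-time seconds granted per wall-clock second
    capacity: float           # maximum credit the cgroup can accumulate
    credits: float = 0.0      # current credit; may go negative after a charge
    last_refill: float = 0.0  # wall-clock time of the last refill

    def refill(self, now: float) -> None:
        """Add credit for the time elapsed since the last refill."""
        elapsed = max(0.0, now - self.last_refill)
        self.credits = min(self.capacity, self.credits + elapsed * self.rate)
        self.last_refill = max(self.last_refill, now)

    def charge(self, t1: float, t2: float) -> None:
        """Charge a request issued at t1 and completed at t2 with (t2 - t1)."""
        self.refill(t2)
        self.credits -= (t2 - t1)

    def may_dispatch(self, now: float) -> bool:
        """Allow dispatching only while the cgroup still has credit."""
        self.refill(now)
        return self.credits > 0.0
```

The cost is charged at completion time, because (T2 - T1) is only known
once the request has been served. A cgroup can therefore overdraw its
bucket by one request and is held back afterwards until the refill
brings its credit above zero again.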

> As Dave suggested, I think it would make more sense to have your
> page-dirtying throttle points hook into the memory controller instead,
> and allow the memory controller to track/limit dirty pages for a
> cgroup, and potentially do throttling as part of that.
> Paul

Yes, implementing page-dirty throttling in the memory controller seems
absolutely reasonable. I can try to move in this direction, merge the
page-dirty throttling into the memory controller, and also post the RFC.

