[RFC] [PATCH 0/2] memcg: per cgroup dirty limit
balbir at linux.vnet.ibm.com
Mon Feb 22 09:36:40 PST 2010
* Vivek Goyal <vgoyal at redhat.com> [2010-02-22 09:27:45]:
> On Sun, Feb 21, 2010 at 04:18:43PM +0100, Andrea Righi wrote:
> > Control the maximum amount of dirty pages a cgroup can have at any given time.
> > Per cgroup dirty limit is like fixing the max amount of dirty (hard to reclaim)
> > page cache used by any cgroup. So, in case of multiple cgroup writers, they
> > will not be able to consume more than their designated share of dirty pages and
> > will be forced to perform write-out if they cross that limit.
> > The overall design is the following:
> > - account dirty pages per cgroup
> > - limit the number of dirty pages via memory.dirty_bytes in cgroupfs
> > - start to write-out in balance_dirty_pages() when the cgroup or global limit
> > is exceeded
> > This feature is supposed to be strictly connected to any underlying IO
> > controller implementation, so we can stop increasing dirty pages in VM layer
> > and enforce a write-out before any cgroup will consume the global amount of
> > dirty pages defined by the /proc/sys/vm/dirty_ratio|dirty_bytes limit.
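The three design points quoted above (per-cgroup accounting, a memory.dirty_bytes limit, and write-out when either the cgroup or the global limit is exceeded) can be sketched as a toy user-space model. This is not kernel code and not taken from the patch: the names Cgroup, must_write_out and dirty_page, the 4096-byte page size, and the convention that a limit of 0 means "no per-cgroup limit" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cgroup:
    name: str
    # Models memory.dirty_bytes. Assumption: 0 means "no per-cgroup limit".
    dirty_limit_bytes: int
    dirty_bytes: int = 0


def must_write_out(cg, global_dirty_bytes, global_limit_bytes):
    """Return True when either the cgroup limit or the global limit is exceeded."""
    over_cgroup = cg.dirty_limit_bytes > 0 and cg.dirty_bytes > cg.dirty_limit_bytes
    over_global = global_dirty_bytes > global_limit_bytes
    return over_cgroup or over_global


def dirty_page(cg, all_cgroups, global_limit_bytes, page_size=4096):
    """Account one newly dirtied page to cg, then report whether write-out is due."""
    cg.dirty_bytes += page_size
    total = sum(c.dirty_bytes for c in all_cgroups)
    return must_write_out(cg, total, global_limit_bytes)
```

In this model a cgroup limited to two pages is told to write out on its third dirtied page even though the global total is far below the global limit, while an unlimited cgroup is throttled only by the global limit.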
> Thanks Andrea. I had been thinking about looking into it from IO
> controller perspective so that we can control async IO (buffered writes
> Before I dive into patches, two quick things.
> - IIRC, last time you had implemented a per memory cgroup "dirty_ratio" and
> not "dirty_bytes". Why this change? To begin with, wouldn't a per memcg
> configurable dirty ratio also make sense? By default it could be the
> global dirty ratio for each cgroup.
> - Looks like we will start writeout from memory cgroup once we cross the
> dirty ratio, but still there is no guarantee that we will be writing pages
> belonging to the cgroup which crossed the dirty ratio and triggered the
> This behavior is not very good, at least from the IO controller perspective,
> where if two dd threads are dirtying memory in two cgroups, then if
> one crosses its dirty ratio, it should perform writeouts of its own pages
> and not other cgroups' pages. Otherwise we probably will again introduce
> serialization among the two writers and will not see service differentiation.
I thought that the I/O controller would eventually provide hooks to do
> Maybe we can modify writeback_inodes_wbc() to check the first dirty page
> of the inode. And if it does not belong to the same memcg as the task that
> is performing balance_dirty_pages(), then skip that inode.
Do you expect all pages of an inode to be paged in by the same cgroup?
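To make the concern concrete, here is a toy model of the "check the first dirty page" heuristic applied to an inode whose pages were dirtied by more than one cgroup. It is a sketch only: the list-of-tuples representation and the helper names first_page_owner, skip_inode and pages_owned_by are hypothetical and do not correspond to kernel data structures or to writeback_inodes_wbc() itself.

```python
def first_page_owner(dirty_pages):
    """dirty_pages: list of (page_index, owning_cgroup) for one inode, in index order."""
    return dirty_pages[0][1] if dirty_pages else None


def skip_inode(dirty_pages, writer_cgroup):
    """Proposed heuristic: skip the inode unless its first dirty page belongs to the writer."""
    return first_page_owner(dirty_pages) != writer_cgroup


def pages_owned_by(dirty_pages, cgroup):
    """Count how many of the inode's dirty pages were actually dirtied by cgroup."""
    return sum(1 for _, owner in dirty_pages if owner == cgroup)


# A file shared by two cgroups: B dirtied page 0, A dirtied pages 1-3.
shared_inode = [(0, "B"), (1, "A"), (2, "A"), (3, "A")]
```

With shared_inode, a writer in cgroup A skips the inode although three of its four dirty pages belong to A, and a writer in cgroup B does not skip it and so would also write out A's pages. The heuristic is exact only when every page of an inode is dirtied by a single cgroup.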