[RFC] [PATCH 0/2] memcg: per cgroup dirty limit

Mon Feb 22 09:36:40 PST 2010

* Vivek Goyal <vgoyal at redhat.com> [2010-02-22 09:27:45]:

> On Sun, Feb 21, 2010 at 04:18:43PM +0100, Andrea Righi wrote:
> > Control the maximum amount of dirty pages a cgroup can have at any given time.
> > 
> > Per cgroup dirty limit is like fixing the max amount of dirty (hard to reclaim)
> > page cache used by any cgroup. So, in case of multiple cgroup writers, they
> > will not be able to consume more than their designated share of dirty pages and
> > will be forced to perform write-out if they cross that limit.
> > 
> > The overall design is the following:
> > 
> >  - account dirty pages per cgroup
> >  - limit the number of dirty pages via memory.dirty_bytes in cgroupfs
> >  - start to write-out in balance_dirty_pages() when the cgroup or global limit
> >    is exceeded
> > 
> > This feature is supposed to be strictly connected to any underlying IO
> > controller implementation, so we can stop increasing dirty pages in the VM
> > layer and enforce a write-out before any cgroup consumes the global amount
> > of dirty pages defined by the /proc/sys/vm/dirty_ratio|dirty_bytes limit.
> > 
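
The design above can be pictured with a toy user-space model. This is only a
sketch and not code from the patch set: struct toy_cgroup, its fields, and
toy_over_dirty_limit() are hypothetical names, and treating a limit of 0 as
"no per-cgroup limit" is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: these are NOT the kernel's structures. */
struct toy_cgroup {
	uint64_t dirty_bytes;       /* bytes currently dirty in this cgroup */
	uint64_t dirty_limit_bytes; /* stands in for memory.dirty_bytes;
	                             * 0 means "no per-cgroup limit" (assumption) */
};

/*
 * Return true when a writer in @cg should be pushed into write-out,
 * i.e. when either the cgroup limit or the global limit is exceeded.
 */
static bool toy_over_dirty_limit(const struct toy_cgroup *cg,
				 uint64_t global_dirty_bytes,
				 uint64_t global_limit_bytes)
{
	assert(cg != NULL);

	bool over_cgroup = cg->dirty_limit_bytes != 0 &&
			   cg->dirty_bytes > cg->dirty_limit_bytes;
	bool over_global = global_dirty_bytes > global_limit_bytes;

	return over_cgroup || over_global;
}
```

In this model a cgroup is throttled as soon as it exceeds its own limit, even
when the system as a whole is still well below the global threshold.
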
> 
> Thanks Andrea. I had been thinking about looking into this from the IO
> controller perspective so that we can also control async IO (buffered
> writes).
> 
> Before I dive into patches, two quick things.
> 
> - IIRC, last time you had implemented a per memory cgroup "dirty_ratio" and
>   not "dirty_bytes". Why this change? Wouldn't a per-memcg configurable
>   dirty ratio also make sense to begin with? By default it could be the
>   global dirty ratio for each cgroup.
> 
> - Looks like we will start writeout from the memory cgroup once we cross
>   the dirty ratio, but there is still no guarantee that we will be writing
>   pages belonging to the cgroup which crossed the dirty ratio and triggered
>   the writeout.
> 
>   This behavior is not very good, at least from the IO controller
>   perspective: if two dd threads are dirtying memory in two cgroups and
>   one crosses its dirty ratio, it should write out its own pages and not
>   the other cgroup's pages. Otherwise we will probably again introduce
>   serialization between the two writers and will not see service
>   differentiation.

I thought that the I/O controller would eventually provide hooks to do
this, no?

> 
>   Maybe we can modify writeback_inodes_wbc() to check the first dirty page
>   of the inode. If it does not belong to the same memcg as the task that
>   is performing balance_dirty_pages(), then skip that inode.

Do you expect all pages of an inode to be paged in by the same cgroup?
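
To make the question concrete, here is a toy user-space model of the
"check the first dirty page" idea. It is a sketch only, not kernel code:
struct toy_page, struct toy_inode, toy_skip_inode() and
toy_dirty_pages_of() are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these are NOT the kernel's structures. */
struct toy_page {
	int  memcg_id; /* cgroup charged for this page (hypothetical field) */
	bool dirty;
};

struct toy_inode {
	const struct toy_page *pages;
	size_t nr_pages;
};

/*
 * The proposed heuristic: look only at the first dirty page and skip the
 * inode when that page is charged to a memcg other than the writer's.
 */
static bool toy_skip_inode(const struct toy_inode *inode, int writer_memcg)
{
	assert(inode != NULL);

	for (size_t i = 0; i < inode->nr_pages; i++) {
		if (inode->pages[i].dirty)
			return inode->pages[i].memcg_id != writer_memcg;
	}
	return true; /* no dirty pages, nothing to write for anyone */
}

/* Count the dirty pages of @inode that are charged to @memcg. */
static size_t toy_dirty_pages_of(const struct toy_inode *inode, int memcg)
{
	size_t n = 0;

	assert(inode != NULL);

	for (size_t i = 0; i < inode->nr_pages; i++) {
		if (inode->pages[i].dirty && inode->pages[i].memcg_id == memcg)
			n++;
	}
	return n;
}
```

If the first dirty page of a shared file is charged to cgroup 1 and the
following dirty pages to cgroup 2, toy_skip_inode() tells a writer in
cgroup 2 to skip the inode even though toy_dirty_pages_of() shows that the
writer has dirty pages in it.
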

-- 
	Three Cheers,
	Balbir