[PATCH v2 26/28] memcg: per-memcg kmem shrinking

Glauber Costa glommer at parallels.com
Mon Apr 1 08:48:43 UTC 2013


>> +static int memcg_try_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size)
>> +{
>> +	int retries = MEM_CGROUP_RECLAIM_RETRIES;
> 
> I'm not sure this retry number, chosen for the anon/file LRUs, is suitable for kmem.
> 
Suggestions?

>> +	struct res_counter *fail_res;
>> +	int ret;
>> +
>> +	do {
>> +		ret = res_counter_charge(&memcg->kmem, size, &fail_res);
>> +		if (!ret)
>> +			return ret;
>> +
>> +		if (!(gfp & __GFP_WAIT))
>> +			return ret;
>> +
>> +		/*
>> +		 * We will try to shrink kernel memory present in caches. We
>> +		 * are sure that we can wait, so we will. The duration of our
>> +		 * wait is determined by congestion, the same way as vmscan.c
>> +		 *
>> +		 * If we are in FS context, though, then although we can wait,
>> +		 * we cannot call the shrinkers. Most fs shrinkers (which
>> +		 * comprise most of our kmem data) will not run without
>> +		 * __GFP_FS since they can deadlock. The solution is to
>> +		 * synchronously run that in a different context.
>> +		 */
>> +		if (!(gfp & __GFP_FS)) {
>> +			/*
>> +			 * we are already short on memory, every queue
>> +			 * allocation is likely to fail
>> +			 */
>> +			memcg_stop_kmem_account();
>> +			schedule_work(&memcg->kmemcg_shrink_work);
>> +			flush_work(&memcg->kmemcg_shrink_work);
>> +			memcg_resume_kmem_account();
>> +		} else if (!try_to_free_mem_cgroup_kmem(memcg, gfp))
>> +			congestion_wait(BLK_RW_ASYNC, HZ/10);
> 
> Why congestion_wait()? I think calling congestion_wait() in vmscan.c is
> part of the memory-reclaim logic, but I don't think the caller should do
> this kind of volunteer wait without good reason.
> 
> 

Although this is not the case with dentries (or inodes, since only
non-dirty inodes go to the LRU list), some of the objects we are freeing
may need time to be written back to disk before they can be released.
This is the case, for instance, with buffer heads and bios: they are not
actively shrunk by the shrinkers, but my understanding is that they will
eventually be released. Inodes, likewise, may get time to be written
back and become non-dirty.

In practice, in my tests, the retry would almost always fail if we
don't wait, and almost always succeed if we do.

Am I missing something in this interpretation?
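
For reference, this is roughly what I have in mind for the work item
scheduled in the !__GFP_FS branch above: the handler runs in workqueue
context, where entering the fs is safe, so the fs shrinkers can run even
though the original charge site could not call them. This is only a
sketch (the handler name is made up here, and the
try_to_free_mem_cgroup_kmem() signature is assumed from the hunk above);
the congestion_wait() at the end just mirrors the direct path, for the
same writeback reason discussed above.

static void kmemcg_shrink_work_fn(struct work_struct *work)
{
	struct mem_cgroup *memcg = container_of(work, struct mem_cgroup,
						kmemcg_shrink_work);

	/*
	 * Workqueue context: we can both wait and enter the fs, so the
	 * fs shrinkers holding most of the kmem are allowed to run here.
	 */
	if (!try_to_free_mem_cgroup_kmem(memcg, GFP_KERNEL))
		congestion_wait(BLK_RW_ASYNC, HZ/10);
}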

