[PATCH v3 3/4] limit nr_dentries per superblock

Dave Chinner david at fromorbit.com
Mon Aug 15 00:03:51 PDT 2011


On Sun, Aug 14, 2011 at 07:13:51PM +0400, Glauber Costa wrote:
> This patch lays the foundation for us to limit the dcache size.
> Each super block can have only a maximum amount of dentries under its
> sub-tree. Allocation fails if we're over the limit and the cache
> can't be pruned to free up space for the newcomers.
> 
> Signed-off-by: Glauber Costa <glommer at parallels.com>
> CC: Dave Chinner <david at fromorbit.com>
> CC: Eric Dumazet <eric.dumazet at gmail.com>
> ---
>  fs/dcache.c        |   28 ++++++++++++++++++++++++++++
>  fs/super.c         |    1 +
>  include/linux/fs.h |    1 +
>  3 files changed, 30 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 815d9fd..ddd02a2 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1180,6 +1180,31 @@ void shrink_dcache_parent(struct dentry * parent)
>  }
>  EXPORT_SYMBOL(shrink_dcache_parent);
>  
> +static inline int dcache_mem_check(struct super_block *sb)
> +{
> +	struct shrink_control sc = {
> +		.gfp_mask = GFP_KERNEL,
> +	};
> +
> +	if (sb->s_nr_dentry_max == INT_MAX)
> +		return 0;
> +
> +	do {
> +		int nr_dentry;
> +
> +		nr_dentry = percpu_counter_read_positive(&sb->s_nr_dentry);
> +		if (nr_dentry > sb->s_nr_dentry_max)
> +			nr_dentry =
> +				percpu_counter_sum_positive(&sb->s_nr_dentry);
> +		if (nr_dentry < sb->s_nr_dentry_max)
> +			return 0;
> +
> +	/* nr_pages = 1, lru_pages = 0 should get delta ~ 2 */
> +	} while (shrink_one_shrinker(&sb->s_shrink, &sc, 1, 0));
> +
> +	return -ENOMEM;
> +}

The changes here don't address the comments I made previously about
the error range of the per-cpu counter and the update frequency of
the global counter with respect to small delta changes of the
counted amount.

That is, you cannot expect percpu_counter_read_positive() to change
when you decrement a counter by 2 counts (assuming both objects are
freed), and there's no guarantee that percpu_counter_sum_positive()
will drop below the threshold either while
percpu_counter_read_positive() is returning numbers above the
threshold. That makes this a very expensive and oft-repeated loop.
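
To put a number on that error range: each online CPU can hold up to
percpu_counter_batch uncommitted events before folding them into the
global count. A rough, untested illustration (assumes the fs/dcache.c
context and an SMP build, since percpu_counter_batch is only declared
there; the helper name is made up):

	/*
	 * Worst-case drift between the approximate count returned by
	 * percpu_counter_read_positive() and the true value: every
	 * online CPU may be sitting on a full batch of uncommitted
	 * updates to the counter.
	 */
	static inline long nr_dentry_counter_slack(void)
	{
		return (long)percpu_counter_batch * num_online_cpus();
	}

Freeing two dentries disappears into that slack, which is why this
code ends up paying for the expensive percpu_counter_sum_positive()
walk on every single pass through the loop.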

Indeed, that even assumes the 2 objects that are freed are
dentries. If there are more inodes than dentries in memory, the
dentries won't even be trimmed, and you'll get stuck in the
horribly expensive loop trimming the cache 2 inodes at a time until
the caches start to balance. Shrinking must be done in large
batches to be effective, not one object at a time.
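
If the point is to get dentries belonging to this sb reclaimed, it
makes more sense to go at the dcache directly and in bulk. An
untested sketch, assuming the prune_dcache_sb() helper from the
per-sb shrinker infrastructure is visible here; the batch of 128 is
an arbitrary example, not a tuned value:

	/*
	 * Reclaim a sizeable batch of this sb's unused dentries in one
	 * pass, rather than letting the sb shrinker proportion a scan
	 * of 2 objects across both the dentry and inode caches.
	 */
	prune_dcache_sb(sb, 128);

That avoids the dentry/inode balancing problem entirely, and the
batch can be made large enough to amortise the LRU lock traffic.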

Further, the shrinker may not make progress freeing items simply
because it is passing over referenced (i.e. cache hot) dentries, and
this code will then return ENOMEM. That sends spurious ENOMEM
failures back to d_alloc() that will be impossible to track down...

This also demonstrates my earlier comment that you factored out
shrink_one_shrinker() at the wrong place. You've already got a
shrink_control structure, so setting .nr_to_scan in its
initialisation gets around the entire need for all that guesswork
of running through the vmscan-specific algorithms just to get 2
objects freed.
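
That is, something along these lines (untested, purely to illustrate
the factoring; the .nr_to_scan batch of 128 is an arbitrary example
and it doesn't address the spurious ENOMEM problem above):

	static inline int dcache_mem_check(struct super_block *sb)
	{
		struct shrink_control sc = {
			.gfp_mask   = GFP_KERNEL,
			.nr_to_scan = 128,	/* arbitrary example batch */
		};

		if (sb->s_nr_dentry_max == INT_MAX)
			return 0;

		do {
			/* cheap check first, accurate check only if needed */
			if (percpu_counter_read_positive(&sb->s_nr_dentry) <
							sb->s_nr_dentry_max)
				return 0;
			if (percpu_counter_sum_positive(&sb->s_nr_dentry) <
							sb->s_nr_dentry_max)
				return 0;
		} while (sb->s_shrink.shrink(&sb->s_shrink, &sc) > 0);

		return -ENOMEM;
	}

No messing about with nr_pages/lru_pages deltas, and the shrinker is
given a batch it can actually do useful work with.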

Cheers,

Dave.
-- 
Dave Chinner
david at fromorbit.com

