[PATCH 2/4] limit nr_dentries per superblock

Sat Jul 30 18:15:31 PDT 2011

On Fri, Jul 29, 2011 at 05:44:17PM +0400, Glauber Costa wrote:
> This patch lays the foundation for us to limit the dcache size.
> Each super block can have only a maximum amount of dentries under its
> sub-tree. Allocation fails if we we're over limit and the cache
> can't be pruned to free up space for the newcomers.
> 
> Signed-off-by: Glauber Costa <glommer at parallels.com>
> CC: Dave Chinner <david at fromorbit.com>
> ---
>  fs/dcache.c        |   45 +++++++++++++++++++++++++++++++++++++--------
>  fs/super.c         |    1 +
>  include/linux/fs.h |    1 +
>  3 files changed, 39 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 9cb6395..3bdb106 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -663,7 +663,7 @@ EXPORT_SYMBOL(d_prune_aliases);
>   *
>   * This may fail if locks cannot be acquired no problem, just try again.
>   */
> -static void try_prune_one_dentry(struct dentry *dentry)
> +static int try_prune_one_dentry(struct dentry *dentry)
>  	__releases(dentry->d_lock)
>  {
>  	struct dentry *parent;

Comment explaining the return value?

[...]
> @@ -740,7 +744,7 @@ static void shrink_dentry_list(struct list_head *list)
>   *
>   * If flags contains DCACHE_REFERENCED reference dentries will not be pruned.
>   */
> -static void __shrink_dcache_sb(struct super_block *sb, int count, int flags)
> +static int __shrink_dcache_sb(struct super_block *sb, int count, int flags)
>  {
>  	struct dentry *dentry;
>  	LIST_HEAD(referenced);

What's the new return value do?

> @@ -781,7 +785,7 @@ relock:
>  		list_splice(&referenced, &sb->s_dentry_lru);
>  	spin_unlock(&dcache_lru_lock);
>  
> -	shrink_dentry_list(&tmp);
> +	return shrink_dentry_list(&tmp);
>  }
>  
>  /**
> @@ -1176,6 +1180,28 @@ void shrink_dcache_parent(struct dentry * parent)
>  }
>  EXPORT_SYMBOL(shrink_dcache_parent);
>  
> +static int dcache_mem_check(struct super_block *sb)
> +{
> +	int nr_dentry;
> +	int count;
> +
> +	if (percpu_counter_read_positive(&sb->s_nr_dentry) <=
> +						sb->s_nr_dentry_max)
> +		return 0;
> +
> +	do {
> +		nr_dentry = percpu_counter_sum_positive(&sb->s_nr_dentry);
> +
> +		if (nr_dentry < sb->s_nr_dentry_max)
> +			return 0;
> +
> +		count = (nr_dentry - sb->s_nr_dentry_max) << 1;
> +
> +	} while (__shrink_dcache_sb(sb, count, DCACHE_REFERENCED) > 0);
> +
> +	return -ENOMEM;
> +}

I don't like the idea of trying to be exact on how many we shrink.
percpu_counter_read_positive() will always have an error in it:
+/-(N cpus * 31) by default. Statistically speaking 50% of the cases
it passes will then get rejected on the expensive sum operation. And
then we shrink only 2x the difference between the counter and the
max value, which will leave it inside the error margin of the
counter anyway. Hence once we hit the threshold, we'll continually
take expensive sum operations, many for no reason.  There needs to
be some hysteresis here to avoid simply hammering the percpu counter
on every single __d_alloc call. Perhaps per-cpu counters are not
suited for this purpose?

Indeed, what happens if one caller saw the read and sum over the
limit, then starts shrinking but other callers don't see the read
over the limit but keep allocating dentries and hence keeping the
shrinking caller's sum over the limit and hence forever shrinking?
That seems like a unprivileged random process DOS, yes?

So at minimum, the loop should not recalculate how much to shrink -
calculate the amount to shrink, the run a loop to shrink that much
and only that much. If there are other concurrent callers, then
they'll also do a shrink, but nothing will get stuck in a loop.

Hmmm, what happens when the size of the cache and/or the max value
are less than the error margin of the counter?

Also, this doesn't address the problem I originally commented on -
it only shrinks the dcache, not the inode or filesystem caches that
hold all the memory.  I mentioned the new per-sb shrinkers for a
reason - you should be calling them, not __shrink_dcache_sb(). i.e.
building a scan_control structure and calling
sb->shrinker.shrink(&sb->shrinker, &sc);

I'm seriously starting to think that this needs to share an
implementation with the mm shrinker code, because it already
solves most of these problems...

Cheers,

Dave.
-- 
Dave Chinner
david at fromorbit.com