[RFC] breakage in sysfs_readdir() and s_instances abuse in sysfs

Al Viro viro at ZenIV.linux.org.uk
Sun Jun 12 11:17:10 PDT 2011


On Sun, Jun 12, 2011 at 10:59:42AM -0700, Linus Torvalds wrote:
> On Sun, Jun 12, 2011 at 12:15 AM, Eric W. Biederman
> <ebiederm at xmission.com> wrote:
> >
> > I honestly hate the pattern that is being used here. ?Holding a
> > reference count because we can't be bothered to free things reliably
> > when we actually stop using them.
> 
> WHAT?
> 
> That's what a reference count *is*. It's all about "free things
> reliably when we actually stop using them".
> 
> Your comment makes zero sense.
> 
> EVERY SINGLE kernel data structure should be reference counted. Read
> Documentation/CodingStyle, or look at any of the good code in the
> kernel (ie core process or VFS code). A non-refcounted data structure
> that is used by more than one entity IS A BUG!
> 
> Quite frankly, your objection sounds moronic. If there is more than
> one user, then a reference count is _always_ the right thing. Nothing
> else ever works, and trust me, people have tried. They've tried
> locking, they've tried luck, they've tried crazy things. Nothing but
> refcounts works.

No, what the current code is trying to do is to have two kinds of references -
contributing to refcount (they do have one, all right) and non-contributing.
*AND* it attempts to hunt non-contributing ones down and replace them with
NULL when refcount hits zero.  And fscks up in dealing with the results.

What this patch does is pretty much the same thing we do for mm_struct and
superblocks - two refcounts, one controlling the shutdown of object and another
controlling the actual freeing of memory.  The second kind of references
contributes to the "memory" refcount and so does having non-zero "active"
refcount.  No games with replacing references with NULL, no races around those,
etc.

Eric's objection is that sysfs superblock would pin the memory occupied by
struct net down until it's unmounted.  Frankly, I think it's a BS -
aforementioned 2.5K are trivial to pin down *anyway*.  Just chdir deep
enough into that instance of sysfs tree and inodes/dentries you've pinned
down by that will easily eat this much.


More information about the Containers mailing list