[REVIEW][PATCH 1/3] vfs: In d_path don't call d_dname on a mount point

Al Viro viro at ZenIV.linux.org.uk
Sat Nov 30 06:15:26 UTC 2013


[Eric Dumazet Cc'd since ->d_dname() had been introduced by him]

On Tue, Nov 26, 2013 at 04:16:13PM -0800, Eric W. Biederman wrote:
> >   proc ipc:[4026531839] proc rw,nosuid,nodev,noexec 0 0
> >
> > This breaks userspace which expects the 2nd field in
> > /proc/mounts to be a valid path.
> 
> The symlink /proc/<pid>/ns/{ipc,mnt,net,pid,user,uts} point to
> dentries allocated with d_alloc_pseudo that we can mount, and
> that have interesting names printed out with d_dname.
> 
> When these files are bind mounted /proc/mounts is not currently
> displaying the mount point correctly because d_dname is called instead
> of just displaying the path where the file is mounted.
> 
> Solve this by adding an explicit check to distinguish mounted pseudo
> inodes and unmounted pseudo inodes.  Unmounted pseudo inodes always
> use mount of their filesstem as the mnt_root  in their path making
> these two cases easy to distinguish.

Excuse me, but this is a wrong way to deal with that.  The root of the
problem is your "let's give them ->d_dname() *and* let them live on the
same procfs vfsmount".  The latter, BTW, requires the suckers to access
nd->path.mnt, which is a bad thing by itself.  Plus the wrong way
->d_dname() is called.  It was kinda-sorta defensible back when it
had been introduced, but only because all filesystems with ->d_dname()
had been MS_NOUSER ones.  Kinda-sorta since it introduced an unpleasant
difference in semantics of d_path() and the rest of its ilk - see
e.g. tomoyo calling ->d_dname() manually.

->d_dname() addresses a real need, no arguments here - we do want to
delay name generation for pipes and sockets until asked for it.  The
problem is that it's dealt with in the wrong place - we ought to
have it handled when prepend_path() reaches a vfsmount with no
parent and an IS_ROOT() dentry.  _Then_ we need to check if it has
->d_op->d_dname() and call one if it does.

There's a bunch of unpleasant details around the handling of "what if
the final vfsmount is detached or internal" (and induced weirdness
with d_absolute_path(), etc. callers deciding whether they want to
call ->d_dname() manually, since only d_path() calls it); I'll look into
that.

As for using this procfs vfsmount...  Frankly, I would rather add a mini-fs
a-la aio one and have those inodes live _there_, with the same vfsmount
used for all those guys.  Internal, of course.  Without bothering with
register_filesystem(), with all associated headache ("what if userland
mounts the entire thing", etc.)


More information about the Containers mailing list