[REVIEW][PATCH 1/3] vfs: In d_path don't call d_dname on a mount point

Eric W. Biederman ebiederm at xmission.com
Sat Nov 30 21:51:07 UTC 2013


Al Viro <viro at ZenIV.linux.org.uk> writes:

> Eric, what behaviour do you want there?

Here is the behavior I want.

a) Something reasonable in /proc/self/fd when I just open one of these.

root at unassigned:~# exec 5< /proc/self/ns/net
root at unassigned:~# ls -l /proc/self/fd
total 0
lrwx------ 1 root root 64 Nov 30 13:12 0 -> /dev/ttyS0
lrwx------ 1 root root 64 Nov 30 13:12 1 -> /dev/ttyS0
lrwx------ 1 root root 64 Nov 30 13:12 2 -> /dev/ttyS0
lr-x------ 1 root root 64 Nov 30 13:12 3 -> /proc/718/fd
lr-x------ 1 root root 64 Nov 30 13:12 5 -> net:[4026531955]
root at unassigned:~# ip netns add foo

b) Something reasonable in /proc/mounts when I bind mount one of these.

root at unassigned:~# ip netns add foo
root at unassigned:~# cat /proc/mounts 
[...]
proc /var/run/netns/foo proc rw,nosuid,nodev,noexec,relatime 0 0

The technical path that lead me to changing d_path looks something like this.

- I want bind mounts to work. So I need check_mnt to pass.
- I can't have these file descriptors show up directly in proc or useful
  permission checks will be bypassed.  Therefore d_unhashed must be true.
- I don't want every call of d_path to add (deleted) so I need
  d_unlinked to be false. Therefore S_ISROOT must be true.
- When looking at one of these file descriptors when they are not
  mounted prepend_path see that IS_ROOT is true and report /proc/ unless
  I implement d_name.
- When looking at one of these file descriptors when they are bind
  mounted I want /proc/mounts to reflect where they are mounted, which
  means I shouldn't call d_path.

Any dentry allocated with d_alloc_pseudo will have the same problem if
it is ever in a context where it can be bind mounted.  So this is a
general issue with d_name.

The other path to where I am at starts at this comment in prepend_path:
	/*
	 * Filesystems needing to implement special "root names"
	 * should do so with ->d_dname()
	 */
Which is exactly what I did only later to discover that d_path get's it
wrong if the dentry is mounted.

If I remove ns_dname I get the following:
root at unassigned:~# exec 5< /proc/self/ns/net 
root at unassigned:~# ls -l /proc/self/fd/
total 0
lrwx------ 1 root root 64 Nov 30 13:31 0 -> /dev/ttyS0
lrwx------ 1 root root 64 Nov 30 13:31 1 -> /dev/ttyS0
lrwx------ 1 root root 64 Nov 30 13:31 2 -> /dev/ttyS0
lr-x------ 1 root root 64 Nov 30 13:31 3 -> /proc/708/fd
lr-x------ 1 root root 64 Nov 30 13:31 5 -> /proc

Which is livable but particularly ugly to look at.

As for simple correctness.  There are only a handful of implementations
of d_dname today and the all set a valid path in the file descriptors
returned, in anything that can make it to d_path.  Which makes the test
for a psuedo dentry in d_path safe.  And at a very simply level we never
want to call d_dname on an actual mount point.

So in the small I will argue what I have done is very much correct.  In
the large I don't see any other set of implementation choices that does
not result in a significant redesign of something.

Furthermore I have a regression dating back a couple of kernels that
breaks /proc/mounts that should be resolved, and this patch is the
cleanest, safest, simplest solution I can see.

Eric







More information about the Containers mailing list