[REVIEW][PATCH 1/3] vfs: In d_path don't call d_dname on a mount point
Eric W. Biederman
ebiederm at xmission.com
Sat Nov 30 21:51:07 UTC 2013
Al Viro <viro at ZenIV.linux.org.uk> writes:
> Eric, what behaviour do you want there?
Here is the behavior I want.
a) Something reasonable in /proc/self/fd when I just open one of these.
root at unassigned:~# exec 5< /proc/self/ns/net
root at unassigned:~# ls -l /proc/self/fd
total 0
lrwx------ 1 root root 64 Nov 30 13:12 0 -> /dev/ttyS0
lrwx------ 1 root root 64 Nov 30 13:12 1 -> /dev/ttyS0
lrwx------ 1 root root 64 Nov 30 13:12 2 -> /dev/ttyS0
lr-x------ 1 root root 64 Nov 30 13:12 3 -> /proc/718/fd
lr-x------ 1 root root 64 Nov 30 13:12 5 -> net:[4026531955]
root at unassigned:~# ip netns add foo
b) Something reasonable in /proc/mounts when I bind mount one of these.
root at unassigned:~# ip netns add foo
root at unassigned:~# cat /proc/mounts
[...]
proc /var/run/netns/foo proc rw,nosuid,nodev,noexec,relatime 0 0
The technical path that lead me to changing d_path looks something like this.
- I want bind mounts to work. So I need check_mnt to pass.
- I can't have these file descriptors show up directly in proc or useful
permission checks will be bypassed. Therefore d_unhashed must be true.
- I don't want every call of d_path to add (deleted) so I need
d_unlinked to be false. Therefore S_ISROOT must be true.
- When looking at one of these file descriptors when they are not
mounted prepend_path see that IS_ROOT is true and report /proc/ unless
I implement d_name.
- When looking at one of these file descriptors when they are bind
mounted I want /proc/mounts to reflect where they are mounted, which
means I shouldn't call d_path.
Any dentry allocated with d_alloc_pseudo will have the same problem if
it is ever in a context where it can be bind mounted. So this is a
general issue with d_name.
The other path to where I am at starts at this comment in prepend_path:
/*
* Filesystems needing to implement special "root names"
* should do so with ->d_dname()
*/
Which is exactly what I did only later to discover that d_path get's it
wrong if the dentry is mounted.
If I remove ns_dname I get the following:
root at unassigned:~# exec 5< /proc/self/ns/net
root at unassigned:~# ls -l /proc/self/fd/
total 0
lrwx------ 1 root root 64 Nov 30 13:31 0 -> /dev/ttyS0
lrwx------ 1 root root 64 Nov 30 13:31 1 -> /dev/ttyS0
lrwx------ 1 root root 64 Nov 30 13:31 2 -> /dev/ttyS0
lr-x------ 1 root root 64 Nov 30 13:31 3 -> /proc/708/fd
lr-x------ 1 root root 64 Nov 30 13:31 5 -> /proc
Which is livable but particularly ugly to look at.
As for simple correctness. There are only a handful of implementations
of d_dname today and the all set a valid path in the file descriptors
returned, in anything that can make it to d_path. Which makes the test
for a psuedo dentry in d_path safe. And at a very simply level we never
want to call d_dname on an actual mount point.
So in the small I will argue what I have done is very much correct. In
the large I don't see any other set of implementation choices that does
not result in a significant redesign of something.
Furthermore I have a regression dating back a couple of kernels that
breaks /proc/mounts that should be resolved, and this patch is the
cleanest, safest, simplest solution I can see.
Eric
More information about the Containers
mailing list