[PATCH review 11/11] mnt: Honor MNT_LOCKED when detaching mounts
viro at ZenIV.linux.org.uk
Thu Jan 8 22:32:13 UTC 2015
On Thu, Jan 08, 2015 at 12:22:27AM +0000, Al Viro wrote:
> * external references countribute
> * mnt_parent contributes unless it points to ourselves *or* mnt_ns is
> * reachable from mount hash => add 1
> * in addition to the wart around namespace_unlock(), we have a similar
> wart between mntput_no_expire() and cleanup_mnt(), only there we thread the
> suckers on mnt_child instead of mnt_list and use slightly different logics to
> prevent shutdown of parent fs before the dput(mountpoint).
BTW, there's something I wanted to do for a while - these games with
delaying dput(mountpoint) look a whole lot like a job for mnt_pin_kill().
Look: even now we have (at the exit from umount_tree()) a bunch of remnant
mountpoints associated with ex-parents. And we want to have them taken
out and shot before said ex-parents shut down. The tricky part here is
ordering - at the beginning of namespace_unlock() we have a bunch of
mounts about to be dropped; quite a few might be keeping references to
the ex-mountpoints. We want mntput() on all of those *and* we want
dput() on said ex-mountpoints. We'd already dropped the references to
ex-parents (in umount_tree()). The trouble being, some of those ex-mountpoints
might be on other victim mounts and we really do NOT want to drop such a victim
until all dput() within it had been done.
That's why we play those games with bumping ex-parent mount counts first,
so that dput() is immediately followed with matching mntput().
But we don't have to do it that way. Alternative, and IMO a cleaner one,
would be to embed struct fs_pin into struct mount and, in umount_tree()
simply add that fs_pin->m_list into the ->mnt_pins of parent and ->s_list
into a global list (replacement of 'unmounted'). With ->kill() callback
dropping references to dentry of mountpoint *and* containing mount. Then
we don't need to care about the ordering at all - we just go through the
global list of fs_pin and do ->kill() on all of them, in whatever order they
happen to be. If mount gets dropped while there still are ex-mountpoints on
it waiting to be dropped - fine, we'll just drop them there and then (from
mnt_kill_pins()). No games with refcounts that way, no transiently busy
mounts, which we do have right now.
And that scheme actually covers your "delayed detaching" just fine - if we
want detaching to be delayed until the parent really buys it, all we need to
do is to skip dropping the child from mnt_mounts of parent and putting fs_pin
of child on the global list. And detach_mount() on all remaining children
at mntput_no_expire() time, as you do, only with removal from child list.
No changes needed in cleanup_mnt() that way - remaining fs_pin of children
will play out, yielding the effect you want.
The problem with that approach (and the reason why it hadn't been done yet)
is that we'll need some careful tweaking of locking in fs/fs_pin.c. Back
in 3.17 I really wanted to get the kernel/acct.c crap out of the way, so that
umount-on-rmdir stuff could be done safely. So the minimal solution went in,
with plans for using it to avoid the headache with ordering left for later.
I had completely missed the possibility to do less eager dissolving of
detached trees that way (and it was a long-standing wishlist thing) back
then, or I probably would've bitten the bullet and gone for it.
What you are proposing is
a) easily expressed with that scheme (behaviour is almost identical
to what your series does, except that we'd empty mnt_child right in
mntput_no_expire(), rather than keeping it until cleanup_mnt()) and
b) easily generalized to MNT_DETACH with lazy dissolving; in fact,
the only difference would be to treat *all* submounts as "don't put on
global list, don't remove from child list", not just the MNT_LOCKED ones.
I'll play around with that today and tomorrow; hopefully, I'll have a postable
variant by the weekend...
More information about the Containers