[cgl_discussion] Re: OSDL CGL-WG draft specs available for review

Mark Huth mark.huth at mvista.com
Thu Apr 24 11:24:36 PDT 2003


I looked at Tigran's implementation, and it was a decent job.  The
algorithm was to install a filesystem that handles requests to the
unmounted file system by returning an error.  He then looped through the
task list, removing file descriptors that reference the subject file
system, removing working directory references, and removing mmaps.  At
the end, the file system should have the correct reference count for
unmounting.  The main issue is that it was unclear to me whether new
requests could occur after the list was walked, potentially preventing
the unmount.  The other possible issue is scalability with large
numbers of tasks.
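
To make the task-list walk concrete, here is a minimal user-space
sketch of the idea as I read it.  It is not Tigran's code: toy_fs,
toy_task and drop_task_refs are hypothetical names, and real kernel
objects, locking and mmap handling are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FDS 4

/* Toy stand-ins for kernel objects; every name here is hypothetical. */
struct toy_fs {
    int refcount;                 /* references held by tasks */
};

struct toy_task {
    struct toy_fs *fds[MAX_FDS];  /* open files, NULL when the slot is free */
    struct toy_fs *cwd;           /* filesystem holding the working directory */
};

/*
 * Walk every task once and drop each reference it holds on `victim`:
 * first its file descriptors, then its working directory.
 * Returns the number of references removed.
 */
static int drop_task_refs(struct toy_task *tasks, size_t ntasks,
                          struct toy_fs *victim)
{
    int dropped = 0;

    for (size_t t = 0; t < ntasks; t++) {
        for (size_t i = 0; i < MAX_FDS; i++) {
            if (tasks[t].fds[i] == victim) {
                tasks[t].fds[i] = NULL;
                victim->refcount--;
                dropped++;
            }
        }
        if (tasks[t].cwd == victim) {
            tasks[t].cwd = NULL;
            victim->refcount--;
            dropped++;
        }
    }
    assert(victim->refcount >= 0);
    return dropped;
}
```

Nothing in this sketch stops a task that has already been visited from
taking a new reference before the walk finishes, which is the race I
was unsure about.  The cost is also proportional to the number of tasks
times the number of descriptors per task.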

My implementation is bottom-up: the unmount process first "walls off"
the system calls that could create new references.  The mount point is
marked as fumount pending, and any subsequent syscall that uses a file
descriptor has its fget fail with a NULL file object, so the call
returns -EBADF.  If the syscall uses a file name string, the namei
lookup fails, generally returning -ENXIO.  Then there is a pause, whose
purpose is to let contexts already in the kernel, such as pending
reads, clear out.

Following that, the file objects are marked as subject to fumount, and
outstanding references to them are cancelled: locks are removed and
mmaps deleted.  If file references still remain, the file object is
cloned, leaving the old object with null operations, after which only a
close will succeed on it.  The cloned object is taken over by the
umount process and closed as many times as required to drive the
reference count to 0.

Once the super block file list is emptied, the mount reference count is
checked.  If it is still not at the magic number, the task list is
walked, looking for cwd entries that are subordinate to the fs mount
point.  Any task found is given a NULL cwd.  Various parts of the
lookup routines have had error handling added so that a NULL cwd does
not cause problems.  After that, the mount ref count is checked again.
If it is still not the magic number, an arbitrary mntput is done,
potentially losing resources.  However, that has not happened in our
testing.  It's complicated, but seems solid after our testing.

There are a couple of rules that the sysadmin should follow.  First,
any NFS export subordinate to the mount point should be unexported
before fumount.  I can't find non-process-related kernel references,
and NFS is the only entity that might pose a problem there.  It's
stateless, so the problem is transitory, but nonetheless I recommend
removing the export.  Second, if the mount point has subordinate
mounts, these must be removed first by the admin; I chose not to allow
the unmount to automatically recurse.  Finally, the filesystem cannot
be the root filesystem, although that is arbitrary as far as the code
is concerned: fumount checks for it and doesn't do the forced unmount
if the subject is /.
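
The "wall off first, then drain" ordering described above can be
sketched in user space roughly as follows.  This is only a toy model
under stated assumptions, not the actual patch: toy_mount, toy_file and
the toy_* functions are hypothetical names, each open file is assumed
to hold exactly one mount reference, and the pause, the cloning of file
objects and the cwd walk are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; every name here is hypothetical. */
struct toy_mount {
    bool fumount_pending;   /* set once a forced unmount has started */
    int  refcount;          /* one reference per open file in this model */
};

struct toy_file {
    struct toy_mount *mnt;  /* mount the file lives on, NULL if never opened */
    int refcount;           /* holders of this file object */
};

/* Model of fget: hands out no files on a mount that is being torn down. */
static struct toy_file *toy_fget(struct toy_file *f)
{
    if (f == NULL || f->mnt == NULL || f->mnt->fumount_pending)
        return NULL;
    f->refcount++;
    return f;
}

/* Descriptor-based syscall: a failed fget turns into -EBADF. */
static int toy_read(struct toy_file *f)
{
    struct toy_file *file = toy_fget(f);

    if (file == NULL)
        return -EBADF;
    /* ... the actual I/O would happen here ... */
    file->refcount--;       /* matching put for the fget above */
    return 0;
}

/* Name-based syscall: the lookup is refused once the wall is up. */
static int toy_open(struct toy_mount *mnt, struct toy_file *f)
{
    if (mnt->fumount_pending)
        return -ENXIO;
    f->mnt = mnt;
    f->refcount = 1;
    mnt->refcount++;
    return 0;
}

/*
 * Forced unmount: raise the wall first so no new references can appear,
 * then close every remaining file as many times as needed.
 * Returns the mount reference count that is left over.
 */
static int toy_fumount(struct toy_mount *mnt, struct toy_file *files,
                       size_t nfiles)
{
    mnt->fumount_pending = true;

    for (size_t i = 0; i < nfiles; i++) {
        if (files[i].mnt != mnt || files[i].refcount == 0)
            continue;
        while (files[i].refcount > 0)
            files[i].refcount--;
        mnt->refcount--;    /* the last close releases the mount reference */
    }
    assert(mnt->refcount >= 0);
    return mnt->refcount;
}
```

Because the flag is set before any reference is dropped, a request that
arrives during the drain is turned away instead of adding a reference
behind the unmounter's back.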

This was messy and took a while to get right, primarily due to 
Linux's ill-defined locking paradigm.  Lists of objects are locked, 
while the objects themselves only have reference counts.  I was unaware 
of Tigran's implementation at the start of the project.  Looking at it, 
there are some things I really like, so a combination of the two might 
be the best implementation.
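
To illustrate the "locked lists, reference-counted objects" pattern
mentioned above, here is a small user-space sketch.  It uses a pthread
mutex as a stand-in for a kernel list lock, and toy_obj, toy_list,
toy_lookup and toy_unlink are hypothetical names, not kernel
interfaces.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Toy object: it carries a reference count but no lock of its own. */
struct toy_obj {
    struct toy_obj *next;
    int id;
    int refcount;           /* pins the object; does not serialize its use */
};

/* Toy list: the lock protects list membership, not the objects' contents. */
struct toy_list {
    pthread_mutex_t lock;
    struct toy_obj *head;
};

/* Find an object by id and take a reference while the list lock is held. */
static struct toy_obj *toy_lookup(struct toy_list *l, int id)
{
    struct toy_obj *o;

    pthread_mutex_lock(&l->lock);
    for (o = l->head; o != NULL; o = o->next) {
        if (o->id == id) {
            o->refcount++;
            break;
        }
    }
    pthread_mutex_unlock(&l->lock);
    return o;               /* NULL if nothing matched */
}

/*
 * Remove `victim` from the list.  Returns the number of references that
 * other holders still have on it after it has been unlinked.
 */
static int toy_unlink(struct toy_list *l, struct toy_obj *victim)
{
    struct toy_obj **pp;
    int remaining;

    pthread_mutex_lock(&l->lock);
    for (pp = &l->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == victim) {
            *pp = victim->next;
            victim->next = NULL;
            break;
        }
    }
    remaining = victim->refcount;
    pthread_mutex_unlock(&l->lock);

    assert(remaining >= 0);
    return remaining;
}
```

A nonzero return from toy_unlink is the awkward case: the object is off
the list, so nobody new can find it, but existing holders can still use
it, and holding the list lock does nothing to stop them.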

Mark Huth

Carl-Daniel Hailfinger wrote:

>[CC:ed Mark Huth and Tigran Aivazian]
>Christoph Hellwig wrote:
>>   4.10 Force unmount (2) 2 Experimental Availability Core
>>   4.10 Description: 
>>   CGL shall support forced unmounting of a filesystem.
>>     * The  unmount should work even if there are open files or processes
>>       in the file system.
>>     * Pending  requests  should  be  ended with an error return when the
>>       file system is unmounted.
>>This is very hard to get right.  What is the experimental
>>implementation you're referring to?
>IIRC, Mark Huth from MontaVista and Tigran Aivazian from Veritas both
>developed such an implementation independently of each other.
>Maybe they can offer some insight.
