[Bugme-new] [Bug 4404] New: NFS client hangs and does not umount (automounter, manual)

bugme-daemon at osdl.org bugme-daemon at osdl.org
Sat Mar 26 05:49:07 PST 2005


http://bugme.osdl.org/show_bug.cgi?id=4404

           Summary: NFS client hangs and does not umount (automounter,
                    manual)
    Kernel Version: 2.6.11
            Status: NEW
          Severity: normal
             Owner: trond.myklebust at fys.uio.no
         Submitter: wagner at tik.ee.ethz.ch


Distribution: Debian testing
Hardware Environment: Tyan dual Athlon MP 2800+
Software Environment:
Problem Description: 

When 'mv'-ing larger files over NFS, some servers become unresponsive, but the
problem is on the client side. The exported filesystems can still be mounted on
other computers running 2.6.11 without problems. Even a reboot of the NFS server
does not clear the problem; only a client reboot helps.

Details:

I have a cluster with 22 nodes and 2 servers. The nodes are running an older
copy of Debian testing (about 6 months old) and kernel 2.4.26-om1 (OpenMosix).
The nodes all export one partition with 
  /aux  10.0.0.0/24(rw,async,no_root_squash)
The servers are one dual CPU and one single CPU machine and mount the
exported node partition via the automounter. They also mount two partitions
from each other.

One problem I experience is that when I run several 'mv' commands for several
hundred files of ~400MB each from the server to several nodes, some of these
'mv' processes enter a permanent sleep state after having copied an arbitrary
number of files. A simple Ctrl-C and restart cures that, but it is annoying. The
more serious problem is that sometimes after these copying activities (it may
also be unrelated, but I cannot tell) some of the node-exported partitions
become unresponsive (df, ls, etc.) and cannot be umounted. This does not seem to
be a node problem, since the other server can still access the node exports
fine. Rebooting the affected nodes also does not help. In addition, it seems I
cannot mount any currently unmounted node exports on the affected server when
that happens. Restarting NFS, portmapper or networking has no effect. Only
rebooting the affected server helps. This is with NFSv3 support; I am currently
running tests with only NFSv2.
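
As a possible starting point for narrowing down the sleeping 'mv' processes, the
following is a minimal sketch (not a tested diagnostic) that lists processes by
name together with their scheduler state and, where the kernel exposes it, the
kernel function they are waiting in. It assumes Python 3 on the affected server
and a /proc/<pid>/wchan file, which may be missing or uninformative depending on
the kernel configuration. The process names searched for are only the commands
mentioned in this report.

```python
import os

def parse_stat(stat_line):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    pid = int(stat_line[:open_idx])
    comm = stat_line[open_idx + 1:close_idx]
    state = stat_line[close_idx + 1:].split()[0]
    return pid, comm, state

def read_text(path):
    """Return the stripped contents of path, or '' if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""

def sleeping_processes(names=("mv", "df", "ls", "umount")):
    """List (pid, comm, state, wchan) for named processes in state S or D."""
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        line = read_text(f"/proc/{entry}/stat")
        if not line:
            continue  # process exited between listdir() and open()
        pid, comm, state = parse_stat(line)
        if comm in names and state in ("S", "D"):
            # wchan availability is an assumption; fall back to '' if absent.
            wchan = read_text(f"/proc/{entry}/wchan")
            found.append((pid, comm, state, wchan))
    return found

if __name__ == "__main__":
    for pid, comm, state, wchan in sleeping_processes():
        print(pid, comm, state, wchan or "?")
```

If the stuck processes consistently show the same wait channel, that name might
be worth including in the bug report.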

In addition, it seems (but I cannot really be sure) that only the dual-CPU
server is affected.

What I would like is some hints on how to debug this. There are no log entries.
There are no other signs of the problem I could find, except that it does not
work. 'df'/'ls' hang when trying to query the affected mounts, but they do not
seem to time out. They can be interrupted with Ctrl-C. 'mount' displays all
mounts. 'mount -f' has no effect, with the Debian 'mount' and with a
self-compiled one from the latest sources (util-linux-2.12q).
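
Since 'df' and 'ls' hang without timing out, one way to find out which mounts
are affected without blocking a shell might be to probe each NFS mount from a
child process with a timeout. The sketch below assumes Python 3 and a 'stat'
binary that accepts '-f' (as in GNU coreutils); the function names and the
10-second timeout are illustrative choices, not part of any existing tool. It
relies on the observation above that the hung commands can be interrupted; a
child stuck in uninterruptible sleep could not be killed this way.

```python
import subprocess

def parse_nfs_mounts(mounts_text):
    """Return mount points of type nfs/nfs4 from /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass.
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            points.append(fields[1])
    return points

def probe(path, timeout=10.0):
    """Return True if 'stat -f path' finishes within timeout seconds."""
    try:
        subprocess.run(
            ["stat", "-f", path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child on timeout before raising.
        return False

if __name__ == "__main__":
    with open("/proc/mounts") as f:
        mount_points = parse_nfs_mounts(f.read())
    for point in mount_points:
        status = "ok" if probe(point) else "HUNG"
        print(status, point)
```

Running this periodically during the 'mv' jobs could show when a mount first
stops responding, which may help correlate the hang with the copying activity.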

Steps to reproduce: No idea.

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


