[Bugme-new] [Bug 4404] New: NFS client hangs and does not umount
(automounter, manual)
bugme-daemon at osdl.org
Sat Mar 26 05:49:07 PST 2005
http://bugme.osdl.org/show_bug.cgi?id=4404
Summary: NFS client hangs and does not umount (automounter,
manual)
Kernel Version: 2.6.11
Status: NEW
Severity: normal
Owner: trond.myklebust at fys.uio.no
Submitter: wagner at tik.ee.ethz.ch
Distribution: Debian testing
Hardware Environment: Tyan dual Athlon MP 2800+
Software Environment:
Problem Description:
When 'mv'-ing larger files over NFS, some servers become unresponsive, but only
as seen from the client side. The exported filesystems can still be mounted on
other computers with 2.6.11 without problems. Even a reboot of the NFS server
does not clear the problem; only a client reboot helps.
Details:
I have a cluster with 22 nodes and 2 servers. The nodes are running an older
copy of Debian testing (about 6 months old) and kernel 2.4.26-om1 (OpenMosix).
The nodes all export one partition with
/aux 10.0.0.0/24(rw,async,no_root_squash)
The servers are one dual CPU and one single CPU machine and mount the
exported node partition via the automounter. They also mount two partitions
from each other.
One problem I experience is that when I run several 'mv' commands, each moving
several hundred files of ~400MB each from the server to several nodes, some of
these 'mv' processes enter a permanent sleep state after having copied an
arbitrary number of files. A simple Ctrl-C and restart cures that, but it is
annoying. The more serious problem is that sometimes after these copying
activities (may also be unrelated, but I cannot tell) some of the node-exported
partitions become unresponsive (df, ls, etc.) and cannot be umounted. This does
not seem to be a node problem, since the
other server can still access the node-exports fine. Rebooting the affected
nodes also does not help. In addition it seems I cannot mount any currently
unmounted node exports on the affected server when that happens. Restarting NFS,
portmapper or networking has no effect. Only rebooting the affected server
helps. This is with NFSv3 support; I am currently running tests with only NFSv2.
In addition it seems (but I cannot really be sure) that only the dual-CPU server
is affected.
What I would like is some hints on how to debug this. There are no log entries.
There are no other signs of the problem I could find, except that it does not
work. 'df'/'ls' hang when trying to query the affected mounts, and they do not
seem to time out. They can be interrupted with Ctrl-C. 'mount' displays all
mounts. 'mount -f' has no effect, either with the Debian 'mount' or with a
self-compiled one from the latest sources (util-linux-2.12q).
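
Since 'df' and 'ls' hang on the affected mounts without timing out, one way to
see which mounts are affected without blocking a shell is to probe each one
from a child process with a timeout. The following is only a minimal sketch
and not part of the original setup: the function names probe_mount and
nfs_mounts and the 5 second timeout are assumptions made for illustration. It
reads /proc/mounts and only reports which mounts respond; it does not explain
why a mount hangs.

```python
import subprocess
import sys


def nfs_mounts(mounts_file="/proc/mounts"):
    """Return the mount points of all NFS filesystems listed in mounts_file."""
    mounts = []
    with open(mounts_file) as f:
        for line in f:
            fields = line.split()
            # Fields are: device, mount point, filesystem type, options, ...
            if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
                mounts.append(fields[1])
    return mounts


def probe_mount(path, timeout=5.0):
    """Return 'ok', 'error' or 'hung' for a statvfs() call on path.

    The call runs in a child process so that a hung NFS request cannot
    block the caller. The timeout value is an arbitrary assumption.
    """
    cmd = [sys.executable, "-c",
           "import os, sys; os.statvfs(sys.argv[1])", path]
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    try:
        return "ok" if proc.wait(timeout=timeout) == 0 else "error"
    except subprocess.TimeoutExpired:
        # Best effort only: do not wait for the child afterwards, since it
        # may be stuck inside the kernel and never exit.
        proc.kill()
        return "hung"


if __name__ == "__main__":
    for mount_point in nfs_mounts():
        print(mount_point, probe_mount(mount_point))
```

Running the script on the affected server prints one line per NFS mount with
its status, which may help to narrow down whether all node exports or only
some of them are unresponsive at a given moment.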
Steps to reproduce: No idea.