[Bugme-new] [Bug 10938] New: lockd error causes system hang requiring server reboot

bugme-daemon at bugzilla.kernel.org
Thu Jun 19 11:42:45 PDT 2008


http://bugzilla.kernel.org/show_bug.cgi?id=10938

           Summary: lockd error causes system hang requiring server reboot
           Product: File System
           Version: 2.5
     KernelVersion: 2.6.18-92.1.1.el5
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: NFS
        AssignedTo: trond.myklebust at fys.uio.no
        ReportedBy: bryan.hockey at gmail.com


Note: Apologies for any mistakes; the link to the "Bug Filing FAQ" returned a 404.
Latest working kernel version:
Earliest failing kernel version: 2.6.18-92.1.1.el5 (earliest known)
Distribution: Red Hat (also: Ubuntu)
Hardware Environment: Dell PowerEdge 2950 (dual Xeon, 8GB RAM), Dell PV220S
disk storage
Software Environment: RHEL 5.2
Problem Description:
Intermittently, all clients hang. The only remedy is a server reboot. This
appears to happen across different distros and kernel versions, which is why
I am posting here instead of in Red Hat's Bugzilla.

Client- and server-side /var/log/messages both show:
lockd: server <ip> not responding, timed out

Restarting NFS on the server fails at "Starting NFS daemon" with this error:
lockd_down: lockd failed to exit, clearing pid
This also leaves a second [lockd] instance running, where previously there was
only one.

Ping and SSH to the server keep working throughout.

The bug appears to be a kernel issue, since it has shown up across different
distributions and kernel versions.
The same problem appears to occur with Ubuntu's 2.6.22 kernel:
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/181996
That page contains all the relevant system messages and extensive debugging
output, which I have not re-posted here.

Other possibly related problems:
http://www.mail-archive.com/linux-nfs@vger.kernel.org/msg01373.html
https://bugzilla.redhat.com/show_bug.cgi?id=430160

Steps to reproduce:
According to "the.jxc" on that first link above, "The failure is very regular.
It happens whenever the garbage collection is performed as a result of a lock
request."  I can't be much more helpful than that.

Let me know if more information is needed, or if this is a duplicate of another
submission (my search produced no results).

