[Bugme-new] [Bug 16494] New: NFS client over TCP hangs due to packet loss
bugzilla-daemon at bugzilla.kernel.org
Mon Aug 2 09:14:46 PDT 2010
https://bugzilla.kernel.org/show_bug.cgi?id=16494
Summary: NFS client over TCP hangs due to packet loss
Product: Networking
Version: 2.5
Kernel Version: 2.6.34.1
Platform: All
OS/Version: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: IPV4
AssignedTo: shemminger at linux-foundation.org
ReportedBy: andyc.bluearc at gmail.com
Regression: No
If packet loss on a TCP connection from the NFS client to an NFS server (using
NFS v3) is severe enough that the RPC client code initiates recovery by
shutting down the connection and then re-establishing it, we see repeated
connection setups and teardowns with no intervening data packets:
4 42.909478 172.18.0.39 10.1.6.102 TCP 1013 > nfs [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=108490 TSER=0 WS=0
5 42.909577 10.1.6.102 172.18.0.39 TCP nfs > 1013 [SYN, ACK] Seq=0 Ack=1 Win=64240 Len=0 MSS=1460
6 42.909610 172.18.0.39 10.1.6.102 TCP 1013 > nfs [ACK] Seq=1 Ack=1 Win=5840 Len=0
7 42.909672 172.18.0.39 10.1.6.102 TCP 1013 > nfs [FIN, ACK] Seq=1 Ack=1 Win=5840 Len=0
8 42.909767 10.1.6.102 172.18.0.39 TCP nfs > 1013 [ACK] Seq=1 Ack=2 Win=64240 Len=0
9 43.660083 10.1.6.102 172.18.0.39 TCP nfs > 1013 [FIN, ACK] Seq=1 Ack=2 Win=64240 Len=0
10 43.660100 172.18.0.39 10.1.6.102 TCP 1013 > nfs [ACK] Seq=2 Ack=2 Win=5840 Len=0
The sequence then repeats after a while.
Here's a link to what I think the problem is: http://lkml.org/lkml/2010/7/27/42
Essentially, tcp_sendmsg() bails out at the following check because
sk_shutdown still contains SEND_SHUTDOWN:
	err = -EPIPE;
	if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
		goto out_err;
Here's a patch that fixes the hang. It clears the sk_shutdown flag at
connection init time:
--- /home/company/software/src/linux-2.6.34.1/net/ipv4/tcp_output.c	2010-07-27 08:46:46.917000000 +0100
+++ net/ipv4/tcp_output.c	2010-07-27 09:19:16.000000000 +0100
@@ -2522,6 +2522,13 @@
 	struct tcp_sock *tp = tcp_sk(sk);
 	__u8 rcv_wscale;
 
+	/* clear down any previous shutdown attempts so that
+	 * reconnects on a socket that's been shutdown leave the
+	 * socket in a usable state (otherwise tcp_sendmsg() returns
+	 * -EPIPE).
+	 */
+	sk->sk_shutdown = 0;
+
 	/* We'll fix this up when we get a response from the other end.
 	 * See tcp_input.c:tcp_rcv_state_process case TCP_SYN_SENT.
 	 */
I don't know whether that's the correct fix.
At the time of writing, the current state of the thread on LKML is here:
http://lkml.org/lkml/2010/7/29/120.