[Bugme-new] [Bug 9124] New: Netconsole race crashed the system
bugme-daemon at bugzilla.kernel.org
Thu Oct 4 16:24:18 PDT 2007
http://bugzilla.kernel.org/show_bug.cgi?id=9124
Summary: Netconsole race crashed the system
Product: Networking
Version: 2.5
KernelVersion: 2.6.9, 2.6.18, 2.6.23
Platform: All
OS/Version: Linux
Tree: Mainline
Status: NEW
Severity: high
Priority: P1
Component: Other
AssignedTo: acme at ghostprotocols.net
ReportedBy: tina.yang at oracle.com
Most recent kernel where this bug did not occur:
The problem appears to have always been there.
Distribution:
Hardware Environment:
DELL PowerEdge 2650 (x86)
DELL PowerEdge 2850 (x86_64)
HP ProLiant DL380 G5 (x86_64)
with various NICs - e1000, tg3, bnx2
Software Environment:
2.6.9, 2.6.18, 2.6.23
Problem Description:
On 2.6.18 we found this issue with e1000 and tg3; on mainline 2.6.23-rc* we found it
with e100, tg3 and bnx2. The system either panicked at netdevice.h:890 or hung,
sometimes printing one of the following console messages, depending on which NIC
was in use:
e1000:
"e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang"
tg3:
"NETDEV WATCHDOG: eth4: transmit timed out"
"tg3: eth4: transmit timed out, resetting"
Steps to reproduce:
1. On 2.6.18 (both x86 and x86_64), load the netconsole module (NIC: e1000 and tg3).
2. Run a moderate I/O load, preferably with fio: one process doing asynchronous
direct I/O using libaio.
fio jobfile:
[global]
iodepth=1024
iodepth_batch=60
randrepeat=1
size=1024m
directory=/home/oracle
numjobs=2
[job1]
bs=8k
direct=1
ioengine=libaio
rw=randrw
filename=file1:file2
3. From a second console, as root, run "echo t > /proc/sysrq-trigger".
The machine hangs instantly.
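When scripting the reproduction alongside the fio load, step 3 can also be issued from a small C program. This is a minimal sketch: the helper name sysrq_trigger() is hypothetical, and the target path is a parameter so the code can be exercised against an ordinary file instead of /proc/sysrq-trigger, which requires root. With key 't' the kernel dumps every task's state through printk, so with netconsole loaded that burst of output is also sent out through netconsole.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write a single SysRq command key to a
 * sysrq-trigger file. On a live system the path is
 * /proc/sysrq-trigger (root only). In 2.6-era kernels the handler
 * acts only on the first byte written, so the trailing newline
 * that "echo t" adds makes no difference.
 * Returns 0 on success, -1 on failure.
 */
int sysrq_trigger(const char *path, char key)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    if (fputc(key, f) == EOF) {
        fclose(f);
        return -1;
    }

    /* fclose() flushes the buffered byte; a failed flush is an error too. */
    return fclose(f) == 0 ? 0 : -1;
}
```

Calling sysrq_trigger("/proc/sysrq-trigger", 't') as root is intended to be equivalent to the echo command in step 3.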
Crash stack captured on 2.6.9
PANIC: "kernel BUG at include/linux/netdevice.h:888!"
#0 [ 23c5e60] disk_dump at f9ca71a2
#1 [ 23c5e64] printk at 21228d6
#2 [ 23c5e70] freeze_other_cpus at f9ca6ef5
#3 [ 23c5e80] start_disk_dump at f9ca6fa0
#4 [ 23c5e90] try_crashdump at 2133766
#5 [ 23c5e98] die at 2106354
#6 [ 23c5ecc] do_invalid_op at 210672f
#7 [ 23c5f7c] error_code (via invalid_op) at fffecede
EAX: 00000006 EBX: 00200202 ECX: 00000000 EDX: df287000 EBP: e05ca000
DS: 007b ESI: 00000001 ES: 007b EDI: e05ca240
CS: 0060 EIP: f8c82a08 ERR: ffffffff EFLAGS: 00210046
#8 [ 23c5fb8] tg3_poll at f8c82a08
#9 [ 23c5fd0] net_rx_action at 227a8da
#10 [ 23c5fe8] __do_softirq at 2126422
--- <soft IRQ> ---
#0 [25c71cac] do_softirq at 2108460
#1 [25c71cb4] dev_queue_xmit at 227a0d2
#2 [25c71ccc] ip_finish_output at 229288d
#3 [25c71ce4] ip_queue_xmit at 2292fa9
#4 [25c71dac] tcp_transmit_skb at 22a0ff7
#5 [25c71dec] tcp_write_xmit at 22a1901
#6 [25c71e10] tcp_sendmsg at 2297d6d
#7 [25c71e80] sock_aio_write at 2272512
#8 [25c71eec] do_sync_write at 215a444
#9 [25c71f88] vfs_write at 215a53a
#10 [25c71fa4] sys_write at 215a5f4
#11 [25c71fc0] system_call at fffec219
net_device in memory,
name = "eth0\000\000\000\000\000\000\000\000\000\000\000",
mem_end = 0,
mem_start = 0,
base_addr = 0,
irq = 209,
if_port = 0 '\0',
dma = 0 '\0',
state = 6,
next = 0xbf41b000,
init = 0,
next_sched = 0x0,
ifindex = 2,
iflink = 2,
get_stats = 0xf8c87737,
get_wireless_stats = 0,
wireless_handlers = 0x0,
ethtool_ops = 0xf8c964e0,
trans_start = 128269465,
last_rx = 128269464,
flags = 4099,
gflags = 0,
priv_flags = 32,
unused_alignment_fixer = 0,
mtu = 1500,
type = 1,
hard_header_len = 14,
priv = 0xbf430240,
master = 0x0,
broadcast = "<FF><FF><FF><FF><FF><FF>\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
dev_addr = "\000\tk<E6>g<EB>\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
addr_len = 6 '\006',
reserved = 0 '\0',
priv_len = 1980,
mc_list = 0x15f48440,
mc_count = 1,
promiscuity = 0,
allmulti = 0,
watchdog_timeo = 5000,
watchdog_timer = {
entry = {
next = 0x1594af48,
prev = 0x1594af48
},
expires = 128269531,
lock = {
lock = 1,
magic = 3735899821
},
magic = 1267182958,
function = 0x2286c74 <dev_watchdog>,
data = 3208839168,
base = 0x1594a860
},
atalk_ptr = 0x0,
ip_ptr = 0xc1e7de80,
dn_ptr = 0x0,
ip6_ptr = 0x0,
ec_ptr = 0x0,
ax25_ptr = 0x0,
poll_list = {
next = 0x100100,
prev = 0x200200
},
...
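The 2.6.9 dump shows state = 6 and a poll_list holding 0x100100/0x200200, the values the 2.6 list code writes into an entry removed with list_del(). As a hedged illustration only, here is a hypothetical single-threaded user-space model of the invariant the BUG is assumed to enforce: netif_rx_complete() checks that RX_SCHED is set before unlinking the device from the poll list and clearing the bit. The names (model_dev, model_rx_schedule(), model_rx_complete()), the starting state and the RX_SCHED bit position are assumptions. Which path would perform a second completion (for example netconsole polling the NIC through netpoll) is suggested by the bug title, not shown by the traces.

```c
/*
 * Hypothetical user-space model of 2.6-era NAPI poll-list
 * bookkeeping. Names and layout are illustrative, not the kernel's.
 */
struct list_node {
    struct list_node *next, *prev;
};

struct model_dev {
    unsigned long state;          /* stand-in for net_device::state */
    struct list_node poll_list;   /* stand-in for net_device::poll_list */
};

#define MODEL_RX_SCHED (1UL << 5) /* assumed bit of __LINK_STATE_RX_SCHED */
/* Same values as the 2.6 LIST_POISON1/LIST_POISON2; never dereferenced. */
#define MODEL_POISON1 ((struct list_node *)0x00100100UL)
#define MODEL_POISON2 ((struct list_node *)0x00200200UL)

/* Mark the device scheduled and append it to the poll list. */
int model_rx_schedule(struct model_dev *dev, struct list_node *poll_head)
{
    if (dev->state & MODEL_RX_SCHED)
        return 0;                 /* already queued */
    dev->state |= MODEL_RX_SCHED;
    dev->poll_list.prev = poll_head->prev;
    dev->poll_list.next = poll_head;
    poll_head->prev->next = &dev->poll_list;
    poll_head->prev = &dev->poll_list;
    return 1;
}

/*
 * Completion step. The kernel is assumed to do
 * BUG_ON(!test_bit(__LINK_STATE_RX_SCHED, &dev->state)) here;
 * the model returns -1 instead so the failure can be observed.
 */
int model_rx_complete(struct model_dev *dev)
{
    if (!(dev->state & MODEL_RX_SCHED))
        return -1;                /* the kernel would hit BUG() */
    /* list_del(), then poison the entry as the 2.6 list code does */
    dev->poll_list.prev->next = dev->poll_list.next;
    dev->poll_list.next->prev = dev->poll_list.prev;
    dev->poll_list.next = MODEL_POISON1;
    dev->poll_list.prev = MODEL_POISON2;
    dev->state &= ~MODEL_RX_SCHED;
    return 0;
}
```

Starting from state 6, scheduling the device once and completing it twice leaves state 6 and a poisoned poll_list after the first call (the same values as the 2.6.9 dump), and makes the second call fail where the kernel would hit BUG().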
Crash stack captured on 2.6.18
PANIC: "kernel BUG at include/linux/netdevice.h:890!"
#0 [c072ce30] crash_kexec at c044418a
#1 [c072ce74] die at c04054d0
#2 [c072cea4] do_invalid_op at c0405c20
#3 [c072cf54] error_code (via invalid_op) at c0404ab3
EAX: 00000007 EBX: 00000202 ECX: 00000000 EDX: f6d9c000 EBP: f6d9c400
DS: 007b ESI: 00000001 ES: 007b EDI: cb02b280
CS: 0060 EIP: f8927791 ERR: ffffffff EFLAGS: 00010046
#4 [c072cf88] tg3_poll at f8927791
--- <soft IRQ> ---
#0 [f7e54f60] do_softirq at c0406433
#1 [f7e54f6c] do_IRQ at c0406425
#2 [f7e54fb4] cpu_idle at c0402c8e
net_device in memory,
name = "eth4\000\000\000\000\000\000\000\000\000\000\000",
name_hlist = {
next = 0x0,
pprev = 0xc07d0148
},
mem_end = 0,
mem_start = 0,
base_addr = 0,
irq = 201,
if_port = 0 '\0',
dma = 0 '\0',
state = 39,
next = 0xf7387000,
init = 0,
features = 419,
next_sched = 0x0,
ifindex = 2,
iflink = 2,
get_stats = 0xf892016b <tg3_get_stats>,
get_wireless_stats = 0,
wireless_handlers = 0x0,
wireless_data = 0x0,
cfg80211_wext_pending_config = 0x0,
ethtool_ops = 0xf89301a0,
flags = 4099,
priv_flags = 0,
padded = 0,
operstate = 6 '\006',
link_mode = 0 '\0',
mtu = 1500,
type = 1,
hard_header_len = 14,
master = 0x0,
perm_addr = "\000\021C5\033\004\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
addr_len = 6 '\006',
dev_id = 0,
mc_list = 0xf59f0ac0,
mc_count = 5,
promiscuity = 0,
allmulti = 0,
atalk_ptr = 0x0,
ip_ptr = 0xcb308280,
dn_ptr = 0x0,
ip6_ptr = 0xf71e5c00,
ec_ptr = 0x0,
ax25_ptr = 0x0,
ieee80211_ptr = 0x0,
poll_list = {
next = 0xcb0232a0,
prev = 0xcb0232a0
},
...
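To read the state and poll_list fields in dumps like the two above, a small decoder can help. This sketch assumes the bit order of enum netdev_state_t in the 2.6.9/2.6.18-era include/linux/netdevice.h (XOFF, START, PRESENT, SCHED, NOCARRIER, RX_SCHED, LINKWATCH_PENDING) and the 2.6 list poison values; both should be checked against the exact kernel tree. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* ASSUMED bit order of enum netdev_state_t (2.6.9/2.6.18 era). */
static const char *const link_state_names[] = {
    "__LINK_STATE_XOFF",              /* bit 0 */
    "__LINK_STATE_START",             /* bit 1 */
    "__LINK_STATE_PRESENT",           /* bit 2 */
    "__LINK_STATE_SCHED",             /* bit 3 */
    "__LINK_STATE_NOCARRIER",         /* bit 4 */
    "__LINK_STATE_RX_SCHED",          /* bit 5 */
    "__LINK_STATE_LINKWATCH_PENDING", /* bit 6 */
};
#define NUM_LINK_STATES (sizeof link_state_names / sizeof link_state_names[0])
#define RX_SCHED_BIT 5

/* Values list_del() writes into a removed entry in 2.6 kernels. */
#define LIST_POISON1 0x00100100UL
#define LIST_POISON2 0x00200200UL

/* Print the name of every known bit set in a dumped dev->state value. */
void print_link_state(unsigned long state)
{
    for (size_t i = 0; i < NUM_LINK_STATES; i++)
        if (state & (1UL << i))
            printf("  bit %zu: %s\n", i, link_state_names[i]);
    if (state >> NUM_LINK_STATES)
        printf("  (bits above %zu not decoded)\n", NUM_LINK_STATES - 1);
}

/* 1 if the assumed RX_SCHED bit is set, else 0. */
int rx_sched_set(unsigned long state)
{
    return (int)((state >> RX_SCHED_BIT) & 1UL);
}

/* Non-zero if a dumped list_head looks like it was removed by list_del(). */
int list_poisoned(unsigned long next, unsigned long prev)
{
    return next == LIST_POISON1 && prev == LIST_POISON2;
}
```

Under these assumptions, state = 6 (2.6.9) decodes to START and PRESENT, while state = 39 (2.6.18) decodes to XOFF, START, PRESENT and RX_SCHED; only the 2.6.9 poll_list carries the poison values.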