[Linux-kernel-mentees] [linux-kernel mentees][2]Syzbot report
Bharath Vedartham
linux.bhar at gmail.com
Sun Apr 28 17:36:33 UTC 2019
kernel BUG at include/linux/mm.h:LINE! (5)
This bug was in the open section.
Brief stack trace:
[ 172.788569] skb_release_all+0x4a/0x60
[ 172.789273] __kfree_skb+0x15/0x20
[ 172.789896] tcp_write_queue_purge+0x24f/0x7c0
[ 172.791100] tcp_disconnect+0x406/0x1890
[ 172.791999] ? lock_sock_nested+0xe2/0x120
[ 172.793116] tcp_close+0xe28/0x10a0
[ 172.794085] ? _raw_spin_unlock_bh+0x30/0x40
[ 172.795221] tls_sk_proto_close+0x3de/0x7b0
[ 172.796175] ? mark_held_locks+0x130/0x130
[ 172.797155] ? tcp_check_oom+0x560/0x560
[ 172.797939] ? tls_push_sg+0x6b0/0x6b0
[ 172.798628] ? ip_mc_drop_socket+0x210/0x270
[ 172.799381] inet_release+0x104/0x1f0
[ 172.800056] inet6_release+0x50/0x70
[ 172.800654] __sock_release+0xd7/0x2b0
[ 172.801283] ? __sock_release+0x2b0/0x2b0
[ 172.801957] sock_close+0x19/0x20
[ 172.802526] __fput+0x2cf/0x8b0
[ 172.803171] ____fput+0x15/0x20
[ 172.803747] task_work_run+0x14d/0x1c0
[ 172.804418] do_exit+0xb9f/0x3200
[ 172.804936] ? __lock_acquire+0x5d6/0x4760
[ 172.805567] ? mm_update_next_owner+0x6f0/0x6f0
[ 172.806275] ? find_held_lock+0x36/0x1d0
[ 172.806888] ? get_signal+0x300/0x1cc0
[ 172.807947] ? _raw_spin_unlock_irq+0x27/0x80
[ 172.808684] ? get_signal+0x300/0x1cc0
[ 172.809329] ? _raw_spin_unlock_irq+0x27/0x80
[ 172.810428] do_group_exit+0x135/0x370
[ 172.811384] get_signal+0x356/0x1cc0
[ 172.811985] ? __might_fault+0x12b/0x1e0
[ 172.812798] ? lock_downgrade+0x7f0/0x7f0
[ 172.813663] do_signal+0x87/0x1930
[ 172.814196] ? kasan_check_read+0x11/0x20
[ 172.814793] ? _copy_to_user+0xc8/0x110
[ 172.815714] ? setup_sigcontext+0x7d0/0x7d0
[ 172.816453] ? __x64_sys_futex+0x40d/0x5b0
[ 172.817168] ? exit_to_usermode_loop+0x40/0x2c0
[ 172.817951] ? do_syscall_64+0x536/0x600
[ 172.818700] ? exit_to_usermode_loop+0x40/0x2c0
[ 172.819618] ? lockdep_hardirqs_on+0x421/0x5c0
[ 172.820443] ? trace_hardirqs_on+0x67/0x230
[ 172.821257] exit_to_usermode_loop+0x241/0x2c0
[ 172.822058] do_syscall_64+0x536/0x600
[ 172.823116] entry_SYSCALL_64_after_hwframe+0x49/0xbe
Reproducer: A C reproducer was available, and I was able to reproduce the
bug easily. To compile it I had to link a few extra libraries such as
pthreads. I also had to enable the TLS (Transport Layer Security) options
in my kernel config (5.1.0-rc6+) before the bug would trigger. I figured
this out by looking at the commit to which the crash was bisected.
Analysis: From the stack trace, I observed that the crash was triggered
somewhere in skb_release_data. The RIP register was pointing to
skb_release_data+0x5ae. skb_release_data is responsible for releasing the
data of a socket buffer. Using GDB, I found that skb_release_data+0x5ae
maps to the function __skb_frag_unref. This function takes a fragment of
the data section of the socket buffer and releases a reference to it.
__skb_frag_unref calls put_page to release the reference on the paged
fragment; this is the put_page at include/linux/mm.h:992. put_page calls
put_page_testzero to drop one reference on the page backing the fragment
and, if that was the last reference, frees the page using __put_page. If
the page already has 0 references on entry to put_page_testzero, it
triggers a crash (using VM_BUG_ON_PAGE). This means that one of the data
fragments of the socket buffer has zero references and is still a part
of the socket buffer.
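
The check described above can be modelled in user space. The sketch
below is only an illustration, not the kernel code: struct model_page,
model_put_page_testzero and model_put_page are hypothetical names, and a
bug_hit flag stands in for the crash that VM_BUG_ON_PAGE would cause.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the reference count is modelled. */
struct model_page {
    int refcount;
    bool freed;     /* set where the kernel would call __put_page() */
    bool bug_hit;   /* set where the kernel would hit VM_BUG_ON_PAGE */
};

/*
 * Models put_page_testzero(): flag the page if its count is already 0,
 * otherwise drop one reference and report whether the count reached 0.
 */
static bool model_put_page_testzero(struct model_page *page)
{
    assert(page != NULL);
    if (page->refcount == 0) {
        page->bug_hit = true;   /* the real kernel crashes here instead */
        return false;
    }
    page->refcount--;
    return page->refcount == 0;
}

/* Models put_page(): free the page once the last reference is dropped. */
static void model_put_page(struct model_page *page)
{
    if (model_put_page_testzero(page))
        page->freed = true;
}
```

Calling model_put_page once on a page with one reference frees it. A
second call on the same page sets bug_hit, which corresponds to the
situation in this report: a fragment is released although its page has
no references left.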
Fix: One possible fix would be to skip releasing a fragment if the
reference count of its struct page is 0. But I feel that this would not
be a wise idea. The fact that a fragment of the socket buffer data has
no references should not pass quietly.
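
To make the concern concrete, here is a rough user-space sketch of the
"skip the fragment" idea. It is not a proposed patch: struct model_skb,
MODEL_MAX_FRAGS and model_release_frags_skip_zero are hypothetical names,
and each fragment is reduced to the reference count of its page.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_FRAGS 4   /* arbitrary size chosen for this sketch */

/*
 * Hypothetical stand-in for a socket buffer's paged fragments:
 * each entry holds only the reference count of the backing page.
 */
struct model_skb {
    int frag_refcount[MODEL_MAX_FRAGS];
    size_t nr_frags;
};

/*
 * Release every fragment, skipping any whose page already has 0 references.
 * Returns the number of skipped fragments so the caller can still see them.
 */
static size_t model_release_frags_skip_zero(struct model_skb *skb)
{
    size_t skipped = 0;

    assert(skb != NULL && skb->nr_frags <= MODEL_MAX_FRAGS);
    for (size_t i = 0; i < skb->nr_frags; i++) {
        if (skb->frag_refcount[i] == 0) {
            skipped++;      /* crash avoided, but the stale fragment remains */
            continue;
        }
        skb->frag_refcount[i]--;
    }
    return skipped;
}
```

The loop no longer trips over the zero-reference fragment, but a nonzero
return value shows that such a fragment was still attached to the buffer.
Skipping it only hides whatever dropped the reference too early.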