[Bugme-new] [Bug 11433] New: CONFIG_NO_HZ causes ksoftirqd to load one cpu by 8% even when idle.
bugme-daemon at bugzilla.kernel.org
Wed Aug 27 06:42:17 PDT 2008
http://bugzilla.kernel.org/show_bug.cgi?id=11433
Summary: CONFIG_NO_HZ causes ksoftirqd to load one cpu by 8% even when idle.
Product: Process Management
Version: 2.5
KernelVersion: 2.6.26.3
Platform: All
OS/Version: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Other
AssignedTo: process_other at kernel-bugs.osdl.org
ReportedBy: sc.contact at gmail.com
Latest working kernel version: unknown
Earliest failing kernel version: unknown
Distribution: Gentoo
Hardware Environment: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
Software Environment: GCC 4.3.1 GLIBC 2.8
Problem Description:
ksoftirqd loads one CPU at about 8% even when the machine is idle, but only
when dynticks (CONFIG_NO_HZ) is enabled.
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   10 root      15  -5     0    0    0 S    8  0.0   1:44.89 ksoftirqd/3
14:25:53 up 22 min, 2 users, load average: 0.04, 0.06, 0.01
Although this is a grsecurity-patched kernel, I have verified this behaviour
with a vanilla kernel.
Without dynticks:
quad zakalwe # opreport -l /usr/src/linux-2.6.26.3-grsec/vmlinux
CPU: Core 2, speed 1603 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
mask of 0x00 (Unhalted core cycles) count 1201500
samples % symbol name
162 20.3774 mwait_idle
76 9.5597 apic_timer_interrupt
74 9.3082 native_read_tsc
34 4.2767 slab_pad_check
28 3.5220 getnstimeofday
21 2.6415 check_bytes_and_report
17 2.1384 native_sched_clock
16 2.0126 __update_sched_clock
16 2.0126 run_timer_softirq
14 1.7610 __do_softirq
14 1.7610 tick_sched_timer
11 1.3836 find_busiest_group
10 1.2579 scheduler_tick
9 1.1321 hrtimer_interrupt
9 1.1321 ktime_get_ts
9 1.1321 page_fault
9 1.1321 smp_apic_timer_interrupt
8 1.0063 sched_clock_tick
7 0.8805 _local_bh_enable
7 0.8805 do_softirq
7 0.8805 raise_softirq
7 0.8805 read_tsc
6 0.7547 hrtimer_run_pending
6 0.7547 sched_clock_cpu
6 0.7547 update_wall_time
5 0.6289 hrtimer_run_queues
5 0.6289 init_object
5 0.6289 notifier_call_chain
5 0.6289 on_freelist
5 0.6289 profile_tick
4 0.5031 __run_hrtimer
4 0.5031 _spin_lock
4 0.5031 clear_page_c
4 0.5031 find_lock_page
4 0.5031 hrtimer_forward
4 0.5031 in_lock_functions
4 0.5031 ktime_get
4 0.5031 lapic_next_event
4 0.5031 list_del
4 0.5031 restore_args
4 0.5031 update_process_times
3 0.3774 __round_jiffies
3 0.3774 account_system_time_scaled
3 0.3774 check_slab
3 0.3774 clockevents_program_event
3 0.3774 irq_enter
3 0.3774 irq_exit
3 0.3774 mce_idle_callback
3 0.3774 rb_next
3 0.3774 rcu_pending
3 0.3774 run_rebalance_domains
3 0.3774 schedule
3 0.3774 unmap_vmas
3 0.3774 update_vsyscall
2 0.2516 __first_cpu
2 0.2516 __link_path_walk
2 0.2516 _atomic_dec_and_lock
2 0.2516 copy_page_range
2 0.2516 copy_user_generic_string
2 0.2516 cpu_idle
2 0.2516 do_page_fault
2 0.2516 enter_idle
2 0.2516 filemap_fault
2 0.2516 find_first_bit
2 0.2516 find_next_bit
2 0.2516 get_page_from_freelist
2 0.2516 mutex_lock
2 0.2516 page_remove_rmap
2 0.2516 rb_insert_color
2 0.2516 thread_return
1 0.1258 __chk_obj_label
1 0.1258 __dec_zone_page_state
1 0.1258 __delay
1 0.1258 __dequeue_entity
1 0.1258 __dequeue_signal
1 0.1258 __find_get_block
1 0.1258 __list_add
1 0.1258 __pagevec_free
1 0.1258 __remove_hrtimer
1 0.1258 __slab_free
1 0.1258 __switch_to
1 0.1258 __up_read
1 0.1258 __wake_up_bit
1 0.1258 account_process_tick
1 0.1258 account_system_time
1 0.1258 add_event_entry
1 0.1258 any_ports_active
1 0.1258 check_object
1 0.1258 con_chars_in_buffer
1 0.1258 copy_page_c
1 0.1258 cp_new_stat
1 0.1258 deactivate_task
1 0.1258 dequeue_entity
1 0.1258 dequeue_task_fair
1 0.1258 do_dbs_timer
1 0.1258 do_mmap_pgoff
1 0.1258 do_select
1 0.1258 down_write
1 0.1258 elv_rb_find
1 0.1258 enqueue_hrtimer
1 0.1258 exit_idle
1 0.1258 ext3_get_group_desc
1 0.1258 find_vma
1 0.1258 find_vma_prev
1 0.1258 finish_task_switch
1 0.1258 flush_tlb_page
1 0.1258 fput
1 0.1258 free_hot_cold_page
1 0.1258 free_pages_and_swap_cache
1 0.1258 generic_file_aio_read
1 0.1258 handle_mm_fault
1 0.1258 ioread8
1 0.1258 kfree
1 0.1258 locks_remove_posix
1 0.1258 lookup_mnt
1 0.1258 memory_open
1 0.1258 mousedev_event
1 0.1258 mousedev_notify_readers
1 0.1258 next_zones_zonelist
1 0.1258 page_add_file_rmap
1 0.1258 page_waitqueue
1 0.1258 prepend
1 0.1258 radix_tree_lookup
1 0.1258 rb_erase
1 0.1258 read_inode_bitmap
1 0.1258 release_pages
1 0.1258 ret_from_intr
1 0.1258 run_posix_cpu_timers
1 0.1258 run_workqueue
1 0.1258 set_normalized_timespec
1 0.1258 skb_release_data
1 0.1258 sock_alloc_send_skb
1 0.1258 task_rq_lock
1 0.1258 task_tick_idle
1 0.1258 timer_stats_update_stats
1 0.1258 uhci_check_ports
1 0.1258 uhci_hub_status_data
1 0.1258 uhci_scan_schedule
1 0.1258 update_curr
1 0.1258 usecs_to_jiffies
1 0.1258 vfs_getattr
1 0.1258 vma_adjust
With dynticks:
quad zakalwe # opreport -l /usr/src/linux-2.6.26.3-grsec/vmlinux
CPU: Core 2, speed 1603 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
mask of 0x00 (Unhalted core cycles) count 1201500
samples % symbol name
3668 16.9017 __switch_to
2028 9.3448 schedule
1996 9.1973 native_read_tsc
1563 7.2021 find_busiest_group
767 3.5342 finish_task_switch
723 3.3315 native_sched_clock
709 3.2670 thread_return
662 3.0504 sched_clock_cpu
472 2.1749 __update_sched_clock
447 2.0597 tick_nohz_stop_sched_tick
409 1.8846 hrtick_set
401 1.8478 hrtick_start_fair
386 1.7786 read_tsc
332 1.5298 ksoftirqd
318 1.4653 __do_softirq
295 1.3593 task_rq_lock
265 1.2211 __resched_task
253 1.1658 enqueue_entity
250 1.1520 getnstimeofday
238 1.0967 do_softirq
231 1.0644 find_first_bit
222 1.0229 pick_next_task_fair
220 1.0137 find_next_bit
220 1.0137 get_next_timer_interrupt
217 0.9999 tick_nohz_restart_sched_tick
212 0.9769 update_curr
211 0.9723 select_task_rq_fair
210 0.9677 set_next_entity
178 0.8202 __next_cpu
169 0.7787 try_to_wake_up
166 0.7649 __first_cpu
161 0.7419 rb_insert_color
144 0.6635 dequeue_entity
144 0.6635 rb_erase
136 0.6267 __enqueue_entity
130 0.5990 put_prev_task_fair
115 0.5299 hrtimer_run_pending
110 0.5069 cpu_idle
99 0.4562 enqueue_task_fair
99 0.4562 run_timer_softirq
97 0.4470 deactivate_task
96 0.4424 ktime_get
96 0.4424 rb_next
92 0.4239 _spin_unlock_irqrestore
90 0.4147 __dequeue_entity
87 0.4009 tick_nohz_stop_idle
86 0.3963 call_softirq
84 0.3871 _local_bh_enable
83 0.3825 __list_add
82 0.3778 kthread_should_stop
80 0.3686 place_entity
77 0.3548 activate_task
74 0.3410 enqueue_task
68 0.3133 dequeue_task_fair
68 0.3133 rcu_pending
67 0.3087 ktime_get_ts
65 0.2995 set_normalized_timespec
55 0.2534 msecs_to_jiffies
51 0.2350 _cond_resched
49 0.2258 pick_next_task_rt
46 0.2120 raise_softirq_irqoff
44 0.2027 rcu_needs_cpu
40 0.1843 slab_pad_check
39 0.1797 check_preempt_curr_idle
37 0.1705 wake_up_process
34 0.1567 pick_next_task_idle
33 0.1521 put_prev_task_idle
32 0.1475 list_add
22 0.1014 check_bytes_and_report
22 0.1014 mwait_idle
18 0.0829 apic_timer_interrupt
9 0.0415 copy_user_generic_string
8 0.0369 init_object
7 0.0323 on_freelist
6 0.0276 page_fault
5 0.0230 __slab_alloc
5 0.0230 add_event_entry
5 0.0230 smp_apic_timer_interrupt
5 0.0230 sync_buffer
5 0.0230 update_wall_time
4 0.0184 __slab_free
4 0.0184 kmem_cache_free
4 0.0184 sched_clock_tick
4 0.0184 scheduler_tick
3 0.0138 __run_hrtimer
3 0.0138 account_system_time
3 0.0138 clear_page_c
3 0.0138 flush_tlb_page
3 0.0138 irq_enter
3 0.0138 raise_softirq
3 0.0138 select_nohz_load_balancer
3 0.0138 unix_poll
2 0.0092 __d_lookup
2 0.0092 __find_get_block
2 0.0092 __journal_temp_unlink_buffer
2 0.0092 __wake_up
2 0.0092 _spin_lock
2 0.0092 br_hello_timer_expired
2 0.0092 check_object
2 0.0092 check_slab
2 0.0092 clocksource_get_next
2 0.0092 copy_page_c
2 0.0092 do_page_fault
2 0.0092 find_lock_page
2 0.0092 idle_cpu
2 0.0092 memcmp
2 0.0092 mmap_region
2 0.0092 path_put
2 0.0092 release_pages
2 0.0092 run_rebalance_domains
2 0.0092 system_call_after_swapgs
2 0.0092 tick_nohz_update_jiffies
2 0.0092 tick_sched_timer
2 0.0092 update_process_times
2 0.0092 update_vsyscall
2 0.0092 vsnprintf
1 0.0046 __alloc_pages_internal
1 0.0046 __blk_put_request
1 0.0046 __do_fault
1 0.0046 __down_write_nested
1 0.0046 __find_get_block_slow
1 0.0046 __fput
1 0.0046 __inet_lookup_established
1 0.0046 __mod_timer
1 0.0046 __pte_alloc
1 0.0046 __round_jiffies
1 0.0046 __timer_stats_hrtimer_set_start_info
1 0.0046 __up_write
1 0.0046 __wake_up_sync
1 0.0046 add_dirent_to_buf
1 0.0046 any_ports_active
1 0.0046 br_config_bpdu_generation
1 0.0046 br_transmit_config
1 0.0046 copy_page_range
1 0.0046 do_dbs_timer
1 0.0046 do_get_write_access
1 0.0046 do_notify_resume
1 0.0046 do_wait
1 0.0046 do_wp_page
1 0.0046 dput
1 0.0046 enqueue_hrtimer
1 0.0046 exit_itimers
1 0.0046 ext3_get_blocks_handle
1 0.0046 ext3_journal_start_sb
1 0.0046 ext3_mark_iloc_dirty
1 0.0046 filemap_fault
1 0.0046 find_vma
1 0.0046 free_pgd_range
1 0.0046 get_pid_task
1 0.0046 get_slab
1 0.0046 get_user_pages
1 0.0046 handle_mm_fault
1 0.0046 hrtick_resched
1 0.0046 hrtimer_get_next_event
1 0.0046 init_timer
1 0.0046 inotify_inode_queue_event
1 0.0046 ioread8
1 0.0046 iowrite8
1 0.0046 journal_put_journal_head
1 0.0046 journal_start
1 0.0046 lock_timer_base
1 0.0046 mutex_lock
1 0.0046 notifier_call_chain
1 0.0046 page_waitqueue
1 0.0046 path_get
1 0.0046 proc_sys_lookup_table_one
1 0.0046 radix_tree_insert
1 0.0046 rebalance_domains
1 0.0046 remove_vma
1 0.0046 ret_from_intr
1 0.0046 run_posix_cpu_timers
1 0.0046 sched_balance_self
1 0.0046 scsi_sg_free
1 0.0046 strncpy_from_user
1 0.0046 task_tick_idle
1 0.0046 tick_do_update_jiffies64
1 0.0046 tick_program_event
1 0.0046 timer_stats_update_stats
1 0.0046 uhci_check_ports
1 0.0046 uhci_scan_schedule
1 0.0046 unfreeze_slab
1 0.0046 unlock_page
1 0.0046 unmap_vmas
1 0.0046 vma_prio_tree_add
1 0.0046 zone_watermark_ok
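One way to compare the two listings above is to parse each opreport table and
rank symbols by how much their share of samples changed. The following is only
a minimal sketch and not part of the original report: the function names
parse_opreport and biggest_shifts are hypothetical, and it assumes every data
row has the form "samples percent symbol". The embedded rows are a small
excerpt copied from the two profiles.

```python
def parse_opreport(text):
    """Map symbol name -> percentage for rows shaped 'samples percent symbol'."""
    shares = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip header lines and anything that is not a data row
        samples, percent, symbol = parts
        if not samples.isdigit():
            continue
        try:
            shares[symbol] = float(percent)
        except ValueError:
            continue
    return shares


def biggest_shifts(before, after, top=5):
    """Return (delta, symbol) pairs, largest absolute change first."""
    symbols = set(before) | set(after)
    deltas = [(after.get(s, 0.0) - before.get(s, 0.0), s) for s in symbols]
    deltas.sort(key=lambda d: abs(d[0]), reverse=True)
    return deltas[:top]


# Excerpt of the profile taken without dynticks.
without_dynticks = """\
162 20.3774 mwait_idle
3 0.3774 schedule
1 0.1258 __switch_to
"""

# Excerpt of the profile taken with dynticks.
with_dynticks = """\
3668 16.9017 __switch_to
2028 9.3448 schedule
22 0.1014 mwait_idle
"""

for delta, symbol in biggest_shifts(parse_opreport(without_dynticks),
                                    parse_opreport(with_dynticks)):
    print(f"{symbol:15s} {delta:+8.4f}")
```

Note that the percentages are relative to each run's own sample total, and the
two runs collected very different numbers of samples, so the shifts indicate
where the time moved rather than absolute cost.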
I searched LKML, and this appears to be the same problem as the one reported
late last year:
http://marc.info/?l=linux-kernel&m=119630440310744&w=2
I have no hardware in common with that case, but I have a similar setup of
bridged ethernet/tap devices.
Powertop shows:
Top causes for wakeups:
64.9% (251.0) ip : br_stp_enable_bridge (br_hello_timer_expired)
12.6% ( 48.8) USB device 6-2.3 : USB-PS/2 Optical Mouse (Logitech)
10.3% ( 39.7) <interrupt> : ehci_hcd:usb2, uhci_hcd:usb6
6.4% ( 24.6) aterm : schedule_timeout (process_timeout)
2.8% ( 10.9) X : do_setitimer (it_real_fn)
1.0% ( 4.0) <kernel core> : usb_hcd_poll_rh_status (rh_timer_func)
0.5% ( 2.0) <kernel core> : clocksource_register (clocksource_watchdog)
0.3% ( 1.2) fluxbox : schedule_timeout (process_timeout)
0.3% ( 1.0) ip : br_transmit_config (br_hold_timer_expired)
0.3% ( 1.0) ip : br_stp_enable_bridge (br_fdb_cleanup)
0.2% ( 0.8) <interrupt> : ata_piix, ata_piix, uhci_hcd:usb7
0.2% ( 0.6) firefox : futex_wait (hrtimer_wakeup)
0.1% ( 0.5) ip : __netdev_watchdog_up (dev_watchdog)
0.1% ( 0.2) init : schedule_timeout (process_timeout)
0.1% ( 0.2) <kernel core> : page_writeback_init (wb_timer_fn)
0.0% ( 0.1) <kernel core> : neigh_table_init_no_netlink (neigh_periodic_timer)
0.0% ( 0.1) <kernel core> : enqueue_task_rt (sched_rt_period_timer
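To see how much of the wakeup load comes from the bridge timers, the powertop
rows can be parsed and the wakeups per second summed for the br_* entries. This
is a rough sketch under stated assumptions, not something from the original
report: the names ROW, parse_wakeups and bridge_rate are hypothetical, and the
pattern assumes rows of the form "percent% (rate) source : cause". The sample
rows are copied from the listing above.

```python
import re

# percent% (rate) source : cause
ROW = re.compile(
    r"^\s*(\d+(?:\.\d+)?)%\s*\(\s*(\d+(?:\.\d+)?)\)\s*(.+?)\s*:\s*(.+)$"
)


def parse_wakeups(text):
    """Return (percent, rate, source, cause) tuples for each matching row."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            pct, rate, source, cause = match.groups()
            rows.append((float(pct), float(rate), source, cause))
    return rows


def bridge_rate(rows):
    """Sum wakeups per second for causes that start with 'br_'."""
    return sum(rate for _, rate, _, cause in rows if cause.startswith("br_"))


sample = """\
64.9% (251.0) ip : br_stp_enable_bridge
12.6% ( 48.8) USB device 6-2.3 : USB-PS/2 Optical Mouse (Logitech)
0.3% ( 1.0) ip : br_transmit_config (br_hold_timer_expired)
"""

rows = parse_wakeups(sample)
print(f"bridge-related wakeups per second: {bridge_rate(rows):.1f}")
```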