[Bugme-new] [Bug 11433] New: CONFIG_NO_HZ causes ksoftirqd to load one cpu by 8% even when idle.

bugme-daemon at bugzilla.kernel.org bugme-daemon at bugzilla.kernel.org
Wed Aug 27 06:42:17 PDT 2008


http://bugzilla.kernel.org/show_bug.cgi?id=11433

           Summary: CONFIG_NO_HZ causes ksoftirqd to load one cpu by 8% even
                    when idle.
           Product: Process Management
           Version: 2.5
     KernelVersion: 2.6.26.3
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
        AssignedTo: process_other at kernel-bugs.osdl.org
        ReportedBy: sc.contact at gmail.com


Latest working kernel version: unknown
Earliest failing kernel version:  unknown
Distribution: Gentoo
Hardware Environment: Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz
Software Environment: GCC 4.3.1  GLIBC 2.8

Problem Description:  

ksoftirqd loads one cpu at 8% even when the machine is idle only when dynticks
is enabled.

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND             
   10 root      15  -5     0    0    0 S    8  0.0   1:44.89 ksoftirqd/3 

 14:25:53 up 22 min,  2 users,  load average: 0.04, 0.06, 0.01

Although this is a grsecurity patched kernel, I have verified this behaviour
with a vanilla kernel.

Witout dynticks:
quad zakalwe # opreport -l /usr/src/linux-2.6.26.3-grsec/vmlinux
CPU: Core 2, speed 1603 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
mask of 0x00 (Unhalted core cycles) count 1201500
samples  %        symbol name
162      20.3774  mwait_idle
76        9.5597  apic_timer_interrupt
74        9.3082  native_read_tsc
34        4.2767  slab_pad_check
28        3.5220  getnstimeofday
21        2.6415  check_bytes_and_report
17        2.1384  native_sched_clock
16        2.0126  __update_sched_clock
16        2.0126  run_timer_softirq
14        1.7610  __do_softirq
14        1.7610  tick_sched_timer
11        1.3836  find_busiest_group
10        1.2579  scheduler_tick
9         1.1321  hrtimer_interrupt
9         1.1321  ktime_get_ts
9         1.1321  page_fault
9         1.1321  smp_apic_timer_interrupt
8         1.0063  sched_clock_tick
7         0.8805  _local_bh_enable
7         0.8805  do_softirq
7         0.8805  raise_softirq
7         0.8805  read_tsc
6         0.7547  hrtimer_run_pending
6         0.7547  sched_clock_cpu
6         0.7547  update_wall_time
5         0.6289  hrtimer_run_queues
5         0.6289  init_object
5         0.6289  notifier_call_chain
5         0.6289  on_freelist
5         0.6289  profile_tick
4         0.5031  __run_hrtimer
4         0.5031  _spin_lock
4         0.5031  clear_page_c
4         0.5031  find_lock_page
4         0.5031  hrtimer_forward
4         0.5031  in_lock_functions
4         0.5031  ktime_get
4         0.5031  lapic_next_event
4         0.5031  list_del
4         0.5031  restore_args
4         0.5031  update_process_times
3         0.3774  __round_jiffies
3         0.3774  account_system_time_scaled
3         0.3774  check_slab
3         0.3774  clockevents_program_event
3         0.3774  irq_enter
3         0.3774  irq_exit
3         0.3774  mce_idle_callback
3         0.3774  rb_next
3         0.3774  rcu_pending
3         0.3774  run_rebalance_domains
3         0.3774  schedule
3         0.3774  unmap_vmas
3         0.3774  update_vsyscall
2         0.2516  __first_cpu
2         0.2516  __link_path_walk
2         0.2516  _atomic_dec_and_lock
2         0.2516  copy_page_range
2         0.2516  copy_user_generic_string
2         0.2516  cpu_idle
2         0.2516  do_page_fault
2         0.2516  enter_idle
2         0.2516  filemap_fault
2         0.2516  find_first_bit
2         0.2516  find_next_bit
2         0.2516  get_page_from_freelist
2         0.2516  mutex_lock
2         0.2516  page_remove_rmap
2         0.2516  rb_insert_color
2         0.2516  thread_return
1         0.1258  __chk_obj_label
1         0.1258  __dec_zone_page_state
1         0.1258  __delay
1         0.1258  __dequeue_entity
1         0.1258  __dequeue_signal
1         0.1258  __find_get_block
1         0.1258  __list_add
1         0.1258  __pagevec_free
1         0.1258  __remove_hrtimer
1         0.1258  __slab_free
1         0.1258  __switch_to
1         0.1258  __up_read
1         0.1258  __wake_up_bit
1         0.1258  account_process_tick
1         0.1258  account_system_time
1         0.1258  add_event_entry
1         0.1258  any_ports_active
1         0.1258  check_object
1         0.1258  con_chars_in_buffer
1         0.1258  copy_page_c
1         0.1258  cp_new_stat
1         0.1258  deactivate_task
1         0.1258  dequeue_entity
1         0.1258  dequeue_task_fair
1         0.1258  do_dbs_timer
1         0.1258  do_mmap_pgoff
1         0.1258  do_select
1         0.1258  down_write
1         0.1258  elv_rb_find
1         0.1258  enqueue_hrtimer
1         0.1258  exit_idle
1         0.1258  ext3_get_group_desc
1         0.1258  find_vma
1         0.1258  find_vma_prev
1         0.1258  finish_task_switch
1         0.1258  flush_tlb_page
1         0.1258  fput
1         0.1258  free_hot_cold_page
1         0.1258  free_pages_and_swap_cache
1         0.1258  generic_file_aio_read
1         0.1258  handle_mm_fault
1         0.1258  ioread8
1         0.1258  kfree
1         0.1258  locks_remove_posix
1         0.1258  lookup_mnt
1         0.1258  memory_open
1         0.1258  mousedev_event
1         0.1258  mousedev_notify_readers
1         0.1258  next_zones_zonelist
1         0.1258  page_add_file_rmap
1         0.1258  page_waitqueue
1         0.1258  prepend
1         0.1258  radix_tree_lookup
1         0.1258  rb_erase
1         0.1258  read_inode_bitmap
1         0.1258  release_pages
1         0.1258  ret_from_intr
1         0.1258  run_posix_cpu_timers
1         0.1258  run_workqueue
1         0.1258  set_normalized_timespec
1         0.1258  skb_release_data
1         0.1258  sock_alloc_send_skb
1         0.1258  task_rq_lock
1         0.1258  task_tick_idle
1         0.1258  timer_stats_update_stats
1         0.1258  uhci_check_ports
1         0.1258  uhci_hub_status_data
1         0.1258  uhci_scan_schedule
1         0.1258  update_curr
1         0.1258  usecs_to_jiffies
1         0.1258  vfs_getattr
1         0.1258  vma_adjust




With Dynticks

quad zakalwe # opreport -l /usr/src/linux-2.6.26.3-grsec/vmlinux
CPU: Core 2, speed 1603 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
mask of 0x00 (Unhalted core cycles) count 1201500
samples  %        symbol name
3668     16.9017  __switch_to
2028      9.3448  schedule
1996      9.1973  native_read_tsc
1563      7.2021  find_busiest_group
767       3.5342  finish_task_switch
723       3.3315  native_sched_clock
709       3.2670  thread_return
662       3.0504  sched_clock_cpu
472       2.1749  __update_sched_clock
447       2.0597  tick_nohz_stop_sched_tick
409       1.8846  hrtick_set
401       1.8478  hrtick_start_fair
386       1.7786  read_tsc
332       1.5298  ksoftirqd
318       1.4653  __do_softirq
295       1.3593  task_rq_lock
265       1.2211  __resched_task
253       1.1658  enqueue_entity
250       1.1520  getnstimeofday
238       1.0967  do_softirq
231       1.0644  find_first_bit
222       1.0229  pick_next_task_fair
220       1.0137  find_next_bit
220       1.0137  get_next_timer_interrupt
217       0.9999  tick_nohz_restart_sched_tick
212       0.9769  update_curr
211       0.9723  select_task_rq_fair
210       0.9677  set_next_entity
178       0.8202  __next_cpu
169       0.7787  try_to_wake_up
166       0.7649  __first_cpu
161       0.7419  rb_insert_color
144       0.6635  dequeue_entity
144       0.6635  rb_erase
136       0.6267  __enqueue_entity
130       0.5990  put_prev_task_fair
115       0.5299  hrtimer_run_pending
110       0.5069  cpu_idle
99        0.4562  enqueue_task_fair
99        0.4562  run_timer_softirq
97        0.4470  deactivate_task
96        0.4424  ktime_get
96        0.4424  rb_next
92        0.4239  _spin_unlock_irqrestore
90        0.4147  __dequeue_entity
87        0.4009  tick_nohz_stop_idle
86        0.3963  call_softirq
84        0.3871  _local_bh_enable
83        0.3825  __list_add
82        0.3778  kthread_should_stop
80        0.3686  place_entity
77        0.3548  activate_task
74        0.3410  enqueue_task
68        0.3133  dequeue_task_fair
68        0.3133  rcu_pending
67        0.3087  ktime_get_ts
65        0.2995  set_normalized_timespec
55        0.2534  msecs_to_jiffies
51        0.2350  _cond_resched
49        0.2258  pick_next_task_rt
46        0.2120  raise_softirq_irqoff
44        0.2027  rcu_needs_cpu
40        0.1843  slab_pad_check
39        0.1797  check_preempt_curr_idle
37        0.1705  wake_up_process
34        0.1567  pick_next_task_idle
33        0.1521  put_prev_task_idle
32        0.1475  list_add
22        0.1014  check_bytes_and_report
22        0.1014  mwait_idle
18        0.0829  apic_timer_interrupt
9         0.0415  copy_user_generic_string
8         0.0369  init_object
7         0.0323  on_freelist
6         0.0276  page_fault
5         0.0230  __slab_alloc
5         0.0230  add_event_entry
5         0.0230  smp_apic_timer_interrupt
5         0.0230  sync_buffer
5         0.0230  update_wall_time
4         0.0184  __slab_free
4         0.0184  kmem_cache_free
4         0.0184  sched_clock_tick
4         0.0184  scheduler_tick
3         0.0138  __run_hrtimer
3         0.0138  account_system_time
3         0.0138  clear_page_c
3         0.0138  flush_tlb_page
3         0.0138  irq_enter
3         0.0138  raise_softirq
3         0.0138  select_nohz_load_balancer
3         0.0138  unix_poll
2         0.0092  __d_lookup
2         0.0092  __find_get_block
2         0.0092  __journal_temp_unlink_buffer
2         0.0092  __wake_up
2         0.0092  _spin_lock
2         0.0092  br_hello_timer_expired
2         0.0092  check_object
2         0.0092  check_slab
2         0.0092  clocksource_get_next
2         0.0092  copy_page_c
2         0.0092  do_page_fault
2         0.0092  find_lock_page
2         0.0092  idle_cpu
2         0.0092  memcmp
2         0.0092  mmap_region
2         0.0092  path_put
2         0.0092  release_pages
2         0.0092  run_rebalance_domains
2         0.0092  system_call_after_swapgs
2         0.0092  tick_nohz_update_jiffies
2         0.0092  tick_sched_timer
2         0.0092  update_process_times
2         0.0092  update_vsyscall
2         0.0092  vsnprintf
1         0.0046  __alloc_pages_internal
1         0.0046  __blk_put_request
1         0.0046  __do_fault
1         0.0046  __down_write_nested
1         0.0046  __find_get_block_slow
1         0.0046  __fput
1         0.0046  __inet_lookup_established
1         0.0046  __mod_timer
1         0.0046  __pte_alloc
1         0.0046  __round_jiffies
1         0.0046  __timer_stats_hrtimer_set_start_info
1         0.0046  __up_write
1         0.0046  __wake_up_sync
1         0.0046  add_dirent_to_buf
1         0.0046  any_ports_active
1         0.0046  br_config_bpdu_generation
1         0.0046  br_transmit_config
1         0.0046  copy_page_range
1         0.0046  do_dbs_timer
1         0.0046  do_get_write_access
1         0.0046  do_notify_resume
1         0.0046  do_wait
1         0.0046  do_wp_page
1         0.0046  dput
1         0.0046  enqueue_hrtimer
1         0.0046  exit_itimers
1         0.0046  ext3_get_blocks_handle
1         0.0046  ext3_journal_start_sb
1         0.0046  ext3_mark_iloc_dirty
1         0.0046  filemap_fault
1         0.0046  find_vma
1         0.0046  free_pgd_range
1         0.0046  get_pid_task
1         0.0046  get_slab
1         0.0046  get_user_pages
1         0.0046  handle_mm_fault
1         0.0046  hrtick_resched
1         0.0046  hrtimer_get_next_event
1         0.0046  init_timer
1         0.0046  inotify_inode_queue_event
1         0.0046  ioread8
1         0.0046  iowrite8
1         0.0046  journal_put_journal_head
1         0.0046  journal_start
1         0.0046  lock_timer_base
1         0.0046  mutex_lock
1         0.0046  notifier_call_chain
1         0.0046  page_waitqueue
1         0.0046  path_get
1         0.0046  proc_sys_lookup_table_one
1         0.0046  radix_tree_insert
1         0.0046  rebalance_domains
1         0.0046  remove_vma
1         0.0046  ret_from_intr
1         0.0046  run_posix_cpu_timers
1         0.0046  sched_balance_self
1         0.0046  scsi_sg_free
1         0.0046  strncpy_from_user
1         0.0046  task_tick_idle
1         0.0046  tick_do_update_jiffies64
1         0.0046  tick_program_event
1         0.0046  timer_stats_update_stats
1         0.0046  uhci_check_ports
1         0.0046  uhci_scan_schedule
1         0.0046  unfreeze_slab
1         0.0046  unlock_page
1         0.0046  unmap_vmas
1         0.0046  vma_prio_tree_add
1         0.0046  zone_watermark_ok

I searched the lkml and it would seem that this problem is the same as:
http://marc.info/?l=linux-kernel&m=119630440310744&w=2 reported late last year.
I have no hardware in common with that case, but have a similar setup of
bridged ethernet/tap devices.

Powertop shows:

Top causes for wakeups:
 64.9% (251.0)                ip : br_stp_enable_bridge
(br_hello_timer_expired) 
  12.6% ( 48.8)   USB device 6-2.3 : USB-PS/2 Optical Mouse (Logitech) 
  10.3% ( 39.7)       <interrupt> : ehci_hcd:usb2, uhci_hcd:usb6 
   6.4% ( 24.6)             aterm : schedule_timeout (process_timeout) 
   2.8% ( 10.9)                 X : do_setitimer (it_real_fn) 
   1.0% (  4.0)     <kernel core> : usb_hcd_poll_rh_status (rh_timer_func)
   0.5% (  2.0)     <kernel core> : clocksource_register (clocksource_watchdog)
   0.3% (  1.2)           fluxbox : schedule_timeout (process_timeout)
   0.3% (  1.0)                ip : br_transmit_config (br_hold_timer_expired)
   0.3% (  1.0)                ip : br_stp_enable_bridge (br_fdb_cleanup)
   0.2% (  0.8)       <interrupt> : ata_piix, ata_piix, uhci_hcd:usb7
   0.2% (  0.6)           firefox : futex_wait (hrtimer_wakeup)
   0.1% (  0.5)                ip : __netdev_watchdog_up (dev_watchdog)
   0.1% (  0.2)              init : schedule_timeout (process_timeout)
   0.1% (  0.2)     <kernel core> : page_writeback_init (wb_timer_fn)
   0.0% (  0.1)     <kernel core> : neigh_table_init_no_netlink
(neigh_periodic_timer)
   0.0% (  0.1)     <kernel core> : enqueue_task_rt (sched_rt_period_timer


-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


More information about the Bugme-new mailing list