[Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends

Fengguang Wu fengguang.wu at intel.com
Mon Aug 3 04:58:41 UTC 2015


On Wed, Jul 15, 2015 at 11:37:25AM -0400, Chris Mason wrote:
> Hi everyone,
> 
> I know I never get bored of graphs comparing old/new, but I feel guilty
> suggesting this one yet again.  Still, I think it's important for the
> people trying to push new kernels into production to have a chance to
> talk about the problems we've hit, and/or the changes that have made
> life easier.

I'm very interested in learning about your experiences and problems,
and in checking whether they can be avoided in the upstream kernel, so
that production systems like Facebook's can upgrade kernels more
smoothly in the future.

> We're starting to push 4.0 into prod (122 hosts almost counts), and I'm
> sure we'll backport some wins from 4.2+.  I'm hoping to make this a
> collection point for other benchmarking war stories.  Our biggest gains
> right now are coming from scsi-mq, and early benchmarks show 4.2 has a
> boost that I'm hoping are from the futex locking improvements.

I can also share the performance trends in the data collected by 0day.
I'm afraid they'll look a bit negative, because we cannot keep up with
writing new test cases that take advantage of the improvements in new
kernels.

Here is a comparison across a set of 988 test jobs (v4.0 is the
baseline, normalized to 100; larger is better for every index):

                   v4.0    v4.1
-------------------------------
    perf-index      100      99
   power-index      100      95
 latency-index      100      98
    size-index      100      98

The overall regressions also indicate that 0day is not yet mature
enough to bisect all regressions in time and keep them from hitting
mainline.
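
For illustration, here is a minimal sketch of how such an aggregate
index could be computed. The method (a geometric mean of per-job
ratios against the v4.0 baseline) is my simplified assumption; the
actual 0day computation may differ:

  # Assumed method: aggregate per-job metric ratios into one index,
  # with the baseline kernel normalized to 100. This is a sketch,
  # not 0day's actual code.
  from math import exp, log

  def index(ratios):
      """ratios: new_metric / baseline_metric per job, oriented so
      that larger is better (latency-style metrics are inverted)."""
      # Geometric mean keeps a few extreme jobs from dominating.
      return 100 * exp(sum(log(r) for r in ratios) / len(ratios))

  # All 988 jobs would feed in here; three shown for brevity.
  print(index([1.02, 0.95, 0.99]))  # ~98.6, a slight overall regression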

> It ties in a little with the new interfaces applications may be able to use
> (restartable sequences etc topic), and I want to ask the broad question of
> "are we doing enough to prevent performance regressions".

There is much to be desired from the 0day point of view.

- timeliness

The earlier regressions are caught, the better. So far kbuild tests
are doing reasonably well (results mostly within 1 hour); however, the
runtime tests -- boot, functional, performance/power/latency -- still
have obvious gaps (typically days long, but sometimes up to weeks).

- coverage

Kbuild testing has achieved near-100% coverage (700 reports per
month). However, runtime test coverage is far from sufficient (50
reports per month).

This is an area that needs collaboration throughout the community.
Developers in each subsystem -- mm, fs, network, rcu, sched, cgroup,
VM, drm, media, etc. -- have their own versatile ways of testing their
subsystem or feature set:

- run some WORKLOAD to evaluate performance/power/latency, etc.

- SETUP the system in different ways to run tests,
  e.g. fs params, md/dm setup, cgroup, NUMA policy, CPU affinity, ..

- MONITOR various system metrics during the test run (see the toy
  sketch after this list)
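
As a toy example of the MONITOR idea -- the real 0day monitors are
shell scripts in the lkp-tests repo below; this standalone Python
sketch only shows the general shape of such a script:

  # Toy MONITOR sketch: periodically snapshot /proc/vmstat during a
  # test run so per-interval deltas can be plotted afterwards.
  # The real lkp-tests monitors are shell scripts; this is only an
  # illustration, not 0day's actual code.
  import time

  def monitor_vmstat(interval=1, samples=10):
      for _ in range(samples):
          # /proc/vmstat is a list of "name value" pairs, one per line
          with open('/proc/vmstat') as f:
              snapshot = dict(line.split() for line in f)
          print(time.time(), snapshot['nr_dirty'], snapshot['pgfault'])
          time.sleep(interval)

  if __name__ == '__main__':
      monitor_vmstat()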

If such knowledge and scripts can be shared and accumulated, they'd be
valuable to other developers and testers, and would eventually help
overall Linux kernel health.

So far, 0day has collected a number of WORKLOAD, SETUP and MONITOR
scripts. They are publicly available here:

https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/

Much more is still to be desired; contributions of new scripts will be
highly appreciated.

We are especially short of SETUP scripts. Good test schemes should
cover different combinations of SETUP+WORKLOAD and their parameters.
There are presumably a huge number of ways one can configure a system;
however, most are beyond our imagination and current test scope.
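
To make the combinatorial explosion concrete, here is a hypothetical
sketch; the parameter names and values are made up for illustration
(real lkp-tests jobs describe such matrices in YAML):

  # Hypothetical SETUP+WORKLOAD matrix. All names/values below are
  # invented for illustration; they are not the real lkp-tests jobs.
  from itertools import product

  setups = {
      'fs':          ['ext4', 'xfs', 'btrfs'],
      'numa_policy': ['default', 'interleave'],
      'cgroup':      ['none', 'memcg_limited'],
  }
  workloads = ['dbench', 'fio', 'hackbench']

  # Full cross product of one workload with every setup combination
  matrix = list(product(workloads, *setups.values()))
  print(len(matrix), 'jobs')  # 3 * 3 * 2 * 2 = 36 jobs already
  for job in matrix[:3]:
      print(job)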

For MONITOR/WORKLOAD scripts, we borrowed a few nice scripts from
Mel's MMTests. Test suites such as phoronix, xfstests, autotest and
the kernel selftests are also run routinely in the 0day
infrastructure, so if you add a new test case to one of them, there is
a good chance it'll be picked up by 0day.

Thanks,
Fengguang

