[Ksummit-discuss] [CORE TOPIC] Kernel tinification: shrinking the kernel and avoiding size regressions

Fri May 9 16:55:23 UTC 2014

On Fri, 9 May 2014, Steven Rostedt wrote:

> > One improvement would be to sort the functions by functionality, for
> > example placing all the important functions in the first 2M of the code
> > so that they are covered by a single huge TLB entry.
>
> I thought pretty much all of kernel core memory is mapped in by huge
> tlbs? At least for kernel core code (not modules), the size should not
> impact tlbs.

Yes, but processors only support a limited number of 2M TLB entries, and
applications also want to use them. A large 100M kernel would require 50
TLB entries and would cause TLB thrashing if the functions being accessed
are spread over all of the code. Loadable modules live in vmalloc areas,
which are mapped with 4k pages; that is another issue.
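
To make the arithmetic concrete, here is a rough sketch. The helper name
tlb_entries_needed() is made up for illustration and is not a kernel
function; it just divides the text size by the page size, rounding up, and
assumes the whole range is touched.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of TLB entries needed to map text_bytes
 * of code with pages of page_bytes each, rounded up.
 */
static uint64_t tlb_entries_needed(uint64_t text_bytes, uint64_t page_bytes)
{
    assert(page_bytes != 0);    /* a zero page size makes no sense */
    return (text_bytes + page_bytes - 1) / page_bytes;
}
```

With text_bytes = 100M and page_bytes = 2M this gives the 50 entries
mentioned above; the same 100M mapped with 4k pages, as module code in
vmalloc space is, would need 25600 entries.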

> > Maybe we could reduce the number of cachelines used by critical functions
> > too? Aren't there some tools that can automate this in gcc?
>
> As I believe James has mentioned, this only helps if we keep the
> critical functions tight in a cacheline. I did some benchmarks moving
> the tracepoint code more out of line to help in cachelines, and I
> haven't seen anything above the noise. Which is the reason I haven't
> pushed that work further.
>
> Size may not be as important as having reuse of code. Perhaps if you
> can tweak several functions to call one helper function, which may
> actually increase the total size of the kernel, but having more helper
> functions that live in cache longer may be of benefit.

More helper functions mean more L1 cache lines in use, which reduces
performance.

> > In general the ability to reduce the size of the kernel to a minimum is a
> > desirable feature. I still see deployments of older kernels in the
> > financial industry because they have a higher performance and lower
> > latency. The only way to get those guys would be to keep the kernel size
> > and the size of the data touched the same.
>
> I actually wonder if that performance is really down to the "size" of the
> kernel and not just fewer features. Usually with features, we add more function
> calls and branches, which I believe may be the culprit of the slowdowns
> we are seeing.

That too... But James said they were using static branching.

Global optimization may allow the folding of small functions into a larger
one when advantageous (which is not simple to determine).