[Ksummit-2013-discuss] [ATTEND] Linux VM Infrastructure to support Memory Power Management

Johannes Weiner hannes at cmpxchg.org
Sat Jul 27 17:06:51 UTC 2013


On Thu, Jul 18, 2013 at 05:29:38PM +0530, Srivatsa S. Bhat wrote:
> Hi,
> 
> I have been working on developing Linux VM designs and algorithms to support
> Memory Power Management. As computer systems are increasingly sporting larger
> and larger amounts of RAM, the power consumption of the memory hardware
> subsystem is showing up as a very significant portion of the total power
> consumption of the system (sometimes even higher than that of CPUs, depending
> on the machine configuration)[2]. So memory has become an important target
> for power-management - on embedded systems/smartphones, and all the way up to
> large server systems.
> 
> Modern memory hardware such as DDR3 supports a number of power management
> capabilities. And new firmware standards such as ACPI 5.0 have added support
> to export the power-management features of the underlying memory hardware
> to the Operating System in a standard way[3]. And on ARM platforms this info
> can be exported to the OS via the bootloader or the device-tree. So ultimately,
> it is up to the kernel's MM subsystem to make the best use of these capabilities
> and manage memory power-efficiently. It had been demonstrated on a Samsung
> Exynos board (with 2 GB RAM) that up to 6% of total system power can be saved
> by making the Linux kernel MM subsystem power-aware[4]. (More savings can be
> expected on systems with larger amounts of memory, and better MM designs
> could improve the savings further).
> 
> Often this simply translates to having the Linux MM understand the granularity
> at which RAM modules can be power-managed, and consolidating the memory
> allocations and references to a minimum number of these power-manageable
> "memory regions". The memory hardware has the intelligence to automatically
> transition memory banks that haven't been referenced for a threshold amount
> of time, to low-power content-preserving states. And they can also perform
> OS-cooperative power-down of unused (unallocated) memory regions. So the onus
> is on the Linux VM to become power-aware and shape the allocations and
> influence the references in such a way that it helps conserve memory power.
> This involves consolidating the allocations/references at the right address
> boundaries, keeping the memory-region granularity in mind.
> 
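To make the consolidation idea concrete, here is a minimal sketch of
mapping page frames to regions and always allocating from the lowest
region that still has free pages (the 512MB region size and all helper
names are made up for illustration, not taken from the patchset):

	/* Hypothetical: 512MB regions of 4K pages. */
	#define PAGES_PER_REGION	(512UL << 20 >> 12)

	static inline unsigned long pfn_to_region(unsigned long pfn)
	{
		return pfn / PAGES_PER_REGION;
	}

	/*
	 * Among regions that still have free pages, satisfy allocations
	 * from the lowest-numbered one, so that higher-numbered regions
	 * stay untouched and their banks can enter low-power states.
	 */
	static long pick_region(const unsigned long *free_pages,
				unsigned long nr_regions)
	{
		unsigned long i;

		for (i = 0; i < nr_regions; i++)
			if (free_pages[i])
				return i;
		return -1;	/* no free memory at all */
	}
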
> With that goal, I revived Ankita Garg's "Hierarchy" design of the
> Linux VM[5] and later came up with a completely new design called
> "Sorted-buddy"[6]. Recently, I also added a mechanism to perform targeted
> memory compaction, in order to support light-weight memory region evacuation,
> to further enhance the opportunities for memory power savings[7]. While these
> patchsets did generate a fair amount of discussion around the issues involved,
> the implications of these core MM changes can be quite far-reaching. Hence
> I believe that a wider discussion at the Kernel Summit would be invaluable:
> it would help convey the idea behind this work and gather insights and
> suggestions from core developers and maintainers.
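
A minimal sketch of the sorted-buddy idea in plain C (all names here
are illustrative, not the patchset's): keep each free list sorted by
region, so that allocations taken from the head always drain the
lowest-numbered regions first and higher regions can stay idle.

	struct free_page {
		unsigned long region;	/* power-manageable region index */
		struct free_page *next;
	};

	/* Free a page: insert in ascending region order, not at the head. */
	static void sorted_buddy_free(struct free_page **list,
				      struct free_page *page)
	{
		struct free_page **p = list;

		while (*p && (*p)->region <= page->region)
			p = &(*p)->next;
		page->next = *p;
		*p = page;
	}

	/* Allocate: the head is always a page from the lowest region. */
	static struct free_page *sorted_buddy_alloc(struct free_page **list)
	{
		struct free_page *page = *list;

		if (page)
			*list = page->next;
		return page;
	}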

From a page reclaim perspective, memory regions have to be either
completely unused or receive the same amount of page allocations and
reclaim pressure as other in-use regions.  All allocated pages need to
spend the same amount of time in memory to gather references so that
we can reliably distinguish frequently used pages from rarely used
ones and protect the former.  Allocating in ascending region order
while reclaiming in the opposite direction means that pages in region
0 have won the lottery while memory in region 4 out of 4 is thrashing,
and we can no longer tell which pages are important and which aren't.
The costs of broken page reclaim in terms of CPU power and IO will
probably outweigh any memory power savings.
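
To illustrate the fairness requirement (a toy sketch, with invented
names and an arbitrary batch size): instead of filling region 0 before
touching region 1, allocations would be spread over all in-use regions
in round-robin batches, so that every allocated page gets a comparable
residency time before reclaim looks at it.

	struct region {
		unsigned long free_pages;
		long alloc_batch;	/* pages left in this fairness round */
	};

	static struct region *fair_pick(struct region *regions, int nr_inuse)
	{
		int i, usable = 0;

		for (i = 0; i < nr_inuse; i++) {
			if (!regions[i].free_pages)
				continue;
			usable++;
			if (regions[i].alloc_batch > 0)
				return &regions[i];
		}
		if (!usable)
			return NULL;	/* really out of memory */

		/* Round over: replenish all batches and start a new round. */
		for (i = 0; i < nr_inuse; i++)
			regions[i].alloc_batch = 1024;	/* arbitrary */
		return fair_pick(regions, nr_inuse);
	}

(The caller would decrement alloc_batch each time it allocates from
the returned region.)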

This binary on-or-off requirement makes all this much more similar to
how memory hotplug onlines and offlines whole sections of memory at a
time, and we could probably reuse a lot of that infrastructure.  That
is, we might be able to stick with nodes and zones and just add and
remove ranges of page frames at a time.  The allocator and reclaim
would not necessarily have to be power-aware at all.  Not beyond the
existing page mobility grouping anyway, which would be fantastic.
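
As a sketch of that reuse (the offline_pages() signature is meant to
match the current hotplug code, as far as I recall; region_to_pfn(),
power_down_region(), and the region size are hypothetical):

	#include <linux/memory_hotplug.h>

	/* Hypothetical 512MB regions of 4K pages. */
	#define PAGES_PER_REGION	(512UL << 20 >> 12)

	static int power_off_region(unsigned long region)
	{
		unsigned long start_pfn = region_to_pfn(region);
		int ret;

		/*
		 * Migrate everything off the region and remove its page
		 * frames from the allocator, just like memory offlining
		 * does today.
		 */
		ret = offline_pages(start_pfn, PAGES_PER_REGION);
		if (ret)
			return ret;	/* e.g. unmovable pages in the way */

		/* The range is now unused; let the hardware power it down. */
		return power_down_region(region);
	}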

Another question is how to decide whether the available memory is too
small or too big.  We can detect thrashing when it's too small, but we
would need to know whether the extra memory power consumed by bringing
more memory online would be less than the CPU/IO power cost of the
thrashing.  Too much memory is harder to detect because we no longer
get any feedback from page reclaim, and this question has been the
source of a lot of discussion in the context of virtual machine and
container sizing.
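
As a back-of-the-envelope example of that trade-off (all numbers
invented): if keeping one extra region online costs 0.5W of background
power and each refault costs on the order of 10mJ of CPU and IO
energy, then shrinking the available memory only pays off while the
resulting refault rate stays below 0.5W / 10mJ = 50 refaults per
second; beyond that, the reclaim and IO activity burns more power than
the region saves.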

I've spent the last year working on allocator fairness, improving page
reclaim behavior over multiple nodes/zones, and accurately detecting
when workloads exceed their available memory.  Power-aware regions
pose a similar challenge, so I would be interested in discussing this
topic.

Thanks,
Johannes

