[Ksummit-2013-discuss] [ATTEND] Linux VM Infrastructure to support Memory Power Management

Matthew Wilcox willy6545 at gmail.com
Sat Jul 27 17:58:39 UTC 2013


This is going to interact with I/O latency too. When we can swap a page in
under 10us, we don't want to spend 50us finding the "best" page to swap.
On 2013-07-27 1:07 PM, "Johannes Weiner" <hannes at cmpxchg.org> wrote:

> On Thu, Jul 18, 2013 at 05:29:38PM +0530, Srivatsa S. Bhat wrote:
> > Hi,
> >
> > I have been working on developing Linux VM designs and algorithms to
> > support Memory Power Management. As computer systems are increasingly
> > sporting larger and larger amounts of RAM, the power consumption of
> > the memory hardware subsystem is showing up as a very significant
> > portion of the total power consumption of the system (sometimes even
> > higher than that of CPUs, depending on the machine configuration)[2].
> > So memory has become an important target for power management - on
> > embedded systems/smartphones, and all the way up to large server
> > systems.
> >
> > Modern memory hardware such as DDR3 supports a number of power
> > management capabilities. New firmware standards such as ACPI 5.0 have
> > added support to export the power-management features of the
> > underlying memory hardware to the Operating System in a standard
> > way[3]. On ARM platforms this info can be exported to the OS via the
> > bootloader or the device-tree. So ultimately, it is up to the
> > kernel's MM subsystem to make the best use of these capabilities and
> > manage memory power-efficiently. It has been demonstrated on a
> > Samsung Exynos board (with 2 GB RAM) that up to 6% of total system
> > power can be saved by making the Linux kernel MM subsystem
> > power-aware[4]. (More savings can be expected on systems with larger
> > amounts of memory, and perhaps improved further using better MM
> > designs.)
> >
> > Often this simply translates to having the Linux MM understand the
> > granularity at which RAM modules can be power-managed, and
> > consolidating memory allocations and references into a minimum number
> > of these power-manageable "memory regions". The memory hardware has
> > the intelligence to automatically transition memory banks that
> > haven't been referenced for a threshold amount of time to low-power
> > content-preserving states. It can also perform OS-cooperative
> > power-down of unused (unallocated) memory regions. So the onus is on
> > the Linux VM to become power-aware and shape the allocations and
> > influence the references in such a way that it helps conserve memory
> > power. This involves consolidating the allocations/references at the
> > right address boundaries, keeping the memory-region granularity in
> > mind.
> >
> > With that goal, I had revived Ankita Garg's "Hierarchy" design of the
> > Linux VM[5] and later came up with a completely new design called the
> > "Sorted-buddy" design[6]. Recently, I also added a mechanism to
> > perform targeted memory compaction, in order to support light-weight
> > memory region evacuation and further enhance the opportunities for
> > memory power savings[7]. While these patchsets did generate a fair
> > amount of discussion around the issues involved, the implications of
> > these core MM changes can be quite far-reaching, and hence I believe
> > that a wider discussion at the Kernel Summit would be invaluable and
> > would help convey the idea behind this work and get insights and
> > suggestions from core developers and maintainers.
>
> From a page reclaim perspective, memory regions have to be either
> completely unused or receive the same amount of page allocations and
> reclaim pressure as other in-use regions.  All allocated pages need to
> receive the same amount of time in memory to gather references so that
> we can reliably detect and protect frequently used pages from rarely
> used ones.  Allocating in ascending region order while reclaiming in
> the opposite direction means that pages in region 0 have won the
> lottery while memory in region 4 out of 4 is thrashing, and we can no
> longer tell which pages are important and which aren't.  The costs of
> broken page reclaim in terms of CPU power and IO will probably
> outweigh any memory power savings.
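The fairness hazard described here can be made concrete with a toy simulation (the page count and workload are invented for illustration): if the allocator hands out page frames in ascending order while reclaim evicts in descending order, the lowest frames keep their initial residency forever while the highest frame is evicted and refilled on every allocation.

```python
# Toy model: allocate ascending, reclaim descending (illustrative only).
NUM_PAGES = 8          # toy machine; pfn 0 sits in the lowest region
resident = {}          # pfn -> timestamp of last (re)allocation

def alloc(t):
    """Allocate the lowest free pfn; if memory is full, reclaim from the top."""
    free = set(range(NUM_PAGES)) - set(resident)
    if not free:
        victim = max(resident)   # reclaim descending: evict highest pfn first
        del resident[victim]
        free = {victim}
    pfn = min(free)              # allocate ascending: reuse lowest pfn first
    resident[pfn] = t

for t in range(100):
    alloc(t)
```

Pfns 0-6 sit in memory untouched since the very first allocations, while pfn 7 thrashes on every subsequent allocation; residency time no longer says anything about which pages are actually hot.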
>
> This binary on-or-off requirement makes all this much more similar to
> how memory hotplug onlines and offlines whole sections of memory at a
> time and we could probably reuse a lot of infrastructure.  That is, we
> might be able to stick with nodes and zones and just add/remove ranges
> of page frames at a time.  The allocator and reclaim would not
> necessarily have to be power aware at all.  Not beyond the existing
> page mobility grouping anyway, which would be fantastic.
>
> Another question is how to decide if the available memory is too small
> or too big.  We can detect thrashing if it's too small, but we would
> need to know if increasing the available memory and thus the memory
> power consumption would be less than the CPU/IO power cost.  Too much
> memory is harder to tell because we have no feedback from page reclaim
> anymore and this question has been the source of a lot of discussion
> in the context of virtual machine and container sizing.
>
> I've spent the last year working on allocator fairness, improving page
> reclaim behavior over multiple nodes/zones, and detecting accurately
> when workloads exceed their available memory.  Power-aware regions
> pose a similar challenge, so I would be interested in discussing this
> topic.
>
> Thanks,
> Johannes
> _______________________________________________
> Ksummit-2013-discuss mailing list
> Ksummit-2013-discuss at lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/ksummit-2013-discuss
>

