[Ksummit-2013-discuss] [ATTEND] Linux VM Infrastructure to support Memory Power Management

Srinivas Pandruvada srinivas.pandruvada at linux.intel.com
Tue Jul 30 15:49:22 UTC 2013


On 07/29/2013 07:14 AM, Arjan van de Ven wrote:
> On 7/28/2013 8:32 PM, Johannes Weiner wrote:
>>>> But we have more than just a binary on-or-off switch, as I
>>>> mentioned above.
>>>> Also, we'll most likely have situations where we can just move pages
>>>> around to save power, way more frequently than opportunities to
>>>> evacuate and power-off entire regions. So the former usecase is
>>>> pretty important, IMHO.
>> I'm just wondering how far this gets us.
>
> There are also some dangers here, in that many things are tradeoffs:
> they are not easy to get right, and quite often only valid for a few
> years (since the tradeoffs are eventually hardware driven, and this
> sort of thing tends to change in the longer run).
>
> A few things to consider:
> * Running a CPU core to move stuff around may be expensive or cheap,
>   depending on the CPU one picked.
> * DIMMs are getting bigger (and with interleaving, the minimum unit of
>   saving may be a few DIMMs grouped together), so the amount of work
>   may be quite large.
> * On just about all modern hardware, memory can go into self-refresh
>   (SR) when all CPUs are idle. SR has a power level quite a bit lower
>   than active memory. This makes using the CPU to do memory work
>   doubly expensive.
> * Most modern hardware can put DIMMs into a lower power state (CKE and
>   the like) when they are not accessed. CKE-like states aren't as low
>   power as SR, but still, it can add up. So compaction/grouping to
>   DIMMs may be a benefit if it means we're not accessing some of the
>   DIMMs for a while.

Exactly. My experiments show that modern memory controllers are smart 
enough to keep memory in a low power state with CKE off up to 98% of 
the time, with minimal power consumption. The wakeup time is also 
extremely low. So what matters is not how much memory is used, but how 
often it is accessed.
As Arjan suggested before, grouping tasks onto a memory region is 
beneficial. In my experiments, I divided the buddy allocator into 
buckets and used a hash of the PID to place each task's allocations in 
a particular bucket.
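Something along these lines, as an illustration (a simplified sketch; 
NR_MEM_BUCKETS and task_mem_bucket() are made-up names, not the actual 
experimental code):

/*
 * Illustrative sketch only: pick a buddy "bucket" (memory region) by
 * hashing the requesting task's PID, so that a task's pages tend to
 * cluster in one region and the remaining regions can stay in
 * CKE-off / self-refresh states longer.
 */
#include <linux/hash.h>
#include <linux/log2.h>
#include <linux/sched.h>

#define NR_MEM_BUCKETS	8	/* assumed number of power-manageable regions */

static inline unsigned int task_mem_bucket(struct task_struct *tsk)
{
	/* hash_32() spreads PIDs evenly across the buckets */
	return hash_32(tsk->pid, ilog2(NR_MEM_BUCKETS));
}

The allocator would then prefer free pages from the selected bucket for 
that task, falling back to other buckets only when it runs empty.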
>
> * CPU caches can be huge, which can shield a lot of DIMM activity
>   (allowing the DIMMs to go to lower power states) or NUMA effects...
>   but other systems have small caches.
> * As Matthew said... it also depends a lot on storage speed. With NVMe
>   and even faster storage, the value of a page in the pagecache is
>   clearly different on such a system than when using spinning rust or
>   a glorified USB stick.
> * There are huge differences in power levels between DDR3, DDR3L,
>   LPDDR and likely DDR4, whenever that shows up. These differences
>   will likely mean different tradeoffs. E.g. super-low-power memory
>   with super-slow storage (eMMC) clearly calls for different tradeoffs
>   in the VM than higher-power memory with very fast storage. Getting
>   this to auto-tune is important but interesting.
>
>
> There is one other aspect we should think about: memory power does not
> normally depend on what bits are in memory[1]. Freeing memory might be
> the wrong thing to do, since such freeing is speculative about actually
> achieving memory power savings some time later...
> ... what if we could have it so that the VM keeps the page, unmapped of
> course, and only if memory power has totally gone away (i.e. the
> content has been invalidated) do we mark the content as invalid...
> potentially only when we're asked to map the content again (using some
> sort of generation number or whatever). This could avoid doing a lot of
> the heavier work when it doesn't pay off, and make it cheap to declare
> a whole range suddenly no longer valid.
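
A minimal sketch of how that generation check could look at map time 
(purely illustrative; region_generation(), page_to_region() and the 
power_gen field are hypothetical names, not existing kernel interfaces):

/*
 * Hypothetical sketch: each power-manageable region keeps a generation
 * counter that is bumped whenever its contents are actually lost.  An
 * unmapped-but-kept page records the generation it was populated
 * under; only when we are asked to map it again do we check whether
 * the content is still valid.
 */
static bool page_content_still_valid(struct page *page)
{
	struct mem_region *region = page_to_region(page);

	return page->power_gen == region_generation(region);
}

Declaring a whole range invalid would then only require bumping the 
region's counter, with no per-page work up front.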
>
> [1] Except in virtual machines, where "all zeroes" content allows the
> hypervisor to de-duplicate better and, as a result, give more memory
> to other VMs, which can then run more efficiently.
>


