[Ksummit-discuss] [TECH TOPIC] Memory thrashing, was Re: Self nomination

Johannes Weiner hannes at cmpxchg.org
Mon Aug 1 18:19:24 UTC 2016


On Mon, Aug 01, 2016 at 01:08:46PM -0400, Johannes Weiner wrote:
> On Mon, Aug 01, 2016 at 09:11:32AM -0700, Dave Hansen wrote:
> > On 08/01/2016 09:06 AM, James Bottomley wrote:
> > > > With persistent memory devices you might actually run out of CPU
> > > > capacity while performing basic page aging before you saturate the
> > > > storage device (which is why Andi Kleen has been suggesting to
> > > > replace LRU reclaim with random replacement for these devices). So
> > > > storage device saturation might not be the final answer to this
> > > > problem.
> > > We really wouldn't want this.  All cloud jobs seem to have memory they
> > > allocate but rarely use, so we want the properties of the LRU list to
> > > get this on swap so we can re-use the memory pages for something else. 
> > >  A random replacement algorithm would play havoc with that.
> > 
> > I don't want to put words in Andi's mouth, but what we want isn't
> > necessarily something that is random, but it's something that uses less
> > CPU to swap out a given page.
> 
> Random eviction doesn't mean random outcome of what stabilizes in
> memory and swap. The idea is to apply pressure on all pages equally
> but in no particular order, and then the in-memory set forms based on
> reference frequencies and refaults/swapins.
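
To illustrate the quoted point, here is a toy simulation, not kernel
code. It assumes a made-up workload in which a small hot set receives
most references, and it evicts a uniformly random resident page on
every fault. The function name, page counts, and probabilities are
all hypothetical, chosen only for illustration; real reclaim has
reference bits, multiple lists, and I/O costs that this sketch ignores.

```python
import random


def simulate_random_eviction(num_pages=100, mem_slots=20, hot_pages=10,
                             hot_prob=0.9, accesses=50_000, seed=0):
    """Toy model: random replacement under a skewed reference pattern.

    Pages 0..hot_pages-1 are "hot" and together receive hot_prob of all
    references; the remaining pages are "cold". Returns the hit rate
    (fraction of references that found the page resident) per class.
    """
    rng = random.Random(seed)
    resident = []        # pages currently in memory, in arbitrary order
    in_memory = set()    # same pages, for O(1) membership tests
    hits = {"hot": 0, "cold": 0}
    refs = {"hot": 0, "cold": 0}

    for _ in range(accesses):
        # Pick the next referenced page according to the skewed workload.
        if rng.random() < hot_prob:
            page, kind = rng.randrange(hot_pages), "hot"
        else:
            page, kind = rng.randrange(hot_pages, num_pages), "cold"
        refs[kind] += 1

        if page in in_memory:
            hits[kind] += 1
            continue

        # Fault: if memory is full, evict a uniformly random resident
        # page. No aging or reference tracking is consulted.
        if len(resident) == mem_slots:
            i = rng.randrange(mem_slots)
            victim = resident[i]
            resident[i] = resident[-1]   # swap-remove keeps eviction O(1)
            resident.pop()
            in_memory.discard(victim)

        resident.append(page)
        in_memory.add(page)

    return {kind: hits[kind] / refs[kind] for kind in refs}


if __name__ == "__main__":
    print(simulate_random_eviction())
```

Under these assumptions the hot pages refault quickly after being
evicted and so spend most of their time resident, while cold pages
mostly stay out, even though the victim choice itself is random.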

Anyway, this is getting a little off-topic.

I only brought up CPU cost to make the point that, while sustained
swap-in rate might be a good signal to unload a machine or reschedule
a job elsewhere, it might not be a generic answer to the question of
how much a system's overall progress is actually impeded due to
somebody swapping; or whether the system is actually in a livelock
state that requires intervention by the OOM killer.

