[PATCH 0/4] x86: Add Cache QoS Monitoring (CQM) support

Tue Feb 18 19:35:28 UTC 2014

On Tue, Feb 18, 2014 at 05:29:42PM +0000, Waskiewicz Jr, Peter P wrote:
> > It's not a problem that changing the task:RMID map is expensive, what is
> > a problem is that there's no deterministic fashion of doing it.
> 
> We are going to add to the SDM that changing RMIDs frequently is
> not the intended use case for this feature, and can cause bogus data.
> The real intent is to land threads into an RMID, and run that until the
> threads are effectively done.
> 
> That being said, reassigning a thread to a new RMID is certainly
> supported; "frequent" updates are just not encouraged at all.

You don't even need really high frequency, just updates that are
unsynchronized wrt reading the counter. Suppose A flips the RMIDs about
and just when it's done programming, B reads them.

At that point you've got 0 guarantee the data makes any kind of sense.
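
To make that scenario concrete, here is a toy model, not the real
hardware interface. It assumes each cache line keeps the RMID that was
active when it was filled and that a flip does not retag existing lines;
line_tag[], occupancy() and read_after_flip() are made-up names.

```c
#include <assert.h>
#include <stddef.h>

#define NR_LINES 8
#define NR_RMIDS 4

/* Toy model: each line remembers the RMID active when it was filled. */
static unsigned int line_tag[NR_LINES];

/* Stand-in for reading the occupancy counter of one RMID. */
static unsigned int occupancy(unsigned int rmid)
{
	unsigned int n = 0;

	assert(rmid < NR_RMIDS);
	for (size_t i = 0; i < NR_LINES; i++)
		if (line_tag[i] == rmid)
			n++;
	return n;
}

static void setup_scenario(void)
{
	/* Five lines filled by our group while it ran under RMID 1. */
	for (size_t i = 0; i < 5; i++)
		line_tag[i] = 1;
	/* Three stale lines left behind by whoever used RMID 2 before. */
	for (size_t i = 5; i < NR_LINES; i++)
		line_tag[i] = 2;
}

/* What B sees when it reads right after A moved the group to RMID 2. */
static unsigned int read_after_flip(void)
{
	unsigned int group_rmid = 1;

	setup_scenario();
	group_rmid = 2;			/* A: reprogram, old lines keep tag 1 */
	return occupancy(group_rmid);	/* B: reads immediately */
}
```

In this model B gets the previous user's leftovers for RMID 2, while the
group's own five lines are still accounted to RMID 1.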

> I do see that, however the userspace interface for this isn't ideal for
> how the feature is intended to be used.  I'm still planning to have this
> be managed per process in /proc/<pid>, I just had other priorities push
> this back a bit on my stovetop.

So I really don't like anything /proc/$pid/ nor do I really see a point in
doing that. What are you going to do in the /proc/$pid/ thing anyway?
Exposing raw RMIDs is an absolute no-no, and anything else is going to
end up being yet-another-grouping thing and thus not much different from
cgroups.

> Also, now that the new SDM is available

Can you guys please set up a mailing list already so we know when
there's new versions out? Ideally mailing out the actual PDF too so I
get the automagic download and archive for all versions.

> , there is a new feature added to
> the same family as CQM, called Memory Bandwidth Monitoring (MBM).  The
> original cgroup approach would have allowed another subsystem to be added
> next to cacheqos; the perf-cgroup here is not easily expandable.
> The /proc/<pid> approach can add MBM pretty easily alongside CQM.

I'll have to go read up what you've done now, but if it's also RMID based
I don't see why the proposed scheme won't work.

> > The below is a rough draft, most if not all XXXs should be
> > fixed/finished. But given I don't actually have hardware that supports
> > this stuff (afaik) I couldn't be arsed.
> 
> The hardware is not publicly available yet, but I know that Red Hat and
> others have some of these platforms for testing.

Yeah, not in my house therefore it doesn't exist :-)

> I really appreciate the patch.  There was a good amount of thought put
> into this, and it gave a good set of different viewpoints.  I'll keep the
> comments all here in one place, it'll be easier to discuss than
> disjointed in the code.
> 
> The rotation idea to reclaim RMIDs no longer in use is interesting.
> This differs from the original patch, which would reclaim the RMID
> when monitoring was disabled for that group of processes.
> 
> I can see a merged sort of approach, where if monitoring for a group of
> processes is disabled, we can place that RMID onto a reclaim list.  The
> next time an RMID is requested (monitoring is enabled for a
> process/group of processes), the reclaim list is searched for an RMID
> that has 0 occupancy (i.e. not in use), or worst-case, find and assign
> one with the lowest occupancy.  I did discuss this with hpa offline and
> this seemed reasonable.
> 
> Thoughts?

So you have to wait for one 'freed' RMID to become empty before
'allowing' reads of the other RMIDs, otherwise the visible value can be
complete rubbish. Even for low frequency rotation, see the above
scenario about asynchronous operations.

This means you have to always have at least one free RMID.
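
A rough sketch of that constraint, under my own simplifying assumptions:
a rotated-out RMID sits in a "limbo" state until its occupancy reads 0,
and reads are held back while anything is in limbo. The state names, the
helpers and fake_occupancy[] are all hypothetical and only stand in for
the hardware counter; this is not the code from the draft patch.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_RMIDS 4

enum rmid_state { RMID_FREE, RMID_IN_USE, RMID_LIMBO };

static enum rmid_state state[NR_RMIDS];		/* all RMID_FREE at start */
static unsigned int fake_occupancy[NR_RMIDS];	/* stand-in for the hw counter */

/* Hand out a free RMID; never one that is still draining in limbo. */
static int rmid_alloc(void)
{
	for (int i = 0; i < NR_RMIDS; i++) {
		if (state[i] == RMID_FREE) {
			state[i] = RMID_IN_USE;
			return i;
		}
	}
	return -1;
}

/* Rotate a group off 'old'; this only works if a free RMID exists. */
static int rmid_rotate(unsigned int old)
{
	int fresh;

	assert(old < NR_RMIDS && state[old] == RMID_IN_USE);
	fresh = rmid_alloc();
	if (fresh < 0)
		return -1;		/* no free RMID, cannot rotate */
	state[old] = RMID_LIMBO;	/* old lines still carry this tag */
	return fresh;
}

/* Move drained limbo RMIDs back to the free pool; returns how many. */
static int rmid_scan_limbo(void)
{
	int freed = 0;

	for (int i = 0; i < NR_RMIDS; i++) {
		if (state[i] == RMID_LIMBO && fake_occupancy[i] == 0) {
			state[i] = RMID_FREE;
			freed++;
		}
	}
	return freed;
}

/* Counter values are only trusted once nothing is left in limbo. */
static bool reads_allowed(void)
{
	for (int i = 0; i < NR_RMIDS; i++)
		if (state[i] == RMID_LIMBO)
			return false;
	return true;
}
```

With all RMIDs handed out, rmid_rotate() has nothing to give the group
and fails, which is why one RMID has to stay free at all times.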