[Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops

Paul E. McKenney paulmck at linux.vnet.ibm.com
Tue Jul 26 22:40:35 UTC 2016


On Tue, Jul 26, 2016 at 05:23:36PM +0200, Hannes Reinecke wrote:
> On 07/26/2016 03:10 PM, Alan Stern wrote:
> >On Tue, 26 Jul 2016, Hannes Reinecke wrote:
> >
> >>I have been playing around with RCUs and memory barriers quite a lot
> >>recently, and found some really 'odd' use-cases in the kernel which
> >>would benefit from improvements here.
> >
> >Could you post one or two examples?  It would be interesting to see
> >what they involve.
> >
> I have been working on a performance regression when calling
> 'dm_suspend/dm_resume' repeatedly for several (hundreds) devices.
> That boiled down to the patch introducing srcu in the device mapper core
> with commit 83d5e5b0af907 (dm: optimize use SRCU and RCU).
> Looking at it the code they do things like:
> 
> 	set_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags);
> 	if (map)
> 		synchronize_srcu(&md->io_barrier);
> 
> where the srcu is used to ensure the code has left the critical
> sections. However, if the memory pointed to by the srcu isn't
> actually freed why we could easily drop the 'synchronize_srcu' call.
> But that would require that
> a) the set_bit() above is indeed atomic
> and
> b) there's no need to call 'synchronize_rcu' if you're not actually
> freeing memory but rather fiddle pointers.
> Both are somewhat shady areas where the documentation nor usage
> reveals some obvious insights.

The set_bit() function is guaranteed to execute atomically, but it does
not guarantee any ordering against prior accesses.  The synchronize_rcu()
does provide ordering against the set_bit(), to subsequent accesses are
covered, but only in the case where "map" is non-zero.

For RCU, it does depend on the use case.  For example, there are some
rare but real cases where synchronize_rcu() is required even if you are
not freeing memory:

https://www.usenix.org/legacy/event/atc11/tech/final_files/Triplett.pdf

> On another example I've been doing performance patches to the lpfc
> driver (cf my talk at VAULT this year), where I've replaced most
> spinlocks with atomics and bitops.
> Which should work as well, only that it's still a bit unclear to me
> if an when you need barriers in addition to atomic resp bitops.

If the bitop returns a value, you don't need additional barriers.
Otherwise ...
 ... you need smp_mb__before_atomic() to order prior accesses
against the bitop and smp_mb__after_atomic() to order subsequent
accesses against the bitop.  If you need the bitop to be ordered
against both prior and subsequent accesses, then you need both
smp_mb__before_atomic() and smp_mb__after_atomic().

> And if you need barriers, which variant would be most appropriate?
> The __before or the __after variant?

 ... you need smp_mb__before_atomic() to order prior accesses
against the bitop and smp_mb__after_atomic() to order subsequent
accesses against the bitop.  If you need the bitop to be ordered
against both prior and subsequent accesses, then you need both
smp_mb__before_atomic() and smp_mb__after_atomic().

> Also, what happens to bitops on bitfields longer than an unsigned long?
> Are they still atomic?

>From what I can see, yes, sort of.  The "sort of" part is due to the fact
that bitops on widely separated bits would be would avoid interfering
with each other, but on the other hand, there would be no cause-and-effect
relationship between them, either.  Furthermore, processes reading the
bits set might disagree on the order in which they were set.

All that aside, please note that the initial memory model is limited
to memory reference, barriers, and RCU.  We do not yet have locking or
read-modify-write atomic operations.  We have to start somewhere!

							Thanx, Paul



More information about the Ksummit-discuss mailing list