Constantly map and unmap of streaming DMA buffers with IOMMU backend might cause serious performance problem

Robin Murphy robin.murphy at
Fri May 15 22:12:19 UTC 2020

On 2020-05-15 22:33, Song Bao Hua wrote:
>> Subject: Re: Constantly map and unmap of streaming DMA buffers with
>> IOMMU backend might cause serious performance problem
>> On Fri, May 15, 2020 at 01:10:21PM +0100, Robin Murphy wrote:
>>>> Meanwhile, for the safety of buffers, lower-layer drivers need to make
>> certain the buffers have already been unmapped in iommu before those
>> buffers go back to buddy for other users.
>>> That sounds like it would only have benefit in a very small set of specific
>>> circumstances, and would be very difficult to generalise to buffers that
>>> are mapped via dma_map_page() or dma_map_single(). Furthermore, a
>>> high-level API that affects a low-level driver's interpretation of
>>> mid-layer API calls without the mid-layer's knowledge sounds like a hideous
>>> abomination of anti-design. If a mid-layer API lends itself to inefficiency
>>> at the lower level, it would seem a lot cleaner and more robust to extend
>>> *that* API for stateful buffer reuse. Failing that, it might possibly be
>>> appropriate to approach this at the driver level - many of the cleverer
>>> network drivers already implement buffer pools to recycle mapped SKBs
>>> internally, couldn't the "zip driver" simply try doing something like that
>>> for itself?
>> Exactly.  If you upper consumer of the DMA API keeps reusing the same
>> pages just map them once and use dma_sync_* to transfer ownership as
>> needed.
> The problem is that the lower-layer drivers don't know if upper consumer keeps reusing the same pages. They are running in different software layers.
> For example, Consumer is here in mm/zswap.c
> static int zswap_frontswap_store(unsigned type, pgoff_t offset,
> 				struct page *page)
> {
> 	...
> 	/* compress */
> 	dst = get_cpu_var(zswap_dstmem);
> 	...
> 	ret = crypto_comp_compress(tfm, src, PAGE_SIZE, dst, &dlen);
> 	...
> }
> But the lower-layer driver is in drivers/crypto/...
> Meanwhile, the lower-layer driver couldn't cache the pointers of buffer address coming from consumers to detect if the upper-layer is using the same page.
> Because the same page might come from different users or come from the different stages of the same user with different permissions.

Indeed the driver can't cache arbitrary pointers, but if typical buffers 
are small enough it can copy the data into its own already-mapped page, 
dma_sync it, and perform the DMA operation from there. That might even 
be more or less what your first suggestion was, but I'm still not quite 

> For example, consumer A uses the buffer as destination, then returns it to buddy, but consumer B gets the same buffer and uses it as source.
> Another possibility is
> Consumer A uses the buffer, returns it to buddy, after some time, it allocates a buffer again, but gets the same buffer from buddy like before.
> For the safety of the buffer, lower-layer driver must guarantee the buffer is unmapped when the buffer returns to buddy.
> I think only the upper-layer consumer knows if it is reusing the buffer.

Right, and if reusing buffers is common in crypto callers, then there's 
an argument for "set up reusable buffer", "process updated buffer" and 
"clean up buffer" operations to be added to the crypto API itself, such 
that the underlying drivers can then optimise for DMA usage in a robust 
and obvious way if they want to (or just implement the setup and 
teardown as no-ops and still do a full map/unmap in each "process" call 
if they don't).


More information about the iommu mailing list