[PATCH 09/15] device core: Add ability to handle multiple dma offsets

Jim Quinlan james.quinlan at broadcom.com
Fri May 22 14:31:00 UTC 2020

Hi Nicolas,

On Wed, May 20, 2020 at 7:28 AM Nicolas Saenz Julienne
<nsaenzjulienne at suse.de> wrote:
> Hi Jim,
> thanks for having a go at this! My two cents.
> On Tue, 2020-05-19 at 16:34 -0400, Jim Quinlan wrote:
> > The device variable 'dma_pfn_offset' is used to do a single
> > linear map between cpu addrs and dma addrs.  The variable
> > 'dma_map' is added to struct device to point to an array
> > of multiple offsets which is required for some devices.
> >
> > Signed-off-by: Jim Quinlan <james.quinlan at broadcom.com>
> > ---
> [...]
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -493,6 +493,8 @@ struct dev_links_info {
> >   * @bus_dma_limit: Limit of an upstream bridge or bus which imposes a smaller
> >   *           DMA limit than the device itself supports.
> >   * @dma_pfn_offset: offset of DMA memory range relatively of RAM
> > + * @dma_map: Like dma_pfn_offset but used when there are multiple
> > + *           pfn offsets for multiple dma-ranges.
> >   * @dma_parms:       A low level driver may set these to teach IOMMU code
> > about
> >   *           segment limitations.
> >   * @dma_pools:       Dma pools (if dma'ble device).
> > @@ -578,7 +580,12 @@ struct device {
> >                                            allocations such descriptors. */
> >       u64             bus_dma_limit;  /* upstream dma constraint */
> >       unsigned long   dma_pfn_offset;
> > -
> > +     const void *dma_offset_map;     /* Like dma_pfn_offset, but for
> > +                                      * the unlikely case of multiple
> > +                                      * offsets. If non-null, dma_pfn_offset
> > +                                      * will be 0. */
> I get a bad feeling about separating the DMA offset handling into two distinct
> variables. Albeit generally frowned upon, there is a fair amount of trickery
> around dev->dma_pfn_offset all over the kernel. usb_alloc_dev() comes to mind
> for example. And this obviously doesn't play well with it.

The trickery should only be present when
CONFIG_DMA_PFN_OFFSET_MAP=y**.  Otherwise it does no harm.  Also, I
feel that if dev-dma_pfn_offset is valid then so is
dev->dma_pfn_offset_map -- they both use the same mechanism in the
same places.  I am merely
extending something that has been in Linux for a long time..

Further,  I could have had dma_pfn_offset_map  subsume dma_pfn_offset
but I wanted to leave it alone since folks would complain that it
would go from an addition to an if-clause and an inline function.  But
if I did go that way there would  only be one mechanism that would
cover both cases.

> I feel a potential
> solution to multiple DMA ranges should completely integrate with the current
> device DMA handling code, without special cases, on top of that, be transparent
> to the user.

Having dma_pfn_offset_map subsume  dma_pfn_offset would integrate the
current  code too.  And I am not sure what you mean by being
"transparent to the user" -- the writer of the PCIe endpoint driver is
going to do some DMA calls and they have no idea if this mechanism is
in play or not.

> In more concrete terms, I'd repackage dev->bus_dma_limit and
> dev->dma_pfn_offset into a list/array of DMA range structures

This is sort of what I am doing except I defined my own structure.
Using the of_range structure would require one to do the same extra
calculations over  and over for a DMA call; this is why I  defined my
structure that has all of the needed precomputed variables.

> and adapt/create
> the relevant getter/setter functions so as for DMA users not to have to worry
> about the specifics of a device's DMA constraints.

I'm not sure I understand where these getter/setter functions would
exist or what they would do.

> editing dev->dma_pfn_offset, you'd be passing a DMA range structure to the
> device core, and let it take the relevant decisions on how to handle it

How and where would the device core operate for these getter/setters?
In how many places in the code?  The way I see it, any solution has to
adjust the value when doing dma2phys and phys2dma conversions, and the
most efficient place to do that is in the two DMA header files (the
second one is for ARM).

> internally (overwrite, add a new entry, merge them, etc...).
I'm concerned that  this would be overkill; I am just trying to get a
driver upstream for some baroque PCIe RC HW I'm not sure if we should
set up something elaborate when the demand is not there.

I'll be posting a v2.  ChrisophH has sent me some personal feedback
which I am incorporating; so feel free to discuss your ideas with him
as well because I really want consensus on any large changes in


** CONFIG_DMA_OF_PFN_OFFSET_MAP=y only occurs when building for
ARCH_BRCMSTB.  However, ARCH_BRCMSTB is set by the ARM64 defconfig and
the ARM multi_v7_defconfig, so it would be activated for those
defconfigs.  This may(a)  get us kicked out of those defconfigs  or
(b) we may have to keep DMA_OF_PFN_OFFSET_MAP off in those defconfigs
and only turn it on solely for Broadcom STB Linux.

> Easier said than done. :)
> Regards,
> Nicolas

More information about the iommu mailing list