[Ce-android-mainline] [RFC][PATCH 1/1] gpu: ion: Add IOMMU heap allocator with IOMMU API
Hiroshi Doyu
hdoyu at nvidia.com
Thu Jan 19 19:47:27 UTC 2012
Hi Arnd,
From: Arnd Bergmann <arnd at arndb.de>
Subject: Re: [Ce-android-mainline] [RFC][PATCH 1/1] gpu: ion: Add IOMMU heap allocator with IOMMU API
Date: Thu, 19 Jan 2012 17:27:19 +0100
Message-ID: <201201191627.19732.arnd at arndb.de>
> On Thursday 19 January 2012, Hiroshi Doyu wrote:
>
> > > The main advantage of the IOMMU API is that it is able to separate
> > > multiple contexts, per user, so one application that is able to
> > > program a DMA engine cannot access memory that another application
> > > has mapped. Is that what you do with ion_iommu_heap_create(), i.e.
> > > that it gets called for each GPU context?
> >
> > Yes. The concept of this part came from the dma-mapping API code.
>
> I don't understand. The dma-mapping code on top of IOMMU does *not*
> use one iommu context per GPU context, it uses one IOMMU context
> for everything together.
Tegra30 has an IOMMU ("SMMU") which has 4 ASIDs (domains)(*1), and
currently each "smmu_iommu_domain_init()" call allocates an
ASID. "smmu_iommu_attach_dev()" assigns a client device to an
allocated domain. For example, I think that the following "dev[i]"
could be per GPU/device context(?), and our SMMU "iommu_ops" supports
multiple devices attached to a single domain. This is configurable in
the SMMU.
	for (i = 0; i < 4; i++) {
		map[i] = arm_iommu_create_mapping();
		arm_iommu_attach_device(&dev[i], map[i]);
	}
Here,
	map[i] == asid == domain
	dev[i] == client
So I thought that theoretically the DMA mapping API could support one
IOMMU context (domain) per GPU/device context, both 1-to-1 and
1-to-n. There is also a simple version of a DMA mapping test(*2).
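To make the abbreviated loop above concrete, here is a minimal sketch
of what I have in mind, assuming Marek's ARM dma-mapping IOMMU
integration ("arm_iommu_create_mapping()" /
"arm_iommu_attach_device()"); the base address and size below are
arbitrary example values, not what we would actually use:

```c
/* Sketch only: one IOMMU mapping (== one SMMU ASID/domain) per client
 * device, via the ARM dma-mapping IOMMU glue. Example values only.
 */
#include <linux/err.h>
#include <linux/device.h>
#include <asm/dma-iommu.h>

static int example_attach_clients(struct device *devs[], int ndev)
{
	struct dma_iommu_mapping *map;
	int i, err;

	for (i = 0; i < ndev; i++) {
		/* Each mapping allocates its own domain, i.e. its own ASID. */
		map = arm_iommu_create_mapping(&platform_bus_type,
					       0x10000000, SZ_256M, 0);
		if (IS_ERR(map))
			return PTR_ERR(map);

		/* Attach one client device to this private domain. */
		err = arm_iommu_attach_device(devs[i], map);
		if (err)
			return err;
	}
	return 0;
}
```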
Anyway, my previous answer was wrong. ion_iommu_heap does NOT support
multiple contexts but only a single one right now. It needs more work
to support multiple contexts.
*1 https://lkml.org/lkml/2012/1/5/27
*2 https://lkml.org/lkml/2012/1/5/28
> > > >+
> > > >+static int ion_buffer_allocate(struct ion_buffer *buf)
> > > >+{
> > > >+ int i, npages = NUM_PAGES(buf);
> > > >+
> > > >+ buf->pages = kmalloc(npages * sizeof(*buf->pages), GFP_KERNEL);
> > > >+ if (!buf->pages)
> > > >+ goto err_pages;
> > > >+
> > > >+ buf->sglist = vzalloc(npages * sizeof(*buf->sglist));
> > > >+ if (!buf->sglist)
> > > >+ goto err_sgl;
> > >
> > > Using vzalloc here probably goes a bit too far, it's fairly expensive
> > > to use compared with kzalloc, not only do you have to maintain the
> > > page table entries for the new mapping, it also means you use up
> > > precious space in the vmalloc area and have to use small pages for
> > > accessing the data in it.
> >
> > Should I use "kzalloc()" instead here?
>
> Yes, that would be better, at least if you can prove that there is
> a reasonable upper bound on the size of the allocation. kmalloc/kzalloc
> usually work ok up to 16kb or at most 128kb. If you think the total
> allocation might be larger than that, you have to use vzalloc anyway
> but that should come with a long comment explaining what is going
> on.
>
> More likely if you end up needing vzalloc because of the size of the
> allocation, you should see that as a sign that you are doing something
> wrong and you should use larger chunks than single allocations in
> order to cut down the number of sglist entries.
>
> Another option would be to just use sg_alloc_table(), which was
> made for this exact purpose ;-)
Good to know;)
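A rough sketch of how ion_buffer_allocate() could look with
sg_alloc_table() instead of the hand-rolled vzalloc'd sglist
(error unwinding abbreviated; "buf->sg_table" is an assumed
replacement for the patch's "buf->sglist" field, and NUM_PAGES() is
as in the patch):

```c
/* Sketch: replace the vzalloc'd scatterlist with sg_alloc_table(),
 * which allocates the scatterlist in page-sized chunks internally
 * and so avoids one large vmalloc-space allocation.
 */
#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/gfp.h>

static int ion_buffer_allocate_sgtable(struct ion_buffer *buf)
{
	int i, npages = NUM_PAGES(buf);
	struct scatterlist *sg;

	buf->pages = kmalloc(npages * sizeof(*buf->pages), GFP_KERNEL);
	if (!buf->pages)
		return -ENOMEM;

	if (sg_alloc_table(&buf->sg_table, npages, GFP_KERNEL)) {
		kfree(buf->pages);
		return -ENOMEM;
	}

	for_each_sg(buf->sg_table.sgl, sg, npages, i) {
		buf->pages[i] = alloc_page(GFP_KERNEL);
		if (!buf->pages[i])
			goto err;	/* real code would free pages 0..i-1 */
		sg_set_page(sg, buf->pages[i], PAGE_SIZE, 0);
	}
	return 0;
err:
	sg_free_table(&buf->sg_table);
	kfree(buf->pages);
	return -ENOMEM;
}
```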
> > > >+static void *iommu_heap_map_kernel(struct ion_heap *heap,
> > > >+ struct ion_buffer *buf)
> > > >+{
> > > >+ int npages = NUM_PAGES(buf);
> > > >+
> > > >+ BUG_ON(!buf->pages);
> > > >+ buf->vaddr = vm_map_ram(buf->pages, npages, -1,
> > > >+ pgprot_noncached(pgprot_kernel));
> > > >+ pr_debug("va:%p\n", buf->vaddr);
> > > >+ WARN_ON(!buf->vaddr);
> > > >+ return buf->vaddr;
> > > >+}
> > >
> > > You can't do this on ARMv6 or ARMv7: There is already a cacheable
> > > mapping of all entries in buf->pages, so you must not install
> > > a second noncached mapping. Either make sure you have only
> > > noncached pages, or only cached ones.
> >
> > Ok, any example to maintain both mappings?
>
> The dma_mapping implementation on ARM takes care of this by unmapping
> any allocation that goes through dma_alloc_coherent() from the linear
> mapping, at least in the CMA version that Marek did.
> In order to take advantage of this, you either need to call
> dma_alloc_coherent on a kernel that provides CMA, or call the CMA
> allocator directly (not generally recommended). Without CMA support,
> your only option is to use memory that was never part of the
> linear kernel mapping, but that would remove the main advantage of
> your patch, which is aimed at removing the need to take out memory
> from the linear mapping.
>
> Using only cacheable memory would be valid here, but is probably
> not desired for performance reasons, or even allowed in ION because
> it requires explicit cache management functions for flushing
> the caches while handing data back and forth between software
> and the device.
>
> Arnd
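Thanks for the detailed explanation. For the archives, a sketch of the
dma_alloc_coherent() route you describe, which on a CMA-enabled kernel
takes the pages out of the linear mapping so no conflicting cacheable
alias remains (function names here are hypothetical wrappers, just for
illustration):

```c
/* Sketch: allocate device memory through dma_alloc_coherent() so that
 * the returned kernel mapping has non-cacheable attributes and, with
 * CMA, the linear (cacheable) mapping of those pages is removed.
 */
#include <linux/dma-mapping.h>

static void *example_alloc_buffer(struct device *dev, size_t size,
				  dma_addr_t *dma)
{
	/* 'dma' receives the bus/IOVA address to program into the device. */
	return dma_alloc_coherent(dev, size, dma, GFP_KERNEL);
}

static void example_free_buffer(struct device *dev, size_t size,
				void *vaddr, dma_addr_t dma)
{
	dma_free_coherent(dev, size, vaddr, dma);
}
```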