[RFC] virtio-iommu version 0.5

Linu Cherian linu.cherian at cavium.com
Wed Oct 25 09:26:30 UTC 2017


Hi Jean,

On Wed Oct 25, 2017 at 10:07:53AM +0100, Jean-Philippe Brucker wrote:
> On 25/10/17 08:07, Linu Cherian wrote:
> > Hi Jean,
> > 
> > On Tue Oct 24, 2017 at 10:28:59PM +0530, Linu Cherian wrote:
> >> Hi Jean,
> >> Thanks for your reply.
> >>
> >> On Tue Oct 24, 2017 at 09:37:12AM +0100, Jean-Philippe Brucker wrote:
> >>> Hi Linu,
> >>>
> >>> On 24/10/17 07:27, Linu Cherian wrote:
> >>>> Hi Jean,
> >>>>
> >>>> On Mon Oct 23, 2017 at 10:32:41AM +0100, Jean-Philippe Brucker wrote:
> >>>>> This is version 0.5 of the virtio-iommu specification, the paravirtualized
> >>>>> IOMMU. This version addresses feedback from v0.4 and adds an event virtqueue.
> >>>>> Please find the specification, LaTeX sources and pdf, at:
> >>>>> git://linux-arm.org/virtio-iommu.git viommu/v0.5
> >>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf
> >>>>>
> >>>>> A detailed changelog since v0.4 follows. You can find the pdf diff at:
> >>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf
> >>>>>
> >>>>> * Add an event virtqueue for the device to report translation faults to
> >>>>>   the driver. For the moment only unrecoverable faults are reported, but
> >>>>>   future versions will extend it (see the sketch after this list).
> >>>>> * Simplify PROBE request by removing the ack part, and flattening RESV
> >>>>>   properties.
> >>>>> * Rename "address space" to "domain". The change might seem cosmetic, but
> >>>>>   it allows PASIDs and other features to be introduced cleanly in the next
> >>>>>   versions. In the same vein, the few remaining "device" occurrences were
> >>>>>   replaced by "endpoint", to avoid any confusion with "the device"
> >>>>>   referring to the virtio device across the document.
> >>>>> * Add implementation notes for RESV_MEM properties.
> >>>>> * Update ACPI table definition.
> >>>>> * Fix typos and clarify a few things.
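> >>>>>
> >>>>> As a sketch of the event virtqueue addition: the driver pre-fills the
> >>>>> event queue with buffers, and the device writes one fault record per
> >>>>> unrecoverable fault. The layout below is only illustrative (field
> >>>>> names and sizes may differ, see the spec for the normative
> >>>>> definition):
> >>>>>
> >>>>>     struct virtio_iommu_fault {
> >>>>>             u8      reason;         /* why the translation failed */
> >>>>>             u8      reserved[3];
> >>>>>             le32    flags;          /* access type: read/write/exec */
> >>>>>             le32    endpoint;       /* endpoint that caused the fault */
> >>>>>             le64    address;        /* faulting virtual address */
> >>>>>     };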
> >>>>>
> >>>>> I will publish the Linux driver for v0.5 shortly. Then for next versions
> >>>>> I'll focus on optimizations and adding support for hardware acceleration.
> >>>>>
> >>>>> Existing implementations are simple and can certainly be optimized, even
> >>>>> without architectural changes. But the architecture itself can also be
> >>>>> improved in a number of ways. Currently it is designed to work well with
> >>>>> VFIO. However, having explicit MAP requests is less efficient* than page
> >>>>> tables for emulated and PV endpoints, and the current architecture doesn't
> >>>>> address this. Binding page tables is an obvious way to improve throughput
> >>>>> in that case, but we can explore cleverer (and possibly simpler) ways to
> >>>>> do it.
> >>>>>
> >>>>> So first we'll work on getting the base device and driver merged, then
> >>>>> we'll analyze and compare several ideas for improving performance.
> >>>>>
> >>>>> Thanks,
> >>>>> Jean
> >>>>>
> >>>>> * I have yet to study this behaviour, and would be interested in any
> >>>>> prior art on the subject of analyzing devices' DMA patterns (virtio
> >>>>> and others).
> >>>>
> >>>>
> >>>> From the spec, under "Future extensions":
> >>>>
> >>>> "Page Table Handover, to allow guests to manage their own page tables and share them with the MMU"
> >>>>
> >>>> I had a few questions on this.
> >>>>
> >>>> 1. Did you mean SVM support for vfio-pci devices attached to guest processes here?
> >>>
> >>> Yes, using the VFIO BIND and INVALIDATE ioctls that Intel is working on,
> >>> and adding requests in pretty much the same format to virtio-iommu.
> >>>
> >>>> 2. Can you give some hints on how this is going to work? The virtio-iommu
> >>>>    guest kernel driver would need to create stage-1 page tables in the
> >>>>    format required by the hardware, which is not the case now. Correct me
> >>>>    if I'm wrong.
> >>>
> >>> The virtio-iommu device advertises which PASID/page table format is
> >>> supported by the host (obtained via sysfs and communicated in the PROBE
> >>> request), then the guest binds page tables or PASID tables to a domain
> >>> and populates it. Binding page tables alone is easy because we already
> >>> have the required drivers in the guest (io-pgtable or arch/* for SVM)
> >>> and code in the host to manage PASID tables. But since the PASID table
> >>> pointer is translated by stage-2, it would require a little more work
> >>> in the host to obtain GPA buffers from the guest on demand.
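> >>>
> >>> To make the flow concrete, a future table-binding request could look
> >>> something like the sketch below. None of this is in v0.5; the structure
> >>> name and fields are hypothetical and only illustrate the idea:
> >>>
> >>>     /* Hypothetical extension, not part of v0.5 */
> >>>     struct virtio_iommu_req_bind_table {
> >>>             struct virtio_iommu_req_head    head;
> >>>             le32    domain;         /* domain to bind the tables to */
> >>>             le64    table;          /* GPA of the page or PASID table */
> >>>             le64    format;         /* one of the formats advertised
> >>>                                        by the PROBE request */
> >>>             struct virtio_iommu_req_tail    tail;
> >>>     };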
> >>   Is this for resolving PCI PRI requests? IIUC, PCI PRI requests for
> >>   devices owned by the guest need to be resolved by the guest itself.
> 
> Supporting PCI PRI is a separate problem that will be implemented by
> extending the event queue proposed in v0.5. Once the guest has bound the
> PASID table and created the page tables, it will start some DMA job in the
> device. If a page isn't mapped, the pIOMMU sends a PRI Request (a page
> fault) to its driver, which is relayed to userspace by VFIO, then to the
> guest via virtio-iommu. The guest handles the fault and sends a PRI
> response on the virtio-iommu request queue, which is relayed to the pIOMMU
> driver via VFIO, and the device retries the access.
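> 
> As an illustration of the guest side of that path, a fault handler could
> look roughly like the snippet below. The structures, flags and helper
> functions are made up for the example, since the recoverable-fault
> interface isn't defined yet:
> 
>     /* Hypothetical handling of a recoverable fault event in the guest */
>     static void viommu_handle_fault(struct viommu_dev *viommu,
>                                     struct virtio_iommu_fault *fault)
>     {
>             int ret;
> 
>             if (!(fault->flags & VIRTIO_IOMMU_FAULT_F_RECOVERABLE))
>                     return; /* unrecoverable: report it and give up */
> 
>             /* Fix up the faulting address in the guest page tables... */
>             ret = viommu_handle_mm_fault(fault->pasid, fault->address);
> 
>             /* ...then answer on the request queue, so that the host can
>              * send a PRI response and the endpoint can retry the access. */
>             viommu_send_page_resp(viommu, fault, ret);
>     }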
> 
> >>> In addition, the BIND
> >>> ioctl is different from the one used by VT-d, so this solution wasn't
> >>> well received.
> >>
> >> Could you please share the links on this?
> 
> Please find the latest discussion at
> https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg20189.html
> 
> >>> The alternative is to bind PASID tables.
> >>
> >> Sorry, I didn't get the difference here.
> 
> The PASID table is what we call the Context Table in SMMU: it's the array
> associating a PASID (SSID) with a context descriptor. In SMMUv3, the
> stream table entry (device descriptor) points to a PASID table. Each
> context descriptor in the PASID table points to a page directory (pgd).
> 
> So the first solution was for the guest to send a BIND with pasid+pgd, and
> let the host deal with the context tables. The second solution is to send
> a BIND with a PASID table pointer, and have the guest handle the context
> table.
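> 
> In (heavily simplified) C, the structures described above nest like this;
> the SMMUv3 spec has the actual layouts, only the pointers matter here:
> 
>     /* Simplified view of the SMMUv3 descriptors discussed above */
>     struct stream_table_entry {     /* one per endpoint */
>             u64 s1_context_ptr;     /* -> PASID (context) table */
>             u64 s2_ttb;             /* -> stage-2 page directory */
>             /* ... many more fields ... */
>     };
> 
>     struct context_descriptor {     /* one per PASID (SSID) */
>             u64 ttb0;               /* -> stage-1 page directory (pgd) */
>             u64 asid;
>             /* ... */
>     };
> 
> So with the first solution the host owns the context table and installs
> the guest's pgd into a descriptor; with the second, the guest owns the
> whole context table and the host only installs its pointer in the stream
> table entry.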
> 
> > Also, does this solution intend to cover page table sharing for non-SVM
> > cases? For example, if we need to share the IOMMU page table for
> > a device used in the guest kernel, so that map/unmap is handled directly
> > by the guest and only TLB invalidations happen through the virtio-iommu
> > channel.
> 
> Yes. For non-SVM in SMMUv3, you still have a context table, but with a
> single descriptor, so the interface stays the same. With the second
> solution, however, nesting with SMMUv2 isn't supported, since SMMUv2
> doesn't have context tables. The second solution was considered simpler
> to implement, so we'll go with it first.
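> 
> The invalidation path mentioned above could be a new request on the
> existing request queue, along these lines (again hypothetical, not in
> v0.5):
> 
>     /* Hypothetical TLB invalidation request, not part of v0.5 */
>     struct virtio_iommu_req_invalidate {
>             struct virtio_iommu_req_head    head;
>             le32    domain;         /* domain whose TLB entries to drop */
>             le32    pasid;          /* address space within the domain */
>             le64    virt_start;     /* range to invalidate */
>             le64    virt_end;
>             struct virtio_iommu_req_tail    tail;
>     };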
> 
> Thanks,
> Jean
>

Thanks a lot for the pointers and the explanation.

 
> >>> It requires factoring the guest
> >>> PASID handling code into a library, which is difficult for SMMU. Luckily
> >>> I'm still working on adding PASID code for SMMUv3, so extracting it out of
> >>> the driver isn't a big overhead. The good thing about this solution is
> >>> that it reuses any specification work done for VFIO (and vice versa) and
> >>> any host driver changes made for vSMMU/VT-d emulations.
> >>>
> >>> Thanks,
> >>> Jean
> >>
> >> -- 
> >> Linu Cherian
> > 

-- 
Linu Cherian

