[RFC] virtio-iommu version 0.5

Linu Cherian linu.cherian at cavium.com
Wed Oct 25 07:07:04 UTC 2017


Hi Jean,

On Tue Oct 24, 2017 at 10:28:59PM +0530, Linu Cherian wrote:
> Hi Jean,
> Thanks for your reply.
> 
> On Tue Oct 24, 2017 at 09:37:12AM +0100, Jean-Philippe Brucker wrote:
> > Hi Linu,
> > 
> > On 24/10/17 07:27, Linu Cherian wrote:
> > > Hi Jean,
> > > 
> > > On Mon Oct 23, 2017 at 10:32:41AM +0100, Jean-Philippe Brucker wrote:
> > >> This is version 0.5 of the virtio-iommu specification, the paravirtualized
> > >> IOMMU. This version addresses feedback from v0.4 and adds an event virtqueue.
> > >> Please find the specification, LaTeX sources and pdf, at:
> > >> git://linux-arm.org/virtio-iommu.git viommu/v0.5
> > >> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf
> > >>
> > >> A detailed changelog since v0.4 follows. You can find the pdf diff at:
> > >> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf
> > >>
> > >> * Add an event virtqueue for the device to report translation faults to
> > >>   the driver. For the moment only unrecoverable faults are available but
> > >>   future versions will extend it (see the sketch after this list).
> > >> * Simplify PROBE request by removing the ack part, and flattening RESV
> > >>   properties.
> > >> * Rename "address space" to "domain". The change might seem futile, but it
> > >>   makes it possible to introduce PASIDs and other features cleanly in the
> > >>   next versions. In the same vein, the few remaining "device" occurrences were
> > >>   replaced by "endpoint", to avoid any confusion with "the device"
> > >>   referring to the virtio device across the document.
> > >> * Add implementation notes for RESV_MEM properties.
> > >> * Update ACPI table definition.
> > >> * Fix typos and clarify a few things.
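
Regarding the event virtqueue item above, a fault event element might look
roughly like this (field names and widths are my own guess for illustration,
not taken from the spec):

    #include <stdint.h>

    /* Hypothetical layout of an unrecoverable-fault event pushed by the
     * device on the event virtqueue. Illustrative only; the v0.5 document
     * is authoritative for the real format. */
    struct viommu_fault_event {
            uint8_t  reason;        /* e.g. unrecoverable translation fault */
            uint8_t  reserved[3];
            uint32_t flags;         /* read/write access, address valid, ... */
            uint32_t endpoint;      /* endpoint ID that triggered the fault */
            uint64_t address;       /* faulting IOVA, when available */
    };
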
> > >>
> > >> I will publish the Linux driver for v0.5 shortly. Then for next versions
> > >> I'll focus on optimizations and adding support for hardware acceleration.
> > >>
> > >> Existing implementations are simple and can certainly be optimized, even
> > >> without architectural changes. But the architecture itself can also be
> > >> improved in a number of ways. Currently it is designed to work well with
> > >> VFIO. However, having explicit MAP requests is less efficient* than page
> > >> tables for emulated and PV endpoints, and the current architecture doesn't
> > >> address this. Binding page tables is an obvious way to improve throughput
> > >> in that case, but we can explore cleverer (and possibly simpler) ways to
> > >> do it.
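
As a rough illustration of the difference (my own sketch, field names are not
from the spec): with explicit MAP requests, every DMA mapping in the guest
becomes one request on the virtqueue, i.e. one descriptor chain, one kick and
one completion, whereas a shared page table only needs a memory write plus the
occasional invalidation.

    #include <stdint.h>

    /* Roughly what an explicit MAP request has to carry per mapping.
     * Illustrative only. */
    struct viommu_req_map {
            uint8_t  type;          /* MAP */
            uint8_t  reserved[3];
            uint32_t domain;
            uint64_t virt_start;    /* IOVA range being mapped */
            uint64_t virt_end;
            uint64_t phys_start;    /* guest-physical address */
            uint32_t flags;         /* R/W/X permissions */
    };
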
> > >>
> > >> So first we'll work on getting the base device and driver merged, then
> > >> we'll analyze and compare several ideas for improving performance.
> > >>
> > >> Thanks,
> > >> Jean
> > >>
> > >> * I have yet to study this behaviour, and would be interested in any
> > >> prior art on the subject of analyzing devices' DMA patterns (virtio and
> > >> others)
> > > 
> > > 
> > > From the spec,
> > > Under future extensions.
> > > 
> > > "Page Table Handover, to allow guests to manage their own page tables and share them with the MMU"
> > > 
> > > Had few questions on this.
> > > 
> > > 1. Did you mean SVM support for vfio-pci devices attached to guest processes here?
> > 
> > Yes, using the VFIO BIND and INVALIDATE ioctls that Intel is working on,
> > and adding requests in pretty much the same format to virtio-iommu.
> > 
> > > 2. Can you give some hints on how this is going to work, since the virtio-iommu
> > >    guest kernel driver would need to create stage-1 page tables in the format
> > >    required by the hardware, which is not the case now. CMIIW.
> > 
> > The virtio-iommu device advertises which PASID/page table format is
> > supported by the host (obtained via sysfs and communicated in the PROBE
> > request), then the guest binds page tables or PASID tables to a domain and
> > populates it. Binding page tables alone is easy because we already have
> > the required drivers in the guest (io-pgtable or arch/* for SVM) and code
> > in the host to manage PASID tables. But since the PASID table pointer is
> > translated by stage-2, it would require a little more work in the host to
> > obtain GPA buffers from the guest on demand.
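
Just to check my understanding of the bind step, I imagine a request along
these lines, where the guest hands over either a page directory or a whole
PASID table and the host walks it through stage-2 (all names below are my
guesses, nothing from the spec):

    #include <stdint.h>

    /* Hypothetical BIND-style request, sent once per domain or PASID
     * instead of a stream of MAP/UNMAP requests. Illustrative only. */
    struct viommu_req_bind_table {
            uint8_t  type;          /* hypothetical BIND_TABLE opcode */
            uint8_t  reserved[3];
            uint32_t domain;
            uint32_t pasid;         /* 0 when binding a whole PASID table */
            uint32_t format;        /* one of the formats advertised by PROBE */
            uint64_t table_gpa;     /* GPA of page directory or PASID table */
    };
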
>   Is this for resolving PCI PRI requests?
>   IIUC, PCI PRI requests for devices owned by the guest need to be resolved
>   by the guest itself.
> 
> 
> > In addition, the BIND
> > ioctl is different from the one used by VT-d, so this solution didn't get
> > much appreciation.
> 
> Could you please share the links on this ?
> 
> > 
> > The alternative is to bind PASID tables. 
> 
> Sorry, I didn't get the difference here.
>

Also, does this solution intend to cover page table sharing for non-SVM
cases? For example, we may need to share the IOMMU page table for a device
used by the guest kernel, so that map/unmap is handled directly by the guest
and only TLB invalidations go through a virtio-iommu channel.
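
In that model the only traffic left on the virtio-iommu channel would be
something like a TLB invalidation request, roughly as below (again only a
sketch of what I mean, the names are made up):

    #include <stdint.h>

    /* Hypothetical invalidation request for the "guest owns the page
     * table, device only does TLB maintenance" model described above. */
    struct viommu_req_invalidate {
            uint8_t  type;          /* hypothetical INVALIDATE opcode */
            uint8_t  reserved[3];
            uint32_t domain;
            uint64_t virt_start;    /* IOVA range to invalidate */
            uint64_t virt_end;      /* or the whole domain */
    };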
 
> > It requires factoring the guest
> > PASID handling code into a library, which is difficult for SMMU. Luckily
> > I'm still working on adding PASID code for SMMUv3, so extracting it out of
> > the driver isn't a big overhead. The good thing about this solution is
> > that it reuses any specification work done for VFIO (and vice versa) and
> > any host driver change made for vSMMU/VT-d emulations.
> > 
> > Thanks,
> > Jean
> 
> -- 
> Linu cherian

-- 
Linu cherian

