[PATCH v7 09/11] iommu/vt-d: Add bind guest PASID support

Jacob Pan jacob.jun.pan at linux.intel.com
Tue Oct 29 04:11:21 UTC 2019


On Tue, 29 Oct 2019 10:54:48 +0800
Lu Baolu <baolu.lu at linux.intel.com> wrote:

> Hi,
> 
> On 10/29/19 6:29 AM, Jacob Pan wrote:
> > Hi Baolu,
> > 
> > Appreciate the thorough review, comments inline.  
> 
> You are welcome.
> 
> > 
> > On Sat, 26 Oct 2019 10:01:19 +0800
> > Lu Baolu <baolu.lu at linux.intel.com> wrote:
> >   
> >> Hi,
> >>  
> 
> [...]
> 
> >>> +			 * allow multiple bind calls with the
> >>> same PASID and pdev.
> >>> +			 */
> >>> +			sdev->users++;
> >>> +			goto out;
> >>> +		}  
> >>
> >> I remember I ever pointed this out before. But I forgot how we
> >> addressed it. So forgive me if this has been addressed.
> >>
> >> What if we have a valid bound svm but @dev doesn't belong to it
> >> (a.k.a. @dev not in svm->devs list)?
> >>  
> > If we are binding a new device to an existing/active PASID, the code
> > will allocate a new sdev and add that to the svm->devs list.  
> 
> But allocating a new sdev and adding device is in below else branch,
> so it will never reach there, right?
> 
No, allocating sdev is outside else branch.
> >>> +	} else {
> >>> +		/* We come here when PASID has never been bond to
> >>> a device. */
> >>> +		svm = kzalloc(sizeof(*svm), GFP_KERNEL);
> >>> +		if (!svm) {
> >>> +			ret = -ENOMEM;
> >>> +			goto out;
> >>> +		}
> >>> +		/* REVISIT: upper layer/VFIO can track host
> >>> process that bind the PASID.
> >>> +		 * ioasid_set = mm might be sufficient for vfio
> >>> to check pasid VMM
> >>> +		 * ownership.
> >>> +		 */
> >>> +		svm->mm = get_task_mm(current);
> >>> +		svm->pasid = data->hpasid;
> >>> +		if (data->flags & IOMMU_SVA_GPASID_VAL) {
> >>> +			svm->gpasid = data->gpasid;
> >>> +			svm->flags |= SVM_FLAG_GUEST_PASID;
> >>> +		}
> >>> +		ioasid_set_data(data->hpasid, svm);
> >>> +		INIT_LIST_HEAD_RCU(&svm->devs);
> >>> +		INIT_LIST_HEAD(&svm->list);
> >>> +
> >>> +		mmput(svm->mm);
> >>> +	}  
> >>
> >> A blank line, please.  
> > looks good.  
> >>  
> >>> +	sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
> >>> +	if (!sdev) {
> >>> +		if (list_empty(&svm->devs))
> >>> +			kfree(svm);  
> >>
> >> This is dangerous. This might leave a wild pointer bound with
> >> gpasid. 
> > why is that? can you please explain?
> > if the list is empty that means we just allocated the new svm, no
> > users. why can't we free it here?  
> 
> svm has been associated with the pasid private data. It needs to be
> unbound from pasid before getting freed. Otherwise, a wild pointer
> will be left.
> 
> 	ioasid_set_data(pasid, NULL);
> 	kfree(svm);
> 
Right, I need to clear the private data here. Thanks!

> >   
> >>> +		ret = -ENOMEM;
> >>> +		goto out;
> >>> +	}
> >>> +	sdev->dev = dev;
> >>> +	sdev->users = 1;
> >>> +
> >>> +	/* Set up device context entry for PASID if not enabled
> >>> already */
> >>> +	ret = intel_iommu_enable_pasid(iommu, sdev->dev);
> >>> +	if (ret) {
> >>> +		dev_err(dev, "Failed to enable PASID
> >>> capability\n");
> >>> +		kfree(sdev);
> >>> +		goto out;
> >>> +	}
> >>> +
> >>> +	/*
> >>> +	 * For guest bind, we need to set up PASID table entry as
> >>> follows:
> >>> +	 * - FLPM matches guest paging mode
> >>> +	 * - turn on nested mode
> >>> +	 * - SL guest address width matching
> >>> +	 */
> >>> +	ret = intel_pasid_setup_nested(iommu,
> >>> +				dev,
> >>> +				(pgd_t *)data->gpgd,
> >>> +				data->hpasid,
> >>> +				&data->vtd,
> >>> +				ddomain,
> >>> +				data->addr_width);
> >>> +	if (ret) {
> >>> +		dev_err(dev, "Failed to set up PASID %llu in
> >>> nested mode, Err %d\n",
> >>> +			data->hpasid, ret);  
> >>
> >> This error handling is insufficient. You should at least:
> >>
> >> 1. free sdev  
> > already done below
> >   
> >> 2. if list_empty(&svm->devs)
> >> 	unbound the svm from gpasid
> >> 	free svm
> >>  
> > yes, agreed.
> >   
> >> The same for above error handling. Add a branch for error recovery
> >> at the end of function might help here.
> >>  
> > not sure which code is the same as above? could you point it out?  
> 
> Above last comment. :-)
> 
Got it.
> >>> +		kfree(sdev);
> >>> +		goto out;
> >>> +	}
> >>> +	svm->flags |= SVM_FLAG_GUEST_MODE;
> >>> +
> >>> +	init_rcu_head(&sdev->rcu);
> >>> +	list_add_rcu(&sdev->list, &svm->devs);
> >>> + out:
> >>> +	mutex_unlock(&pasid_mutex);
> >>> +	return ret;
> >>> +}
> >>> +
> >>> +int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> >>> +{
> >>> +	struct intel_svm_dev *sdev;
> >>> +	struct intel_iommu *iommu;
> >>> +	struct intel_svm *svm;
> >>> +	int ret = -EINVAL;
> >>> +
> >>> +	mutex_lock(&pasid_mutex);
> >>> +	iommu = intel_svm_device_to_iommu(dev);
> >>> +	if (!iommu)
> >>> +		goto out;  
> >>
> >> Make it symmetrical with bind function.
> >>
> >> 	if (WARN_ON(!iommu))
> >> 		goto out;
> >>  
> > sounds good.  
> >>> +
> >>> +	svm = ioasid_find(NULL, pasid, NULL);
> >>> +	if (IS_ERR_OR_NULL(svm)) {
> >>> +		ret = PTR_ERR(svm);  
> >>
> >> If svm == NULL, this function will return success. This is not
> >> expected, right?
> >>  
> > good catch, will fix.  
> >>> +		goto out;
> >>> +	}
> >>> +
> >>> +	for_each_svm_dev(svm, dev) {
> >>> +		ret = 0;
> >>> +		sdev->users--;
> >>> +		if (!sdev->users) {
> >>> +			list_del_rcu(&sdev->list);
> >>> +			intel_pasid_tear_down_entry(iommu, dev,
> >>> svm->pasid);
> >>> +			/* TODO: Drain in flight PRQ for the
> >>> PASID since it
> >>> +			 * may get reused soon, we don't want to
> >>> +			 * confuse with its previous life.
> >>> +			 * intel_svm_drain_prq(dev, pasid);
> >>> +			 */
> >>> +			kfree_rcu(sdev, rcu);
> >>> +
> >>> +			if (list_empty(&svm->devs)) {
> >>> +				list_del(&svm->list);
> >>> +				kfree(svm);
> >>> +				/*
> >>> +				 * We do not free PASID here
> >>> until explicit call
> >>> +				 * from VFIO to free. The PASID
> >>> life cycle
> >>> +				 * management is largely tied to
> >>> VFIO management
> >>> +				 * of assigned device life
> >>> cycles. In case of
> >>> +				 * guest exit without a explicit
> >>> free PASID call,
> >>> +				 * the responsibility lies in
> >>> VFIO layer to free
> >>> +				 * the PASIDs allocated for the
> >>> guest.
> >>> +				 * For security reasons, VFIO has
> >>> to track the
> >>> +				 * PASID ownership per guest
> >>> anyway to ensure
> >>> +				 * that PASID allocated by one
> >>> guest cannot be
> >>> +				 * used by another.
> >>> +				 */
> >>> +				ioasid_set_data(pasid, NULL);  
> >>
> >> Exchange order. First unbind svm from gpasid and then free svm.
> >>  
> > I am not following, aren't we already doing free svm after unbind?
> > please explain.  
> 
> I meant
> 
> 	ioasid_set_data(pasid, NULL);
> 	kfree(svm);
> 
> in reverse order, it leaves a short window when svm is freed, but
> pasid private data is still kept svm (wild pointer).
> 
> 
Right. will fix
> >>> +			}
> >>> +		}
> >>> +		break;
> >>> +	}
> >>> + out:
> >>> +	mutex_unlock(&pasid_mutex);
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +
> >>>    int intel_svm_bind_mm(struct device *dev, int *pasid, int
> >>> flags, struct svm_dev_ops *ops) {
> >>>    	struct intel_iommu *iommu =
> >>> intel_svm_device_to_iommu(dev); diff --git
> >>> a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index
> >>> 3dba6ad3e9ad..6c74c71b1ebf 100644 ---
> >>> a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h
> >>> @@ -673,7 +673,9 @@ int intel_iommu_enable_pasid(struct
> >>> intel_iommu *iommu, struct device *dev); int
> >>> intel_svm_init(struct intel_iommu *iommu); extern int
> >>> intel_svm_enable_prq(struct intel_iommu *iommu); extern int
> >>> intel_svm_finish_prq(struct intel_iommu *iommu); -
> >>> +extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
> >>> +		struct device *dev, struct iommu_gpasid_bind_data
> >>> *data); +extern int intel_svm_unbind_gpasid(struct device *dev,
> >>> int pasid); struct svm_dev_ops;
> >>>    
> >>>    struct intel_svm_dev {
> >>> @@ -690,9 +692,13 @@ struct intel_svm_dev {
> >>>    struct intel_svm {
> >>>    	struct mmu_notifier notifier;
> >>>    	struct mm_struct *mm;
> >>> +
> >>>    	struct intel_iommu *iommu;
> >>>    	int flags;
> >>>    	int pasid;
> >>> +	int gpasid; /* Guest PASID in case of vSVA bind with
> >>> non-identity host
> >>> +		     * to guest PASID mapping.
> >>> +		     */
> >>>    	struct list_head devs;
> >>>    	struct list_head list;
> >>>    };
> >>> diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
> >>> index 94f047a8a845..a2c189ad0b01 100644
> >>> --- a/include/linux/intel-svm.h
> >>> +++ b/include/linux/intel-svm.h
> >>> @@ -44,6 +44,23 @@ struct svm_dev_ops {
> >>>     * do such IOTLB flushes automatically.
> >>>     */
> >>>    #define SVM_FLAG_SUPERVISOR_MODE	(1<<1)
> >>> +/*
> >>> + * The SVM_FLAG_GUEST_MODE flag is used when a guest process bind
> >>> to a device.
> >>> + * In this case the mm_struct is in the guest kernel or
> >>> userspace, its life
> >>> + * cycle is managed by VMM and VFIO layer. For IOMMU driver, this
> >>> API provides
> >>> + * means to bind/unbind guest CR3 with PASIDs allocated for a
> >>> device.
> >>> + */
> >>> +#define SVM_FLAG_GUEST_MODE	(1<<2)  
> >>
> >> How about keeping this aligned with top by adding a tab?
> >>  
> > sounds good.  
> >> BIT macro is preferred. Hence, make it BIT(1), BIT(2), BIT(3) is
> >> preferred.
> >>  
> > I know, but the existing mainline code is not using BIT, so I wanted
> > to keep coding style consistent. Perhaps a separate cleanup patch
> > will do later.  
> 
> It makes sense to me.
> 
> >>> +/*
> >>> + * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own
> >>> PASID space,
> >>> + * which requires guest and host PASID translation at both
> >>> directions. We keep
> >>> + * track of guest PASID in order to provide lookup service to
> >>> device drivers.
> >>> + * One such example is a physical function (PF) driver that
> >>> supports mediated
> >>> + * device (mdev) assignment. Guest programming of mdev
> >>> configuration space can
> >>> + * only be done with guest PASID, therefore PF driver needs to
> >>> find the matching
> >>> + * host PASID to program the real hardware.
> >>> + */
> >>> +#define SVM_FLAG_GUEST_PASID	(1<<3)  
> >>
> >> Ditto.
> >>
> >> Best regards,
> >> baolu  
> > 
> > [Jacob Pan]
> >   
> 
> Best regards,
> baolu

[Jacob Pan]


More information about the iommu mailing list