[PATCH V3 1/3] iommu: Add support to change default domain of an iommu_group

Prakhya, Sai Praneeth sai.praneeth.prakhya at intel.com
Fri May 29 08:15:03 UTC 2020


Hi Baolu,

> -----Original Message-----
> From: Lu Baolu <baolu.lu at linux.intel.com>
> Sent: Thursday, May 28, 2020 7:43 PM
> To: Prakhya, Sai Praneeth <sai.praneeth.prakhya at intel.com>;
> iommu at lists.linux-foundation.org
> Cc: baolu.lu at linux.intel.com; Christoph Hellwig <hch at lst.de>; Joerg Roedel
> <joro at 8bytes.org>; Raj, Ashok <ashok.raj at intel.com>; Will Deacon
> <will.deacon at arm.com>; Mehta, Sohil <sohil.mehta at intel.com>; Robin
> Murphy <robin.murphy at arm.com>; Jacob Pan <jacob.jun.pan at linux.intel.com>
> Subject: Re: [PATCH V3 1/3] iommu: Add support to change default domain of
> an iommu_group
> 
> Hi Sai,
> 
> On 5/29/20 3:23 AM, Sai Praneeth Prakhya wrote:
> > Presently, the default domain of an iommu_group is allocated during
> > boot time (i.e. when a device is being added to a group) and it cannot
> > be
>               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> This is inaccurate as Joerg's code has deferred default domain allocation and
> attaching after group allocation. I'd suggest to remove this.

Ok.. makes sense. I will remove it. I think it should have been like below to accurately describe Joerg's changes, is that correct?

"Presently, the default domain of an iommu_group is allocated during
boot time (i.e. during device probe and after the device is added to a group) and it cannot
be"
 
> > changed later. So, the device would typically be either in identity
> > (also known as pass_through) mode (controlled by "iommu=pt" kernel
> > command line
>                                                ^^^^^^^^
> 
> There are other kernel parameters to put device in pass_through mode.
> I'd suggest to remove this.

Ok.. makes sense. I will remove it.
I think, you were talking about ARM parameters (iommu.passthrough).. am I right?

> > argument) or the device would be in DMA mode as long as the machine is
> > up and running. There is no way to change the default domain type
> > dynamically i.e. after booting, a device cannot switch between
> > identity mode and DMA mode.
> >
> > But, assume a use case wherein the user trusts the device and believes
> > that the OS is secure enough and hence wants *only* this device to
> > bypass IOMMU (so that it could be high performing) whereas all the
> > other devices to go through IOMMU (so that the system is protected).
> > Presently, this use case is not supported. It will be helpful if there
> > is some way to change the default domain of a B:D.F dynamically. Hence, add
> such support.
>    ^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Currently default domain is per iommu_group, we have no per device default
> domain yet. Probably, "default domain of an iommu group"?

Makes sense. I will change it.

> > A privileged user could request the kernel to change the default
> > domain type of a iommu_group by writing to
> > "/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three
> > values are supported 1. identity: all the DMA transactions from the
> > device in this group are
> >               *not* translated by the iommu 2. DMA: all the DMA
> > transactions from the device in this group are
> >          translated by the iommu
> > 3. auto: change to the type the device was booted with
> >
> > Note:
> > 1. Default domain of an iommu group with two or more devices cannot be
> >     changed.
> > 2. The device in the iommu group shouldn't be bound to any driver.
> > 3. The device shouldn't be assigned to user for direct access.
> >
> > Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for
> > more information.
> >
> > Cc: Christoph Hellwig <hch at lst.de>
> > Cc: Joerg Roedel <joro at 8bytes.org>
> > Cc: Ashok Raj <ashok.raj at intel.com>
> > Cc: Will Deacon <will.deacon at arm.com>
> > Cc: Lu Baolu <baolu.lu at linux.intel.com>
> > Cc: Sohil Mehta <sohil.mehta at intel.com>
> > Cc: Robin Murphy <robin.murphy at arm.com>
> > Cc: Jacob Pan <jacob.jun.pan at linux.intel.com>
> > Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya at intel.com>
> > ---
> >   drivers/iommu/iommu.c | 211
> +++++++++++++++++++++++++++++++++++++++++-
> >   1 file changed, 210 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index
> > a4c2f122eb8b..2b6cca799055 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -92,6 +92,8 @@ static void __iommu_detach_group(struct
> iommu_domain *domain,
> >   static int iommu_create_device_direct_mappings(struct iommu_group
> *group,
> >   					       struct device *dev);
> >   static struct iommu_group *iommu_group_get_for_dev(struct device
> > *dev);
> > +static ssize_t iommu_group_store_type(struct iommu_group *group,
> > +				      const char *buf, size_t count);
> >
> >   #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)
> 	\
> >   struct iommu_group_attribute iommu_group_attr_##_name =		\
> > @@ -524,7 +526,8 @@ static IOMMU_GROUP_ATTR(name, S_IRUGO,
> iommu_group_show_name, NULL);
> >   static IOMMU_GROUP_ATTR(reserved_regions, 0444,
> >   			iommu_group_show_resv_regions, NULL);
> >
> > -static IOMMU_GROUP_ATTR(type, 0444, iommu_group_show_type, NULL);
> > +static IOMMU_GROUP_ATTR(type, 0644, iommu_group_show_type,
> > +			iommu_group_store_type);
> >
> >   static void iommu_group_release(struct kobject *kobj)
> >   {
> > @@ -2847,3 +2850,209 @@ int iommu_sva_get_pasid(struct iommu_sva
> *handle)
> >   	return ops->sva_get_pasid(handle);
> >   }
> >   EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
> > +
> > +/*
> > + * Changes the default domain of a device
> > + *
> > + * @dev: The *only* device in the group
> 
> All devices in a group have been linked in group->devices, so you don't need to
> pass it in as a parameter. Otherwise, you also need some sanity check against
> @dev, right?

Yes, I agree that we could get "dev" from group->devices. But, I passed it as a parameter because it's already done by iommu_group_store_type() (as below) and I thought that I could save from duplicating code by passing it as a parameter. Also, iommu_change_dev_def_domain() is presently just a helper function (and not an API) for iommu_group_store_type() and hence thought that error checking for "dev" (i.e. if (!dev)) is not needed in iommu_change_dev_def_domain() because iommu_group_store_type()  may never pass a null value.

/* Since group has only one device */
list_for_each_entry(grp_dev, &group->devices, list)
	dev = grp_dev->dev;

Just wanted to put out the rationale behind why I decided to go this way rather than getting "dev" from group->devices. Please let me know if you think otherwise, I am happy to change it.

> > + * @group: The group for which the default domain should be changed
> > + * @prev_domain: The previous domain that is being switched from
> 
> The same as @dev, the default domain in use is saved in
> group->default_domain. No need a separated parameter to save space and
> sanity check.

Same as above. Just wanted to avoid the below check as it's already done by iommu_group_store_type()
if (group->default_domain)
	prev_dom = group->default_domain;

I am happy to change it if you prefer to see less parameters.

> > + * @type: The type of the new default domain that gets associated
> > + with the group
> > + *
> > + * Returns 0 on success and error code on failure
> > + *
> > + * Note:
> > + * 1. Presently, this function is called only when user requests to change the
> > + *    group's default domain type through
> /sys/kernel/iommu_groups/<grp_id>/type
> > + *    Please take a closer look if intended to use for other purposes.
> > + * 2. Assumes that group->mutex is already taken and releases before
> > + returning
> 
> You could assume that the caller should hold the group->mutex and shouldn't be
> released in this helper. A reference design could be found here.
> 
> https://lkml.org/lkml/2020/3/13/1164

Ok. I have looked at it.. and it does hold group mutex.
Just wondering if you would like me to change anything here?

> > + */
> > +static int iommu_change_dev_def_domain(struct device *dev,
> > +				       struct iommu_group *group,
> > +				       struct iommu_domain *prev_dom, int type)
> {
> > +	int ret = 0;
> > +
> > +	/* Sets group->default_domain to the newly allocated domain */
> > +	ret = iommu_group_alloc_default_domain(dev->bus, group, type);
> > +	if (ret)
> > +		goto out;
> > +
> > +	ret = __iommu_attach_device(group->default_domain, dev);
> > +	if (ret)
> > +		goto free_new_domain;
> > +
> > +	group->domain = group->default_domain;
> > +
> > +	ret = iommu_create_device_direct_mappings(group, dev);
> > +	if (ret)
> > +		goto free_new_domain;
> > +
> > +	/*
> > +	 * Release the mutex here because ops->probe_finalize() call-back of
> > +	 * some vendor IOMMU drivers calls arm_iommu_attach_device() which
> > +	 * in-turn might call back into IOMMU core code, where it tries to take
> > +	 * group->mutex, resulting in a deadlock.
> > +	 */
> > +	mutex_unlock(&group->mutex);
> > +
> > +	/* Make sure dma_ops is appropriatley set */
> > +	iommu_group_do_probe_finalize(dev, group->default_domain);
> > +	iommu_domain_free(prev_dom);
> > +	return 0;
> > +
> > +free_new_domain:
> > +	iommu_domain_free(group->default_domain);
> > +	group->default_domain = prev_dom;
> > +	group->domain = prev_dom;
> > +
> > +out:
> > +	mutex_unlock(&group->mutex);
> > +
> > +	return ret;
> > +}
> > +
> > +static ssize_t iommu_group_store_type(struct iommu_group *group,
> > +				      const char *buf, size_t count) {
> > +	int ret, req_type = 0, req_auto = 0, dev_def_dom;
> > +	struct iommu_domain *prev_dom;
> > +	struct group_device *grp_dev;
> > +	const struct iommu_ops *ops;
> > +	struct device *dev;
> > +
> > +	if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO))
> > +		return -EACCES;
> > +
> > +	if (WARN_ON(!group))
> > +		return -EINVAL;
> > +
> > +	if (sysfs_streq(buf, "identity"))
> > +		req_type = IOMMU_DOMAIN_IDENTITY;
> > +	else if (sysfs_streq(buf, "DMA"))
> > +		req_type = IOMMU_DOMAIN_DMA;
> > +	else if (sysfs_streq(buf, "auto"))
> > +		req_auto = 1;
> 
> How about let req_type = 0 for auto case. This keeps it consistent with the
> return value of iommu_ops->def_domain_type(). With
> 
> if (!req_type)
> 	req_type = ops->def_domain_type(dev);
> 
> you don't need req_auto and dev_def_dom any more.

Please see below where you had comments for changing req_type.

> > +	else
> > +		return -EINVAL;
> > +
> > +	/* Lock the group to make sure that the device count doesn't change */
> > +	mutex_lock(&group->mutex);
> > +	if (iommu_group_device_count(group) != 1) {
> > +		mutex_unlock(&group->mutex);
> > +		pr_err("Cannot change default domain of a group with two or
> more devices\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	/* Since group has only one device */
> > +	list_for_each_entry(grp_dev, &group->devices, list)
> > +		dev = grp_dev->dev;
> > +
> > +	/*
> > +	 * Don't hold onto the group mutex because taking group mutex first
> and
> > +	 * then the device lock could potentially cause a deadlock as below.
> > +	 * Assume two threads T1 and T2. T1 is trying to change default domain
> > +	 * of an iommu group and T2 is trying to hot unplug a device or release
> > +	 * [1] VF of a PCIe device which is in the same iommu group. T1 takes
> > +	 * group mutex and before it could take device lock assume T2 has taken
> > +	 * device lock and is yet to take group mutex. Now, both the threads
> > +	 * will be waiting for the other thread to release lock.
> > +	 *
> > +	 * [1] Typical device release path
> > +	 * device_lock() from device/driver core code
> > +	 *  -> bus_notifier()
> > +	 *   -> iommu_bus_notifier()
> > +	 *    -> iommu_release_device()
> > +	 *     -> ops->release_device() vendor driver calls back iommu core code
> > +	 *      -> mutex_lock() from iommu core code
> > +	 */
> > +	mutex_unlock(&group->mutex);
> 
> Since you only support single device group, how about
> 
> -> lock_device
>      -> lock group
>          -> change default domain
>      -> unlock group
> -> unlock device

Sure! I did the same thing below i.e. first take device_lock() and then take group mutex.
I acquired/released group mutex above to
1.  Verify that iommu group has only one device
2. Get "struct dev *" of that device.
Just wondering if you would like me to change anything here?

> > +
> > +	/* Check if the device in the group still has a driver bound to it */
> > +	device_lock(dev);
> > +	if (device_is_bound(dev)) {
> > +		pr_err("Device is still bound to driver\n");
> > +		ret = -EBUSY;
> > +		goto dev_unlock;
> > +	}
> > +
> > +	/*
> > +	 * iommu group wasn't locked while acquiring device lock. So, make
> sure
> > +	 * that the device count hasn't changed while acquiring device lock.
> > +	 *
> > +	 * Changing default domain of an iommu group with two or more
> devices
> > +	 * isn't supported because there could be a potential deadlock. Consider
> > +	 * the following scenario. T1 is trying to acquire device locks of all
> > +	 * the devices in the group and before it could acquire all of them,
> > +	 * there could be another thread T2 (from different sub-system and use
> > +	 * case) that has already acquired some of the device locks and might be
> > +	 * waiting for T1 to release other device locks.
> > +	 */
> > +	mutex_lock(&group->mutex);
> > +	if (iommu_group_device_count(group) != 1) {
> > +		pr_err("Cannot change default domain of a group with two or
> more devices\n");
> > +		ret = -EINVAL;
> > +		goto group_unlock;
> > +	}
> > +
> > +	/*
> > +	 * Check if any user level driver (that doesn't use kernel driver like
> > +	 * VFIO) is directly using the group. VFIO use case is detected by
> > +	 * device_is_bound() check above
> > +	 */
> > +	if (group->default_domain != group->domain) {
> > +		pr_err("Group assigned to user level for direct access\n");
> > +		ret = -EINVAL;
> 
> -EBUSY?

Ok.. I will change it.

> > +		goto group_unlock;
> > +	}
> > +
> > +	prev_dom = group->default_domain;
> > +	if (!prev_dom || !prev_dom->ops || !prev_dom->ops-
> >def_domain_type) {
> > +		pr_err("'def_domain_type' call back isn't registered\n");
> > +		ret = -EINVAL;
> > +		goto group_unlock;
> > +	}
> > +
> 
> --- start ---
> > +	ops = prev_dom->ops;
> > +	dev_def_dom = ops->def_domain_type(dev);
> > +
> > +	/* Check if user requested domain is supported by the device or not */
> > +	if (!req_auto && dev_def_dom && req_type != dev_def_dom) {
> > +		pr_err("Device cannot be in %s domain\n", buf);
> > +		ret = -EINVAL;
> > +		goto group_unlock;
> > +	}
> > +
> > +	if (req_auto) {
> > +		/*
> > +		 * If user requested "auto" and the device supports both the
> > +		 * domains, then default to the domain the device was booted
> > +		 * with
> > +		 */
> > +		if (!dev_def_dom)
> > +			req_type = iommu_def_domain_type;
> > +		else
> > +			req_type = dev_def_dom;
> > +	}
> --- end ---
> 
> Isn't this could could be simply replaced with:
> 
> if (!req_type)
> 	req_type = ops->default_domain_type(dev) ? :
> iommu_def_domain_type;

I think, replacing this has two issues
1. We might miss validating user request i.e. we need to make sure that when user requests for a specific type of domain (E.g: DMA), we need to validate that the device could actually be in that domain. This is done above by

/* Check if user requested domain is supported by the device or not */
if (!req_auto && dev_def_dom && req_type != dev_def_dom) {
	pr_err("Device cannot be in %s domain\n", buf);
	ret = -EINVAL;
	goto group_unlock;
}

2. req_type could end up having 0 if "ops->default_domain_type(dev)" returns 0.
But, after this point, req_type should _only_ be either DMA type or identity type and cannot be 0 because req_type is passed to iommu_ops->domain_alloc() to allocate domain of requested type and it cannot be 0.

Regards,
Sai


More information about the iommu mailing list