[Ksummit-discuss] [CORE TOPIC] Device error handling / reporting / isolation

Joerg Roedel joro at 8bytes.org
Mon May 12 15:03:46 UTC 2014


On Fri, May 09, 2014 at 07:05:10PM +0100, Will Deacon wrote:
> On Thu, May 08, 2014 at 01:37:03PM +0100, David Woodhouse wrote:
> > We may have various options for shutting it up — a PCI function level
> > reset, power cycling the offending device, or maybe just configuring the
> > IOMMU to *ignore* further errors from it, which would at least let the
> > system get on with doing something useful (and if we do, when do we
> > re-enable reporting?).
> 
> There's also the fun of non-PCI devices, where even if you can kill the
> offending device, there's not a specified way to ensure that it no longer
> has transactions in flight. Also, the fault reports have to go somewhere,
> so queues can fill up etc. etc.

I am of course also interested in this discussion. Fault handling is
currently implemented per IOMMU driver. There is no reason we should not
unify the way we report faults and handle misbehaving devices.

> I'd certainly be interested in this from the ARM side (I'm involved in the
> architecture of our next SMMU and we've discussed this a lot internally).

Interesting. I strongly hope the next SMMU will still work with the
current in-kernel SMMU driver :)


	Joerg



