[Ksummit-discuss] [CORE TOPIC] Device error handling / reporting / isolation

Andy Lutomirski luto at amacapital.net
Mon May 12 16:16:11 UTC 2014


On Mon, May 12, 2014 at 8:35 AM, Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> On Mon, May 12, 2014 at 5:07 PM, Joerg Roedel <joro at 8bytes.org> wrote:
>> On Mon, May 12, 2014 at 12:43:09AM +0200, Daniel Vetter wrote:
>>> So I think having some iommu storm handling (like we have for
>>> interrupts in general and a lot of other things) would go a long way
>>> towards the goal of enabling iommus everywhere.
>>
>> Right, the developer use-case also needs to be taken into account. We could
>> easily ignore a device after it did something wrong to get rid of
>> io-page-fault or interrupt storms. But we also need a way to tell the
>> kernel to unignore the device later :)
>
> A disable/enable cycle of the pci bus master setting should be a good
> enough signal? Presuming you can say for sure which device is doing
> the offending dma transactions ofc ... Or maybe we should just be
> optimists and re-enable the IOMMU if _any_ child device gets
> re-enabled (or bus master re-enabled for pci) in the hopes that the
> developers just reloaded the driver. Worst case the storm handling
> will kick in again shortly.

Just to check: are you talking about disabling the IOMMU if there's a
fault storm or disabling reporting of IOMMU faults?

--Andy

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch



-- 
Andy Lutomirski
AMA Capital Management, LLC

