Advice on oops - memory trap on non-memory access instruction (invalid CR2?)

Guilherme G. Piccoli gpiccoli at canonical.com
Tue Oct 15 15:21:45 UTC 2019


On 14/10/2019 11:10, Thomas Gleixner wrote:
> On Mon, 14 Oct 2019, Guilherme G. Piccoli wrote:
>> Modules linked in: <...>
>> CPU: 40 PID: 78274 Comm: qemu-system-x86 Tainted: P W  OE
> 
> Tainted: P     - Proprietary module loaded ...
> 
> Try again without that module

Thanks for the prompt response, Thomas. This is some ScaleIO module; I
guess it's part of the customer setup, and I agree it would be better not
to have this kind of module loaded. Anyway, the analysis of the oops shows
a quite odd situation, and we'd like to have at least a strong clue before
saying the ScaleIO stuff is the culprit.

> 
> Tainted: W     - Warning issued before
> 
> Are you sure that that warning is harmless and unrelated?
> 

Sorry, I didn't mention that before; the warning is:

[5946866.593060] WARNING: CPU: 42 PID: 173056 at
/build/linux-lts-xenial-80t3lB/linux-lts-xenial-4.4.0/arch/x86/events/intel/core.c:1868
intel_pmu_handle_irq+0x2d4/0x470()
[5946866.593061] perfevents: irq loop stuck!

It happened ~700 days before the oops (yes, the uptime is quite large:
about 900 days when the oops happened, heh).


>> 4.4.0-45-generic #66~14.04.1-Ubuntu
> 
> Does the same problem happen with a not so dead kernel? CR2 handling got
> quite some updates/fixes since then.

Unfortunately, we don't have a way to test that for now, but your comment
is quite interesting; we can take a look at the CR2 fixes since v4.4.

But what do you think about getting a #PF when the instruction pointed to
in the oops Code section (and by the RIP address) is not a memory-access insn?

Thanks,


Guilherme
> 
> Thanks,
> 
> 	tglx
> 
> 


More information about the iommu mailing list