intel/KVMGT-kernel

Master interrupt is disabled on host machine

Opened this issue · 10 comments

Hello.

If I run a program that uses the vsync (vblank?) interrupt, the program sometimes stalls and the master interrupt of the GPU device (in DEIER) is disabled. For example,
when I run the Chrome browser on the host machine (Chrome seems to use vsync, because drm_handle_vblank is called while it is running), after a while the browser stalls and bit 31 of DEIER changes to 0.
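To make the symptom concrete, here is a minimal sketch of the check being described: testing whether bit 31 (the master interrupt enable) is set in a DEIER value. The macro and function names are illustrative assumptions, not identifiers taken from the KVMGT source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 of DEIER is the master interrupt enable, as described in the report.
 * The macro name is an assumption made for this sketch. */
#define DEIER_MASTER_IRQ_BIT (UINT32_C(1) << 31)

/* Returns true when the master interrupt enable bit is set in a DEIER value. */
static bool deier_master_enabled(uint32_t deier)
{
    return (deier & DEIER_MASTER_IRQ_BIT) != 0;
}
```

For example, `deier_master_enabled(0x80000000u)` is true, while a value of 0 reports the master interrupt as disabled.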

Why does this problem happen? Please let me know your opinion.

Thank you in advance.

This is a known issue on our side, I will push a fix asap. Thank you for reporting this.

Hi, just pushed a commit fixing this, please have a try, thanks!

Thank you I'll try it :)

The problem still occurs on the guest, though not on the host machine.
Maybe this problem is related to SMP, because there is no problem if I give only one vCPU to the VM.
I guess something disables the master interrupt in the guest's vreg for an unknown reason.

Do you have any plan to fix it?

I don't understand why this happens for the guest. Could you check the value of that vreg in /sys/kernel/debug/vgt/irqinfo?

1. qemu with smp 4

Interrupt control status:
vGT: DEISR is 0, DEIIR is 0, DEIMR is 4bffdef6, DEIER is b4002529
vGT: SDEISR is a80000, SDEIIR is 0, SDEIMR is f195ffff, SDEIER is ffffffff
vGT: GTISR is 0, GTIIR is 0, GTIMR is fffff7df, GTIER is 401821
vGT: PMISR is 0, PMIIR is 0, PMIMR is ffffff8f, PMIER is 470
vGT: RCS_IMR is fffff7df, VCS_IMR is ffffffff, BCS_IMR is ffffffff
Total 65931298 interrupts logged:
# WARNING: precisely this is the number of vGT
# physical interrupt handler be called,
# each calling several events can be
# been handled, so usually this number
# is less than the total events number.
62248923: Render Command Streamer MI USER INTERRUPT
11: Blitter Command Streamer MI USER INTERRUPT
13599: Pipe A vblank
24: Primary Plane A flip done
2: GSE
1841084: RP DOWN threshold interrupt
1832898: RP UP threshold interrupt
607: Gmbus
190: AUX Channel B
546143573588601: Last pirq
546143573604882: Last virq
5443: Average pirq cycles
792: Average virq cycles
85815: Average delay between pirq/virq handling

-->vgt-0:
....vreg (deier: b4002529, deiir: 0, deimr: 4bffdef7, deisr: 0)
....vreg (gtier: 401821, gtiir: 0, gtimr: fffff7df, gtisr: 0)
....vreg (sdeier: ffffffff, sdeiir: 0, sdeimr: f19dffff, sdeisr: 280000)
....vreg (pmier: 470, pmiir: 0, pmimr: ffffff8f, pmisr: 0)
....vreg (rcs_imr: fffff7df, vcs_imr: ffffffff, bcs_imr: ffffffff
546143527854915: Last injection
Total 13168536 virtual irq injection:
62103058: Render Command Streamer MI USER INTERRUPT
11: Blitter Command Streamer MI USER INTERRUPT
13599: Pipe A vblank
24: Primary Plane A flip done
2: GSE
1841084: RP DOWN threshold interrupt
1832898: RP UP threshold interrupt
607: Gmbus
190: AUX Channel B

-->vgt-1:
....vreg (deier: 0, deiir: 10000009, deimr: 4bffdef6, deisr: 0)
....vreg (gtier: 401821, gtiir: 400001, gtimr: fffff7df, gtisr: 0)
....vreg (sdeier: ffffffff, sdeiir: 2000000, sdeimr: f195ffff, sdeisr: 280000)
....vreg (pmier: 470, pmiir: 0, pmimr: ffffff8f, pmisr: 0)
....vreg (rcs_imr: fffff7df, vcs_imr: ffffffff, bcs_imr: ffffffff
544960653593286: Last injection
Total 4 virtual irq injection:
68: Render Command Streamer MI USER INTERRUPT
1: Blitter Command Streamer MI USER INTERRUPT
9466: Pipe A vblank
9: Primary Plane A flip done
1: DisplayPort/HDMI/DVI B Hotplug
152: AUX Channel B

2. qemu with one cpu

Interrupt control status:
vGT: DEISR is 0, DEIIR is 0, DEIMR is 4bffdef7, DEIER is b4002529
vGT: SDEISR is a80000, SDEIIR is 0, SDEIMR is f195ffff, SDEIER is ffffffff
vGT: GTISR is 0, GTIIR is 0, GTIMR is fffff7df, GTIER is 401821
vGT: PMISR is 0, PMIIR is 0, PMIMR is ffffff8f, PMIER is 470
vGT: RCS_IMR is fffff7df, VCS_IMR is ffffffff, BCS_IMR is ffffffff
Total 65935792 interrupts logged:
# WARNING: precisely this is the number of vGT
# physical interrupt handler be called,
# each calling several events can be
# been handled, so usually this number
# is less than the total events number.
62248945: Render Command Streamer MI USER INTERRUPT
13: Blitter Command Streamer MI USER INTERRUPT
17859: Pipe A vblank
27: Primary Plane A flip done
2: GSE
1841189: RP DOWN threshold interrupt
1832898: RP UP threshold interrupt
679: Gmbus
220: AUX Channel B
546627258970830: Last pirq
546627258989208: Last virq
5445: Average pirq cycles
792: Average virq cycles
85813: Average delay between pirq/virq handling

-->vgt-0:
....vreg (deier: b4002529, deiir: 0, deimr: 4bffdef7, deisr: 0)
....vreg (gtier: 401821, gtiir: 0, gtimr: fffff7df, gtisr: 0)
....vreg (sdeier: ffffffff, sdeiir: 0, sdeimr: f19dffff, sdeisr: 280000)
....vreg (pmier: 470, pmiir: 0, pmimr: ffffff8f, pmisr: 0)
....vreg (rcs_imr: fffff7df, vcs_imr: ffffffff, bcs_imr: ffffffff
546627258992478: Last injection
Total 13168767 virtual irq injection:
62103080: Render Command Streamer MI USER INTERRUPT
13: Blitter Command Streamer MI USER INTERRUPT
17859: Pipe A vblank
27: Primary Plane A flip done
2: GSE
1841189: RP DOWN threshold interrupt
1832898: RP UP threshold interrupt
679: Gmbus
220: AUX Channel B

-->vgt-1:
....vreg (deier: b4002529, deiir: 0, deimr: 4bffdef7, deisr: 0)
....vreg (gtier: 401821, gtiir: 0, gtimr: fffff7df, gtisr: 0)
....vreg (sdeier: ffffffff, sdeiir: 0, sdeimr: f195ffff, sdeisr: 280000)
....vreg (pmier: 470, pmiir: 0, pmimr: ffffff8f, pmisr: 0)
....vreg (rcs_imr: fffff7df, vcs_imr: ffffffff, bcs_imr: ffffffff
546444314478432: Last injection
Total 87 virtual irq injection:
20: Render Command Streamer MI USER INTERRUPT
2: Blitter Command Streamer MI USER INTERRUPT
5: Primary Plane A flip done
1: DisplayPort/HDMI/DVI B Hotplug
77: AUX Channel B
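The difference between the two dumps is in the display-engine vreg line of vgt-1: with smp 4, deier is 0 while deiir still holds latched bits. A rough sketch of how such lines could be checked automatically follows. It assumes the `....vreg (deier: ..., deiir: ..., deimr: ..., deisr: ...)` layout shown above; the struct, the function names, and the "stuck" heuristic are all assumptions made for illustration and are not part of KVMGT.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Display-engine interrupt vregs as printed in one irqinfo line. */
struct de_vregs {
    uint32_t deier, deiir, deimr, deisr;
};

/* Parses a "....vreg (deier: ...)" line; returns false for any other line. */
static bool parse_de_vreg_line(const char *line, struct de_vregs *out)
{
    unsigned int ier, iir, imr, isr;

    if (sscanf(line, " ....vreg (deier: %x, deiir: %x, deimr: %x, deisr: %x",
               &ier, &iir, &imr, &isr) != 4)
        return false;

    out->deier = ier;
    out->deiir = iir;
    out->deimr = imr;
    out->deisr = isr;
    return true;
}

/* Heuristic (an assumption): the master enable (bit 31) is clear while
 * events are still latched in DEIIR, so nothing will be delivered. */
static bool master_irq_stuck_off(const struct de_vregs *r)
{
    bool master_on = (r->deier & (UINT32_C(1) << 31)) != 0;

    return !master_on && r->deiir != 0;
}
```

Fed the vgt-1 line from the smp 4 dump, `master_irq_stuck_off` returns true; for the vgt-1 line from the single-CPU dump it returns false.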

Interesting. The DEIER of the guest is 0, and we currently have force wake support disabled in KVMGT!

There is probably a race condition.

I will find some time to try to reproduce this bug and fix it. Thanks for reporting this.

I'm curious. Is disabling forcewake related to this bug?
And why did you disable forcewake in KVMGT?

Is disabling forcewake related to this bug?

I guess not. There is possibly a race condition.

And why did you disable forcewake in KVMGT?

It's not trivial to enable forcewake in KVMGT, so in the initial release it was simply disabled. But I will add that back in the next release.

@terry84: it seems your problem is similar to #17.
Can you try simply enabling the master interrupt before returning from vgt_interrupt?
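As a rough illustration of that suggestion, the sketch below models a handler that unconditionally sets the master enable bit on its single exit path. It is a userspace simulation, not the real vgt_interrupt: `fake_deier`, `sketch_interrupt_handler`, and the `have_work` parameter are hypothetical stand-ins, and a real driver would go through its MMIO accessors instead of a plain variable.

```c
#include <stdint.h>

#define DEIER_MASTER_IRQ_BIT (UINT32_C(1) << 31)

/* Stand-in for the DEIER register (assumption: plain variable, no MMIO). */
static uint32_t fake_deier = 0xb4002529u;

/* Hypothetical handler shape; returns 1 if an event was handled, else 0. */
static int sketch_interrupt_handler(int have_work)
{
    int handled = 0;

    /* Mask the master interrupt while events are being processed. */
    fake_deier &= ~DEIER_MASTER_IRQ_BIT;

    if (have_work)
        handled = 1; /* event processing would happen here */

    /* Single exit path: always re-enable the master interrupt,
     * whatever state the register was found in. */
    fake_deier |= DEIER_MASTER_IRQ_BIT;
    return handled;
}
```

Forcing the bit on like this is only a way to test the theory: if the stall disappears, something else is clearing the bit, and that still needs to be found.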