intel/KVMGT-kernel

Question about GPU context switching error


Hello.

I have a question. Could you tell me what the error messages below mean?
It seems this error occurs during GPU context switching between virtual machines with the MI_SET_CONTEXT command.

[ 109.802566] [kvmgt] kvmgt_read_hva-33: copy_from_user failed: rc == 4, len == 4
[ 110.353158] vGT error:(ring_wait_for_completion:137) Timeout 500 ms for CMD comletion on ring 0
[ 110.362514] vGT error:(ring_wait_for_completion:138) expected(611), actual(610)
[ 110.370384] vGT error:(vgt_restore_hw_context:1551) change to VM context switch commands unfinished
[ 110.380117] vGT error:(vgt_do_render_context_switch:1714) Fail to restore context
[ 110.388163] vGT info:(dump_regs_on_err:1598) reg=0x2054, val=0xa
[ 110.394650] vGT info:(dump_regs_on_err:1598) reg=0x12054, val=0xa
[ 110.401212] vGT info:(dump_regs_on_err:1598) reg=0x22054, val=0xa
[ 110.407773] vGT info:(dump_regs_on_err:1598) reg=0x1a054, val=0xa
[ 110.414331] vGT info:(dump_regs_on_err:1598) reg=0xa098, val=0x3e80000
[ 110.421343] vGT info:(dump_regs_on_err:1598) reg=0xa09c, val=0x28001e
[ 110.428277] vGT info:(dump_regs_on_err:1598) reg=0xa0a8, val=0x1e848
[ 110.435140] vGT info:(dump_regs_on_err:1598) reg=0xa0ac, val=0x19
[ 110.441690] vGT info:(dump_regs_on_err:1598) reg=0xa0b4, val=0x3e8
[ 110.448333] vGT info:(dump_regs_on_err:1598) reg=0xa0b8, val=0xc350
[ 110.455073] vGT info:(dump_regs_on_err:1598) reg=0xa090, val=0x88040000
[ 110.462209] vGT info:(dump_regs_on_err:1598) reg=0xa094, val=0x0
[ 110.468693] vGT error:(vgt_do_render_context_switch:1779) Ring-0: (3359th checks 204th switch<1->0>)
[ 110.478533] vGT error:(vgt_do_render_context_switch:1780) FAIL on ring-0
[ 110.485755] vGT error:(vgt_do_render_context_switch:1785) cur(1): head(1c2d8), tail(1c2d8), start(7801000)
[ 110.496216] vGT error:(vgt_do_render_context_switch:1790) dom0(0): head(416660), tail(168f8), start(14b000)
[ 110.506668] VM0 :head(416660), tail(168f8), start(14b000), ctl(1f001), uhptr(0)
[ 110.514752] VM1():head(1c2d8), tail(1c2d8), start(7801000), ctl(1f001), uhptr(0)
[ 110.522907] debug registers,reg maked with <*> may not apply to every ring):
[ 110.530491] ....RING_EIR: 00000000
[ 110.534168] ....RING_EMR: ffffffff
[ 110.537826] ....RING_ESR: 00000000
[ 110.541500] ....00002068: 780c0000
[ 110.545167] ....INSTPS* (parser state): 00000500 :
[ 110.550315] ....ACTHD(active header): 000000b0
[00000090]: 00000262 00000000 00000000 00000000 00000000 04000000 0c000000 0008c10e
[000000b0]: 00000000(*) 04000001 6d800005 00000004 00000000 00000000 00000000 00000000
[000000d0]: 00000000 00000000 10400002 00000000 0f800000 00000263 00000000 00000000
[000000f0]: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[00000110]: 78100004 00000000 80010000 00000000 00000000 00000000 781b0005 00010023
[ 110.668942] vGT error:(vgt_thread:309) Hang in context switch, try to reset device.

In particular, I want to know about these error codes. I can't find any references about these registers.

[ 110.388163] vGT info:(dump_regs_on_err:1598) reg=0x2054, val=0xa
[ 110.394650] vGT info:(dump_regs_on_err:1598) reg=0x12054, val=0xa
[ 110.401212] vGT info:(dump_regs_on_err:1598) reg=0x22054, val=0xa
[ 110.407773] vGT info:(dump_regs_on_err:1598) reg=0x1a054, val=0xa
[ 110.414331] vGT info:(dump_regs_on_err:1598) reg=0xa098, val=0x3e80000
[ 110.421343] vGT info:(dump_regs_on_err:1598) reg=0xa09c, val=0x28001e
[ 110.428277] vGT info:(dump_regs_on_err:1598) reg=0xa0a8, val=0x1e848
[ 110.435140] vGT info:(dump_regs_on_err:1598) reg=0xa0ac, val=0x19
[ 110.441690] vGT info:(dump_regs_on_err:1598) reg=0xa0b4, val=0x3e8
[ 110.448333] vGT info:(dump_regs_on_err:1598) reg=0xa0b8, val=0xc350
[ 110.455073] vGT info:(dump_regs_on_err:1598) reg=0xa090, val=0x88040000
[ 110.462209] vGT info:(dump_regs_on_err:1598) reg=0xa094, val=0x0

Thank you in advance :)

Do you hit this every time, or did it happen just once?

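To me, it looks suspicious that the user address access failed: rc == 4 with len == 4 in the copy_from_user message suggests that none of the 4 bytes were copied. This was possibly caused by swapping. Would you please add "-realtime mlock=on" to the QEMU command line and try again?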
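
If QEMU is launched from a script, the change amounts to appending that option to the existing argument list. A minimal Python sketch: the base command below is a placeholder, not your actual command line, and `with_mlock` is a hypothetical helper name.

```python
import shlex

def with_mlock(qemu_cmd):
    """Return the argument list with '-realtime mlock=on' appended,
    unless the command already sets mlock explicitly."""
    args = shlex.split(qemu_cmd)
    # Look for an existing "-realtime" option whose value sets mlock.
    for opt, val in zip(args, args[1:]):
        if opt == "-realtime" and "mlock=" in val:
            return args  # leave an explicit setting untouched
    return args + ["-realtime", "mlock=on"]

# Placeholder command line (assumption): substitute the one you really use.
base = "qemu-system-x86_64 -enable-kvm -m 2048"
print(" ".join(with_mlock(base)))
```

The option only pins guest memory so that it cannot be swapped out; if the error still shows up with it enabled, swapping is probably not the cause.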

I can't find any references about these registers

Are you referring to the open-source Programmer's Reference Manual (PRM) for Intel Gen graphics?
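
In case it helps with the lookup, the offsets in the dump_regs_on_err lines can be pulled out of the log and matched against a name table. A minimal Python sketch: the names are borrowed from the upstream Linux i915 driver header (i915_reg.h) as an assumption about what this dump is reading, not something confirmed from the vGT source, and `parse_dump` is a hypothetical helper.

```python
import re

# Assumed names, borrowed from the upstream Linux i915 register header.
# Treat them as search hints for the PRM, not as confirmed by vGT.
ASSUMED_NAMES = {
    0xA090: "GEN6_RC_CONTROL",
    0xA094: "GEN6_RC_STATE",
    0xA098: "GEN6_RC1_WAKE_RATE_LIMIT",
    0xA09C: "GEN6_RC6_WAKE_RATE_LIMIT",
    0xA0A8: "GEN6_RC_EVALUATION_INTERVAL",
    0xA0AC: "GEN6_RC_IDLE_HYSTERSIS",
    0xA0B4: "GEN6_RC1e_THRESHOLD",
    0xA0B8: "GEN6_RC6_THRESHOLD",
}

LINE_RE = re.compile(r"reg=0x([0-9a-fA-F]+), val=0x([0-9a-fA-F]+)")

def parse_dump(log_text):
    """Return (offset, value, name) for every 'reg=..., val=...' pair found."""
    rows = []
    for m in LINE_RE.finditer(log_text):
        reg, val = int(m.group(1), 16), int(m.group(2), 16)
        name = ASSUMED_NAMES.get(reg)
        # Offsets ending in 0x054 look like a per-ring register (assumption).
        if name is None and (reg & 0xFFF) == 0x054:
            name = "RING_MAX_IDLE (assumed, ring base + 0x54)"
        rows.append((reg, val, name or "unknown"))
    return rows

sample = """\
[ 110.388163] vGT info:(dump_regs_on_err:1598) reg=0x2054, val=0xa
[ 110.428277] vGT info:(dump_regs_on_err:1598) reg=0xa0a8, val=0x1e848
"""
for reg, val, name in parse_dump(sample):
    print(f"0x{reg:05x} = 0x{val:08x} ({val}) {name}")
```

If the assumed names are right, these look like ring idle and RC6 power-management settings rather than error codes, so the values would be configuration that the dump prints for reference.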