glDrawArrays crashes when start index is more than 65536
Closed this issue · 9 comments
I am not very sure about this. I find it kind of hard to debug.
But when testing my TilelessMap https://github.com/TilelessMap/TilelessMap on rpi3 it crashes sometimes.
What I have found is that it crashes when the start index of the geometry to draw in the vbo is larger than 2^16.
It handles the large array, that doesn't seem to be the problem. glDrawArrays executes every time until the start index becomes large, then it crashes in glDrawArrays. I haven't compiled mesa myself, so all I get from Valgrind is that it is writing 1 byte beyond a buffer, and that this happens multiple times.
It doesn't happen when I use glDrawElements for rendering polygons. Then it handles indexes of hundreds of thousands.
I haven't seen this issue on Linux or Android or Windows.
If this is not a user error on my side, it seems like somewhere the index is truncated to short instead of int.
I read about the 2 byte limitation of index buffers https://github.com/anholt/mesa/wiki/VC4-OpenGL-support
but this is not about that. It is about the start index of the geometry that glDrawArrays uses.
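To make the suspicion above concrete, here is a minimal sketch in C of what would happen to a start index if it were stored in a 16-bit field somewhere. It only illustrates the hypothesis; the helper names are made up for this example and nothing here is taken from Mesa or claims to describe what the driver does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from Mesa): the value a start index would
 * have after being stored in a 16-bit field. */
static inline uint16_t truncate_to_u16(uint32_t first)
{
    return (uint16_t)first; /* keeps only the low 16 bits */
}

/* Returns 1 if the start index survives a round trip through 16 bits. */
static inline int fits_in_u16(uint32_t first)
{
    return truncate_to_u16(first) == first;
}

/* Small self-check showing the wrap-around at 2^16. */
static inline void check_truncation_examples(void)
{
    assert(fits_in_u16(65535u));            /* largest value that still fits */
    assert(!fits_in_u16(65536u));           /* 2^16 wraps around to 0 */
    assert(truncate_to_u16(70000u) == 4464); /* 70000 - 65536 */
}
```

If a truncation like this were the cause, a draw starting at vertex 70000 would silently start at vertex 4464 instead, which would more likely show up as wrong geometry than as a write past a buffer.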
I'd be interested to see where the backtrace ends up (I assume by crash you mean segfault, not GPU hang).
We have to break large drawarrays into groups of <65535 verts, because internally the HW uses 16-bit indices to represent primitives after binning. It may be that some of my math there is broken.
Actually, I think I see something: in vc4_get_draw_cl_space() we don't account for the start vertex when setting up space for the shader recs. Also, we should be dividing by 65535-2 instead of 65535, I think.
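The splitting described above can be sketched roughly as follows. This is not the vc4 driver code: `split_triangles`, `strip_sub_draw_bound`, `struct sub_draw` and the `MAX_VERTS_PER_DRAW` constant are names invented for this example, and the reading of the "65535-2" remark as a two-vertex overlap between triangle strip chunks is an assumption. The sketch shows that each sub-draw has to carry its own start vertex, and why dividing by 65535 can underestimate the number of sub-draws.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed per-draw vertex limit, based on the 16-bit indices mentioned above. */
#define MAX_VERTS_PER_DRAW 65535u

/* One sub-draw: the arguments that would go to glDrawArrays(mode, first, count). */
struct sub_draw {
    uint32_t first;
    uint32_t count;
};

/*
 * Split a draw of independent triangles (3 vertices each) into sub-draws.
 * Hypothetical helper, not Mesa code. Writes at most `cap` entries to `out`
 * and returns the number of sub-draws needed.
 */
static inline uint32_t split_triangles(uint32_t first, uint32_t count,
                                       struct sub_draw *out, uint32_t cap)
{
    /* Largest multiple of 3 within the limit, so no triangle is cut in half. */
    const uint32_t step = MAX_VERTS_PER_DRAW - (MAX_VERTS_PER_DRAW % 3u);
    uint32_t n = 0;

    assert(out != 0 || cap == 0);
    while (count > 0) {
        uint32_t this_count = count < step ? count : step;
        if (n < cap) {
            out[n].first = first; /* the start vertex advances with every chunk */
            out[n].count = this_count;
        }
        n++;
        first += this_count;
        count -= this_count;
    }
    return n;
}

/*
 * Upper bound on sub-draws for a triangle strip, assuming consecutive chunks
 * share 2 vertices, so each chunk only advances by 65535 - 2.
 */
static inline uint32_t strip_sub_draw_bound(uint32_t count)
{
    const uint32_t step = MAX_VERTS_PER_DRAW - 2u;
    return count / step + (count % step != 0u); /* division rounded up */
}
```

For example, a strip of 131070 vertices is exactly 2 x 65535, but with a two-vertex overlap it needs three sub-draws, so space sized for `count / 65535` draws would be one short.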
Thanks for looking at it.
I will try to get a backtrace later today.
I have also made a quite minimal example showing the issue. I just have to confirm it actually crashes when I get to a rpi later today.
Yes, I get a "segmentation violation", or sometimes "memory corruption", which sounds similar to me.
Hi again
Sorry, but I am not able to get the backtrace before the weekend because I am away. I brought the rpi, but I don't have a screen with HDMI to connect it to until I get home again.
But if you have a rpi and want to try I attach a simple case that should crash if my theory is correct.
I just modified this tutorial to build a large array: https://gitlab.com/wikibooks-opengl/modern-tutorials/tree/master/tut02_clean.
I haven't tried it on the rpi yet, but it works as expected (no crash) on my laptop with Debian.
Ok, the test posted above crashes on the rpi as suspected.
Attached is output for gdb and valgrind from the crash. I guess the valgrind result is more usable.
The first error about
libGL error: MESA-LOADER: failed to retrieve device information
MESA-LOADER: failed to retrieve device information
also appears when I reduce the array so that the code works. So I guess that is irrelevant for this issue.
gdb.txt
valgrind.txt
Sorry for the delay. I reproduced the problem and have a test fix at 1168d08 (the "vc4" branch) (passing my original test for the bug, and your test app). Could you give it a shot with your full app?
Updated the branch with a slight improvement.
I had some trouble building Mesa, but now I can confirm that this issue seems to be solved.
I see new issues that I haven't seen on other platforms, but that is probably caused by issues in my code.
I just throw more and more data on the gpu and I have had the impression that it gets cleaned up automagically on other platforms, but on the rpi it starts hanging after some panning and zooming in the map. I have to check whether I release the vbos as I should. If I suspect something is wrong I will open a new issue.
Thanks a lot for the fix!
From my point of view you can close this.
Will the fix be released and packed for the rpi anytime soon?
Landed in master:
commit 84ab48c15c9373dfa4709f4f9e887c329286e5a1
Author: Eric Anholt <eric@anholt.net>
Date: Fri Nov 24 21:40:50 2017 -0800
broadcom/vc4: Fix handling of GFXH-515 workaround with a start vertex count.
and I've nominated it for the stable branches.
Thanks!