adamwulf/JotUI

Use glFence and TextureCaches in OpenGL ES 2.0

adamwulf opened this issue · 0 comments

There is currently some slowdown, particularly when calling glReadPixels with a glFlush or glFinish afterwards. The flush is there to ensure that the glReadPixels call is complete, but it also stalls the app until all of the queued OpenGL commands have completed. Instead, I should use a GL fence so that only the one thread waits on the glReadPixels call while other GL calls continue to be fed into the system.

Below are my notes and research on the topic:

I think the way to resolve the issue is to:

  1. make glReadPixels asynchronous
  2. accomplish (1) by using Pixel Buffer Objects (PBO)

This way, glReadPixels won't lock the CPU while waiting on the GPU to copy the pixel data, though it'll change how, and probably when, I can process that data.

from: https://developer.apple.com/library/mac/technotes/tn2093/_index.html#//apple_ref/doc/uid/DTS10003289-CH1-TNTAG9

On the contrary, glReadPixels() with PBOs (pixel buffer objects) can schedule asynchronous data transfer and returns immediately without stall. Therefore, the application can execute other processes right away while transferring the data by OpenGL at the same time. The other advantage of using PBOs is the fast pixel data transfer from (and to) a graphics card through DMA without involving CPU cycles. In the conventional way, the pixel data is loaded into system memory by CPU. Using a PBO, instead, GPU manages copying data from the frame buffer to a PBO. This means that OpenGL performs a DMA transfer operation without wasting CPU cycles.

To maximize asynchronous read back performance, you can use two PBOs. Every frame, the application reads the pixel data from the framebuffer to one PBO using glReadPixels(), and processes the pixel data in the other. Calls to glMapBufferARB() and glUnmapBufferARB() will map/unmap the OpenGL controlled buffer object to the client's address space so that you can access and modify the buffer through a pointer. These read and process can be performed simultaneously, because glReadPixels() to the first PBO returns immediately so CPU can start processing data in the second PBO without delay. You alternate between the two PBOs every frame.
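
The two-PBO rotation described above can be sketched as plain bookkeeping, with no GL calls so it compiles anywhere. This is only an illustration: `PboPingPong` and the three helper functions are hypothetical names, not part of the technote or of JotUI, and in real code the buffer names would come from glGenBuffers on a GL version that actually supports PBOs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for two PBOs used in rotation. */
typedef struct {
    unsigned int pbo[2];  /* buffer names; in real code these come from glGenBuffers */
    unsigned long frame;  /* number of frames already submitted */
} PboPingPong;

/* PBO that this frame's glReadPixels should target. */
static unsigned int pbo_for_readback(const PboPingPong *p) {
    assert(p != NULL);
    return p->pbo[p->frame % 2];
}

/* PBO filled by the previous frame, ready to be mapped and processed.
 * Returns 0 (the reserved "no buffer" name in GL) on the first frame,
 * because nothing has been read back yet. */
static unsigned int pbo_for_processing(const PboPingPong *p) {
    assert(p != NULL);
    if (p->frame == 0) {
        return 0;
    }
    return p->pbo[(p->frame + 1) % 2];
}

/* Call once per frame, after issuing the read and finishing the processing. */
static void pbo_advance(PboPingPong *p) {
    assert(p != NULL);
    p->frame++;
}
```

Each frame would bind `pbo_for_readback()` before calling glReadPixels, map `pbo_for_processing()` to consume the previous frame's pixels, then call `pbo_advance()`. The cost of this design is one frame of latency on the pixel data.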

I had another theory about what to do. When implementing the texture cache code, I noticed that it still used a call to glFinish(). That call seems to sync the GPU and CPU, and it may take multiple seconds if the GPU also has tons to do on other contexts / threads. My problem doesn't seem to be the time it takes to pull the texture data out; it seems to be the time it takes to sync the GPU and CPU before the read even starts.

I did some more reading tonight, and it seems that if I use glReadPixels into a pixel buffer object, then the glReadPixels won't trigger a GPU/CPU sync, and I can remove my glFinish call, and it will move the pixels into the PBO asynchronously. Importantly, I can use a Fence object after the glReadPixels call. That fence will signal after the glReadPixels call has finished on the GPU, which means the texture data will be ready and sitting in the PBO.

Then I can do the texture read 100% asynchronously, and use the testFence feature to determine whether the fence has been signaled without blocking the CPU at all. That'd let me keep the UI responsive while waiting for the texture to be read from a cloned scrap - I could even block its interactions and show a dimmed spinner etc. until the clone has completed. It'd be much better to show a spinner + responsive UI for 5 seconds than a completely locked UI for 5 seconds.
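
The polling idea can be sketched as a small state machine that the main thread ticks once per frame. This is a sketch under assumptions, not working JotUI code: `AsyncReadback`, `readback_begin`, `readback_tick`, and the `FakeFence` stand-in are hypothetical names, and the poll hook marks where a real non-blocking test such as glTestFenceAPPLE (from the APPLE_fence spec linked below) would be wired in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { READBACK_IDLE, READBACK_PENDING, READBACK_READY } ReadbackState;

/* Hypothetical hook: returns true once the GPU has passed the fence.
 * It must never block. In real code it would wrap something like
 * glTestFenceAPPLE, or glClientWaitSync with a zero timeout where
 * sync objects are available. */
typedef bool (*FencePollFn)(void *fence);

typedef struct {
    ReadbackState state;
    void *fence;       /* opaque handle passed back to the poll hook */
    FencePollFn poll;
} AsyncReadback;

/* Call right after issuing glReadPixels and setting the fence. */
static void readback_begin(AsyncReadback *r, void *fence, FencePollFn poll) {
    assert(r != NULL && poll != NULL);
    r->fence = fence;
    r->poll = poll;
    r->state = READBACK_PENDING;
}

/* Call once per UI tick. Polls the fence only while a read is pending,
 * so the UI thread never waits on the GPU. */
static ReadbackState readback_tick(AsyncReadback *r) {
    assert(r != NULL);
    if (r->state == READBACK_PENDING && r->poll(r->fence)) {
        r->state = READBACK_READY;  /* pixel data can now be mapped and read */
    }
    return r->state;
}

/* Stand-in fence for trying this without a GPU:
 * reports "not signaled" for `remaining` polls, then "signaled". */
typedef struct { int remaining; } FakeFence;

static bool fake_fence_poll(void *fence) {
    FakeFence *f = fence;
    if (f->remaining > 0) {
        f->remaining--;
        return false;
    }
    return true;
}
```

While `readback_tick()` returns `READBACK_PENDING`, the UI can keep the spinner up and ignore input on the scrap; once it returns `READBACK_READY`, the pixel data can be consumed.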

links with info about fences:

http://stackoverflow.com/questions/15137020/using-fence-sync-objects-in-opengl

https://developer.apple.com/library/mac/documentation/GraphicsImaging/Conceptual/OpenGL-MacProgGuide/opengl_designstrategies/opengl_designstrategies.html#//apple_ref/doc/uid/TP40001987-CH2-SW9

https://www.opengl.org/registry/specs/APPLE/fence.txt

This page has a fence example at the bottom of the page:

https://developer.apple.com/library/ios/documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/TechniquesforWorkingwithVertexData/TechniquesforWorkingwithVertexData.html

More good discussion here as well:
https://www.opengl.org/discussion_boards/showthread.php/171319-glFlush-or-glFinish-with-mulithreading