android-rpi/device_brcm_rpi3

libEGL.so GL API call to libGLES_mesa.so returns invalid value

peyo-hd opened this issue · 8 comments

I am trying to bring up the AOSP android-n-preview-3 branch on the rpi3 build.
The build succeeds, but during boot SurfaceFlinger fails to initialize because libEGL.so gets null back when it calls glGetString(GL_EXTENSIONS) in libGLES_mesa.so.

Clipped logcat output for the failing and working cases is at the bottom.
The libEGL.so source line where glGetString(GL_EXTENSIONS) returns null is here:
https://android.googlesource.com/platform/frameworks/native/+/android-n-preview-3/opengl/libs/EGL/egl_object.cpp#112
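
The backtraces in this thread show the SIGSEGV inside strlen(), reached from a string assignment on the value returned by glGetString(GL_EXTENSIONS). As a minimal sketch of a defensive check (this is not the AOSP code; `GetStringFn`, `safeExtensions`, and the two fake drivers are hypothetical names introduced only for illustration), the result can be tested before it is copied:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the driver's glGetString entry point.
using GetStringFn = const char* (*)(unsigned int name);

// Value of GL_EXTENSIONS in the OpenGL ES headers (0x1F03 == 7939).
constexpr unsigned int kGlExtensions = 0x1F03;

// Returns the extension list, or an empty string when the driver call
// yields NULL, so the caller never hands a null pointer to strlen().
std::string safeExtensions(GetStringFn getString) {
    const char* exts = getString ? getString(kGlExtensions) : nullptr;
    return exts ? std::string(exts) : std::string();
}

// Two fake drivers, for illustration only.
const char* workingDriver(unsigned int) { return "GL_OES_EGL_image GL_KHR_debug"; }
const char* brokenDriver(unsigned int) { return nullptr; }
```

A guard like this would only avoid the crash. It would not explain why the call returns null in the first place, which is the actual question of this issue.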

For comparison, I put libGLES_mesa.so and its accompanying binaries from the npv3 build onto an npv2 system image, and that works fine.
But copying the npv2 libGLES_mesa binaries into an npv3 system image shows the same failure.
So the problem seems to be on the libEGL side: it may be failing to call the GL API implementation in libGLES_mesa correctly.

I suspected the Clang update between AOSP npv2 and npv3, but reverting the Clang version on npv3 showed the same problem.
This needs further investigation starting from the libEGL failure point.

The following link shows the differences between npv2 & npv3.
http://www.androidpolice.com/android_aosp_changelogs/android-n-preview-3-to-android-n-preview-4-AOSP-changelog.html

------------- Failure logcat on n-preview-3 --------------
SurfaceFlinger: Using composer version 1.0
SurfaceFlinger: EGL information:
SurfaceFlinger: vendor : Android
SurfaceFlinger: version : 1.4 Android META-EGL
SurfaceFlinger: extensions: EGL_KHR_get_all_proc_addresses EGL_ANDROID_presentation_time EGL_KHR_swap_buffers_with_damage EGL_KHR_image_base EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_3D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_gl_renderbuffer_image EGL_KHR_fence_sync EGL_KHR_create_context EGL_KHR_surfaceless_context EGL_ANDROID_image_native_buffer EGL_KHR_wait_sync EGL_ANDROID_recordable
SurfaceFlinger: Client API: OpenGL_ES
SurfaceFlinger: EGLSurface: 8-8-8-8, config=0xb5a92f00
libEGL : call to OpenGL ES API with no current context (logged once per thread)
libc : Fatal signal 11 (SIGSEGV), code 1, fault addr 0x0 in tid 209 (surfaceflinger)

DEBUG : #3 pc 000112c7 /system/lib/libEGL.so (android::egl_context_t::onMakeCurrent(void*, void*)+66)
DEBUG : #4 pc 00010ddb /system/lib/libEGL.so (android::egl_display_t::makeCurrent(android::egl_context_t*, android::egl_context_t*, void*, void*, void*, void*, void*, void*)+218)
DEBUG : #5 pc 00013845 /system/lib/libEGL.so (eglMakeCurrent+292)
DEBUG : #6 pc 00026d1d /system/lib/libsurfaceflinger.so (android::RenderEngine::create(void*, int)+244)
DEBUG : #7 pc 00018641 /system/lib/libsurfaceflinger.so (android::SurfaceFlinger::init()+316)
DEBUG : #8 pc 00000ee5 /system/bin/surfaceflinger (main+116)

------------- OK logcat on n-preview-2 --------------

SurfaceFlinger: Using composer version 1.0
SurfaceFlinger: EGL information:
SurfaceFlinger: vendor : Android
SurfaceFlinger: version : 1.4 Android META-EGL
SurfaceFlinger: extensions: EGL_KHR_get_all_proc_addresses EGL_ANDROID_presentation_time EGL_KHR_swap_buffers_with_damage EGL_KHR_image_base EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_3D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_gl_renderbuffer_image EGL_KHR_fence_sync EGL_KHR_create_context EGL_KHR_surfaceless_context EGL_ANDROID_image_native_buffer EGL_KHR_wait_sync EGL_ANDROID_recordable
SurfaceFlinger: Client API: OpenGL_ES
SurfaceFlinger: EGLSurface: 8-8-8-8, config=0xb5a92f00
SurfaceFlinger: OpenGL ES informations:
SurfaceFlinger: vendor : Broadcom
SurfaceFlinger: renderer : Gallium 0.4 on VC4
SurfaceFlinger: version : OpenGL ES 2.0 Mesa 11.2.2 (git-095ca32)
SurfaceFlinger: extensions: GL_EXT_debug_marker GL_EXT_blend_minmax GL_EXT_multi_draw_arrays GL_EXT_texture_format_BGRA8888 GL_OES_compressed_ETC1_RGB8_texture GL_OES_depth24 GL_OES_element_index_uint GL_OES_fbo_render_mipmap GL_OES_mapbuffer GL_OES_rgb8_rgba8 GL_OES_stencil8 GL_OES_texture_3D GL_OES_texture_npot GL_OES_EGL_image GL_OES_depth_texture GL_OES_packed_depth_stencil GL_EXT_texture_type_2_10_10_10_REV GL_OES_get_program_binary GL_APPLE_texture_max_level GL_EXT_discard_framebuffer GL_EXT_read_format_bgra GL_NV_fbo_color_attachments GL_OES_EGL_image_external GL_OES_EGL_sync GL_OES_vertex_array_object GL_EXT_unpack_subimage GL_NV_draw_buffers GL_NV_read_buffer GL_NV_read_depth GL_NV_read_depth_stencil GL_NV_read_stencil GL_EXT_draw_buffers GL_EXT_map_buffer_range GL_KHR_debug GL_OES_surfaceless_context GL_EXT_separate_shader_objects GL_EXT_draw_elements_base_vertex GL_KHR_context_flush_control GL_OES_draw_elements_base_vertex
SurfaceFlinger: GL_MAX_TEXTURE_SIZE = 2048
SurfaceFlinger: GL_MAX_VIEWPORT_DIMS = 2048

Stumbled upon this today while moving on to android-n-preview-3. Interestingly, RenderEngine can get the list of extensions via an EGL call:

https://android.googlesource.com/platform/frameworks/native/+/android-n-preview-3/services/surfaceflinger/RenderEngine/RenderEngine.cpp#141

or

https://android.googlesource.com/platform/frameworks/native/+/android-n-preview-3/services/surfaceflinger/RenderEngine/RenderEngine.cpp#432

Did you manage to boot preview-3 with preview-2 libraries?

Matching this line of the stack trace, (android::RenderEngine::create(void*, int)+244), the crash occurs at the following point:

https://android.googlesource.com/platform/frameworks/native/+/android-n-preview-3/services/surfaceflinger/RenderEngine/RenderEngine.cpp#108

EGL calls work because they are handled by the Android framework's meta layer (1.4 Android META-EGL),
but the calls to libGLES_mesa.so fail.

Regarding the npv2 & npv3 binary exchange test, I have updated the issue description.

Just for the sake of completeness: same behavior on n-preview-4, where libEGL.so gets a null pointer instead of the string of available extensions.

------------- Failure logcat on n-preview-4 --------------

SurfaceFlinger: Using composer version 1.0
SurfaceFlinger: EGL information:
SurfaceFlinger: vendor : Android
SurfaceFlinger: version : 1.4 Android META-EGL
SurfaceFlinger: extensions: EGL_KHR_get_all_proc_addresses EGL_ANDROID_presentation_time EGL_KHR_swap_buffers_with_damage EGL_KHR_image_base EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_3D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_gl_renderbuffer_image EGL_KHR_fence_sync EGL_KHR_create_context EGL_KHR_surfaceless_context EGL_ANDROID_image_native_buffer EGL_KHR_wait_sync EGL_ANDROID_recordable
SurfaceFlinger: Client API: OpenGL_ES
SurfaceFlinger: EGLSurface: 8-8-8-8, config=0xb5a12f00
libEGL : call to OpenGL ES API with no current context (logged once per thread)
libc : Fatal signal 11 (SIGSEGV), code 1, fault addr 0x0 in tid 124 (surfaceflinger)
: debuggerd: handling request: pid=124 uid=1000 gid=1003 tid=124
: debuggerd: Unable to connect to activity manager (connect failed: No such file or directory)
DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
DEBUG : Build fingerprint: 'Android/rpi3/rpi3:6.0.1/MASTER/kalkov06282306:userdebug/test-keys'
DEBUG : Revision: '0'
DEBUG : ABI: 'arm'
DEBUG : pid: 124, tid: 124, name: surfaceflinger >>> /system/bin/surfaceflinger <<<
DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
DEBUG : r0 00000000 r1 00000000 r2 0000000a r3 00000000
DEBUG : r4 ffffffff r5 00000000 r6 b6b6b3a0 r7 b6797c28
DEBUG : r8 b6797c00 r9 b67dd640 sl b6b6b378 fp b67dd640
DEBUG : ip b6acc8e8 sp beda08f0 lr b6a5f365 pc b6a5d4cc cpsr 600f0030
DEBUG :
DEBUG : backtrace:
DEBUG : #00 pc 000194cc /system/lib/libc.so (strlen+71)
DEBUG : #1 pc 0001b361 /system/lib/libc.so (strlen_chk+4)
DEBUG : #2 pc 0000c941 /system/lib/libutils.so (android::String8::setTo(char const*)+12)
DEBUG : #3 pc 000112c7 /system/lib/libEGL.so (android::egl_context_t::onMakeCurrent(void*, void*)+66)
DEBUG : #4 pc 00010ddb /system/lib/libEGL.so (android::egl_display_t::makeCurrent(android::egl_context_t*, android::egl_context_t*, void*, void*, void*, void*, void*, void*)+218)
DEBUG : #5 pc 00013845 /system/lib/libEGL.so (eglMakeCurrent+292)
DEBUG : #6 pc 00026cfd /system/lib/libsurfaceflinger.so (android::RenderEngine::create(void*, int)+244)
DEBUG : #7 pc 0001868d /system/lib/libsurfaceflinger.so (android::SurfaceFlinger::init()+316)
DEBUG : #8 pc 00000ee5 /system/bin/surfaceflinger (main+116)
DEBUG : #9 pc 0001711d /system/lib/libc.so (__libc_init+48)
DEBUG : #10 pc 00000d98 /system/bin/surfaceflinger (_start+96)

In the failure log, I was curious about the following line just before the crash:

libEGL : call to OpenGL ES API with no current context

After finding that egl.cpp:gl_no_context() prints this line, I set debug.egl.callstack=1.

https://android.googlesource.com/platform/frameworks/native/+/android-n-preview-3/opengl/libs/EGL/egl.cpp#214

That produced the logs below, which suggest the line is printed at the same point where egl_object.cpp:onMakeCurrent() calls glGetString(GL_EXTENSIONS).
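
To illustrate why the warning appears only once and why the caller then sees null, here is a rough sketch of a "no current context" stub. It is an assumption-based model, not a copy of egl.cpp: `noContextStub`, `tlsNoContextLogged`, and `tlsLogLines` are hypothetical names. The stub warns on the first call from each thread and always returns 0, which a caller expecting a `const char*` would read as a null pointer.

```cpp
#include <cassert>
#include <cstdio>

// Per-thread flag: every thread gets its own copy, initially false.
thread_local bool tlsNoContextLogged = false;

// Number of warnings this thread has emitted; kept only so the sketch
// can be checked.
thread_local int tlsLogLines = 0;

// Hypothetical stand-in for a stub installed while no context is current.
// It warns once per thread and returns 0 on every call.
int noContextStub() {
    if (!tlsNoContextLogged) {
        tlsNoContextLogged = true;
        std::fprintf(stderr,
                     "call to OpenGL ES API with no current context "
                     "(logged once per thread)\n");
        ++tlsLogLines;
    }
    return 0;
}
```

Calling `noContextStub()` repeatedly on one thread prints a single line, which matches the "(logged once per thread)" suffix in the logcat output.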

------------- Failure logcat with debug.egl.callstack == 1 --------------

SurfaceFlinger: Using composer version 1.0
SurfaceFlinger: EGL information:
SurfaceFlinger: vendor : Android
SurfaceFlinger: version : 1.4 Android META-EGL
SurfaceFlinger: extensions: EGL_KHR_get_all_proc_addresses EGL_ANDROID_presentation_time EGL_KHR_swap_buffers_with_damage EGL_KHR_image_base EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_3D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_gl_renderbuffer_image EGL_KHR_fence_sync EGL_KHR_create_context EGL_KHR_surfaceless_context EGL_ANDROID_image_native_buffer EGL_KHR_wait_sync EGL_ANDROID_recordable
SurfaceFlinger: Client API: OpenGL_ES
SurfaceFlinger: EGLSurface: 8-8-8-8, config=0xb5a92f00

libEGL : call to OpenGL ES API with no current context (logged once per thread)
libEGL : #00 pc 00012be5 /system/lib/libEGL.so
libEGL : #1 pc 000112c1 /system/lib/libEGL.so
libEGL : #2 pc 00010ddb /system/lib/libEGL.so (android::egl_display_t::makeCurrent(android::egl_context_t*, android::egl_context_t*, void*, void*, void*, void*, void*, void*)+218)
libEGL : #3 pc 00013845 /system/lib/libEGL.so (eglMakeCurrent+292)
libEGL : #4 pc 00026cfd /system/lib/libsurfaceflinger.so
libEGL : #5 pc 0001868d /system/lib/libsurfaceflinger.so (android::SurfaceFlinger::init()+316)
libEGL : #6 pc 00000ee5 /system/bin/surfaceflinger
libEGL : #7 pc 0001711d /system/lib/libc.so (__libc_init+48)
libEGL : #8 pc 00000d98 /system/bin/surfaceflinger

libc : Fatal signal 11 (SIGSEGV), code 1, fault addr 0x0 in tid 281 (surfaceflinger)
DEBUG : pid: 281, tid: 281, name: surfaceflinger >>> /system/bin/surfaceflinger <<<
DEBUG : #1 pc 0001b361 /system/lib/libc.so (strlen_chk+4)
DEBUG : #2 pc 0000c941 /system/lib/libutils.so (android::String8::setTo(char const*)+12)
DEBUG : #3 pc 000112c7 /system/lib/libEGL.so (android::egl_context_t::onMakeCurrent(void*, void*)+66)
DEBUG : #4 pc 00010ddb /system/lib/libEGL.so (android::egl_display_t::makeCurrent(android::egl_context_t*, android::egl_context_t*, void*, void*, void*, void*, void*, void*)+218)
DEBUG : #5 pc 00013845 /system/lib/libEGL.so (eglMakeCurrent+292)
DEBUG : #6 pc 00026cfd /system/lib/libsurfaceflinger.so (android::RenderEngine::create(void*, int)+244)
DEBUG : #7 pc 0001868d /system/lib/libsurfaceflinger.so (android::SurfaceFlinger::init()+316)
DEBUG : #8 pc 00000ee5 /system/bin/surfaceflinger (main+116)

I also tried to narrow it down, but without any success so far. Here are my debugging logs. Sorry for misusing the ticket for documentation, but I have to switch my workspace frequently, so it is better to have all related information in one place:

Working Logcat:

SurfaceFlinger: Using composer version 1.0
SurfaceFlinger: EGL information:
SurfaceFlinger: vendor : Android
SurfaceFlinger: version : 1.4 Android META-EGL
SurfaceFlinger: extensions: EGL_KHR_get_all_proc_addresses EGL_ANDROID_presentation_time EGL_KHR_swap_buffers_with_damage EGL_KHR_image_base EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_3D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_gl_renderbuffer_image EGL_KHR_fence_sync EGL_KHR_create_context EGL_KHR_surfaceless_context EGL_ANDROID_image_native_buffer EGL_KHR_wait_sync EGL_ANDROID_recordable

SurfaceFlinger: Client API: OpenGL_ES
SurfaceFlinger: EGLSurface: 8-8-8-8, config=0xb5f10f00

libEGL : >>>>> eglApi.cpp: eglMakeCurrent() [enter]
libEGL : >>>>> egl_tls.cpp: getContext() -> not initialized
libEGL : >>>>> egl_display.cpp: makeCurrent() [enter]
libEGL : >>>>> egl_display.cpp: makeCurrent() -> c != 0
libEGL : >>>>> egl_display.cpp: makeCurrent() -> result == true
libEGL : >>>>> egl_object.cpp: onMakeCurrent() [enter]
libEGL : >>>>> egl_object.cpp: onMakeCurrent() -> empty extensions (version=1)
libEGL : >>>>> egl_object.cpp: onMakeCurrent() -> setting new to 'GL_EXT_blend_minmax GL_EXT_multi_draw_arrays GL_EXT_texture_format_BGRA8888 GL_OES_compressed_ETC1_RGB8_texture GL_OES_depth24 GL_OES_element_index_uint GL_OES_fbo_render_mipmap GL_OES_mapbuffer GL_OES_rgb8_rgba8 GL_OES_stencil8 GL_OES_texture_3D GL_OES_texture_npot GL_OES_EGL_image GL_OES_depth_texture GL_OES_packed_depth_stencil GL_EXT_texture_type_2_10_10_10_REV GL_OES_get_program_binary GL_APPLE_texture_max_level GL_EXT_discard_framebuffer GL_EXT_read_format_bgra GL_NV_fbo_color_attachments GL_OES_EGL_image_external GL_OES_EGL_sync GL_OES_vertex_array_object GL_EXT_unpack_subimage GL_NV_draw_buffers GL_NV_read_buffer GL_NV_read_depth GL_NV_read_depth_stencil GL_NV_read_stencil GL_EXT_draw_buffers GL_EXT_map_buffer_range GL_KHR_debug GL_OES_surfaceless_context GL_EXT_separate_shader_objects GL_EXT_draw_elements_base_vertex GL_KHR_context_flush_control GL_OES_draw_elements_base_vertex '
libEGL : >>>>> egl_object.cpp: onMakeCurrent() -> searching for marker
libEGL : >>>>> egl_object.cpp: onMakeCurrent() -> tokenizing
libEGL : >>>>> egl_object.cpp: onMakeCurrent() [return]
libEGL : >>>>> egl_display.cpp: makeCurrent() [return]
libEGL : >>>>> eglApi.cpp:eglMakeCurrent() -> after makeCurrent
libEGL : >>>>> eglApi.cpp:eglMakeCurrent() -> result == true
libEGL : >>>>> eglApi.cpp:eglMakeCurrent() -> c != null
libEGL : >>>>> eglApi.cpp:eglMakeCurrent() [return]
libEGL : >>>>> egl.cpp: egl_get_string_for_current_context() [enter]

Failing Logcat (NP4):

SurfaceFlinger: Using composer version 1.0
SurfaceFlinger: EGL information:
SurfaceFlinger: vendor : Android
SurfaceFlinger: version : 1.4 Android META-EGL
SurfaceFlinger: extensions: EGL_KHR_get_all_proc_addresses EGL_ANDROID_presentation_time EGL_KHR_swap_buffers_with_damage EGL_KHR_image_base EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_3D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_gl_renderbuffer_image EGL_KHR_fence_sync EGL_KHR_create_context EGL_KHR_surfaceless_context EGL_ANDROID_image_native_buffer EGL_KHR_wait_sync EGL_ANDROID_recordable
SurfaceFlinger: Client API: OpenGL_ES
SurfaceFlinger: EGLSurface: 8-8-8-8, config=0xb5a92f00

libEGL : >>>>> eglApi.cpp: eglMakeCurrent() [enter]
libEGL : >>>>> egl_tls.cpp: getContext() -> not initialized
libEGL : >>>>> egl_display.cpp: makeCurrent() [enter]
libEGL : >>>>> egl_display.cpp: makeCurrent() -> c != 0
libEGL : >>>>> egl_display.cpp: makeCurrent() -> result == true
libEGL : >>>>> egl_object.cpp: onMakeCurrent() [enter]
libEGL : >>>>> egl_object.cpp: onMakeCurrent() -> empty extensions (version=1)
libEGL : >>>>> egl.cpp: egl_get_string_for_current_context() [enter]
libEGL : >>>>> egl_tls.cpp: getContext() -> not initialized
libEGL : call to OpenGL ES API with no current context (logged once per thread)
libEGL : >>>>> egl_object.cpp: onMakeCurrent() -> setting new to '(null)'

libc : Fatal signal 11 (SIGSEGV), code 1, fault addr 0x0 in tid 310 (surfaceflinger)
: debuggerd: handling request: pid=310 uid=1000 gid=1003 tid=310
: debuggerd: Unable to connect to activity manager (connect failed: No such file or directory)
DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
DEBUG : Build fingerprint: 'Android/rpi3/rpi3:6.0.1/MASTER/kalkov07021431:userdebug/test-keys'
DEBUG : Revision: '0'
DEBUG : ABI: 'arm'
DEBUG : pid: 310, tid: 310, name: surfaceflinger >>> /system/bin/surfaceflinger <<<
DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
DEBUG : r0 00000000 r1 00000000 r2 262d75eb r3 00000000
DEBUG : r4 ffffffff r5 00000000 r6 b6897c28 r7 00000000
DEBUG : r8 b68dd640 r9 b68dd640 sl b6897c00 fp b68f99e0
DEBUG : ip b6c178e8 sp be8f18e0 lr b6baa365 pc b6ba84cc cpsr 600e0030

In both cases, the implementation starts with an empty extension string:

libEGL : >>>>> egl_object.cpp: onMakeCurrent() -> empty extensions (version=1)

The failing version then calls egl_get_string_for_current_context() to retrieve the extensions string. In the working version there is no such call at that point: the string is returned from somewhere else, and egl_get_string_for_current_context() is only called at the very end of the trace. This would mean that the call to glGetString() (in opengl/libs/EGL/egl_object.cpp:116) resolves to a different function. I'll look into it as soon as I find some more time.
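
One way to picture the difference between the two traces is the toy model below. It is only a sketch under stated assumptions, not the libEGL implementation: `Hooks`, `driverGetString`, `wrapperGetString`, `tlsHooks`, and `queryDuringMakeCurrent` are all hypothetical names. It assumes the hook table entry for glGetString can end up pointing either at the driver's own function or at a wrapper that dispatches through per-thread hooks, and that those per-thread hooks still point at a no-context stub while the make-current path is running.

```cpp
#include <cassert>

// Hypothetical, heavily simplified hook table: one entry per GL function.
struct Hooks {
    const char* (*getString)(unsigned int name);
};

// Stand-in for the real driver implementation (for example the Mesa library).
const char* driverGetString(unsigned int) { return "GL_OES_EGL_image"; }

// Stand-in for the stub used while no context is current.
const char* noContextGetString(unsigned int) { return nullptr; }

const Hooks kNoContextHooks = { noContextGetString };

// Hooks seen by the calling thread. In this model they stay on the
// no-context table until the make-current path has finished.
thread_local const Hooks* tlsHooks = &kNoContextHooks;

// Stand-in for a wrapper-library glGetString that dispatches via TLS.
const char* wrapperGetString(unsigned int name) {
    return tlsHooks->getString(name);
}

// Queries the extensions through whatever pointer the table holds,
// at a moment when the thread's TLS hooks have not been switched yet.
const char* queryDuringMakeCurrent(const Hooks& table) {
    return table.getString(0x1F03 /* GL_EXTENSIONS */);
}
```

With `Hooks{driverGetString}` the query returns the extension string directly, as in the working log. With `Hooks{wrapperGetString}` the call bounces through the thread-local table, hits the no-context stub, and returns null, as in the failing log.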

Is there a way to explicitly specify the EGL/GLES version to be used? It looks like different versions are used in NP2 and NP4:

Working Logcat (NP2):

libEGL : egl_object.cpp: onMakeCurrent() [enter]
libEGL : egl_object.cpp: onMakeCurrent() -> gl.glGetString(GL_EXTENSIONS=7939)
libEGL : egl_object.cpp: onMakeCurrent() -> gl_extensions setTo GL_EXT_blend_ [...]
libEGL : egl_object.cpp: onMakeCurrent() -> find marker
libEGL : egl_object.cpp: onMakeCurrent() [return]
libGLESv1: gl.cpp: glGetString() [enter]
libGLESv1: gl.cpp: glGetString() -> ret == NULL
libGLESv1: gl.cpp: glGetString() [return]

Failing Logcat (NP4):

libEGL : egl_object.cpp: onMakeCurrent() [enter]
libEGL : egl_object.cpp: onMakeCurrent() -> gl.glGetString(GL_EXTENSIONS=7939)
libGLESv2: gl2.cpp: glGetString() [enter]
libGLESv2: gl2.cpp: glGetString() -> ret == NULL
libEGL : call to OpenGL ES API with no current context (logged once per thread)
libGLESv2: gl2.cpp: glGetString() [return]
libEGL : egl_object.cpp: onMakeCurrent() -> gl_extensions setTo (null)

I can't tell why, but here is the commit which breaks it:

https://android.googlesource.com/platform/hardware/libhardware_legacy/+/6ffcedb3cabb09bbc35fa3976db7169afe1491dc%5E%21/#F0

After removing the libmedia dependency, NP4 compiles and boots normally (something floods the logcat, but that's another issue). Is it possible that libmedia somehow changes the library initialization / loading order such that libGLESv2 is loaded first instead of libGLESv1?

I think removing the dependency is just a workaround. In my opinion libGLESv2 should be fixed to work properly anyway, although I don't understand much about it.
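
The load-order question can be illustrated with a toy model of global symbol lookup, where libraries are searched in the order they were loaded and the first one exporting a name wins. This is a simplification for illustration only: `Library` and `resolve` are hypothetical names, and the thread does not confirm that this is the mechanism on the device.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model of a loaded shared library: a name plus its exported symbols.
struct Library {
    std::string name;
    std::set<std::string> exports;
};

// Resolves a symbol by scanning libraries in load order and returning the
// name of the first library that exports it ("first definition wins").
// Returns an empty string if no library exports the symbol.
std::string resolve(const std::vector<Library>& loadOrder,
                    const std::string& symbol) {
    for (const Library& lib : loadOrder) {
        if (lib.exports.count(symbol) != 0) {
            return lib.name;
        }
    }
    return std::string();
}
```

In this model, two libraries that both export glGetString give different results depending on which was loaded first: `resolve({mesa, wrapper}, "glGetString")` picks the driver, while `resolve({wrapper, mesa}, "glGetString")` picks the wrapper. An extra dependency that pulls one of them in earlier would be enough to flip the outcome.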

Good finding Igor!

I think there might be a symbol name collision between libmedia.so & libGLES_mesa.so.
I patched external/drm_gralloc as in the following commit, and it works.

peyo-hd/external_drm_gralloc@8fa8086

At last, I can release the n-preview-4 patchset for rpi3.