emscripten-core/emscripten

other.test_gen_struct_info is flaky

Opened this issue · 7 comments

juj commented

Occasionally fails with:

gen_struct_info: Calling generated program... /tmp/tmp6t3jvaci.js
code.c: no_exit=1 assertions=1 flush=0 keepalive=0 filesystem=0
Traceback (most recent call last):
  File "/home/clb/buildbot/h12dsi-linux-mint22/emscripten_linux_x64/build/emscripten/main/tools/gen_struct_info.py", line 411, in <module>
    sys.exit(main(sys.argv[1:]))
             ~~~~^^^^^^^^^^^^^^
  File "/home/clb/buildbot/h12dsi-linux-mint22/emscripten_linux_x64/build/emscripten/main/tools/gen_struct_info.py", line 394, in main
    info_fragment = inspect_code(header_files, use_cflags)
  File "/home/clb/buildbot/h12dsi-linux-mint22/emscripten_linux_x64/build/emscripten/main/tools/gen_struct_info.py", line 290, in inspect_code
    info = inspect_headers(headers, cflags)
  File "/home/clb/buildbot/h12dsi-linux-mint22/emscripten_linux_x64/build/emscripten/main/tools/gen_struct_info.py", line 273, in inspect_headers
    return json.loads(info)
           ~~~~~~~~~~^^^^^^
  File "/home/clb/.pyenv/versions/3.13.3/lib/python3.13/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "/home/clb/.pyenv/versions/3.13.3/lib/python3.13/json/decoder.py", line 345, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/clb/.pyenv/versions/3.13.3/lib/python3.13/json/decoder.py", line 361, in raw_decode
    obj, end = self.scan_once(s, idx)
               ~~~~~~~~~~~~~~^^^^^^^^
json.decoder.JSONDecodeError: Illegal trailing comma before end of object: line 692 column 13 (char 10583)
None
None
[8%] test_gen_struct_info (test_other.other.test_gen_struct_info) ... FAIL

I don't think I've ever seen this one before! Any idea how this could possibly flake? It seems like pretty straightforward, non-threaded code.
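For reference, the error class in the traceback above can be triggered with a few lines of Python. This is only a minimal sketch: the JSON strings are hypothetical stand-ins, since the actual output of the generated program is not shown in the log, and the exact error message depends on the Python version (the "Illegal trailing comma" wording is what Python 3.13 printed in the log above).

```python
import json

# Hypothetical stand-ins for the generated program's output; the real
# output from the failing run is not included in the log.
complete = '{"structs": {"stat": {"__size__": 96}}}'
with_trailing_comma = '{"structs": {"stat": {"__size__": 96,}}}'


def try_parse(text):
    """Return the parsed object, or the JSONDecodeError if parsing fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        return e


ok = try_parse(complete)
err = try_parse(with_trailing_comma)

print(type(ok).__name__)        # the well-formed input parses to a dict
print(type(err).__name__, err)  # message wording varies by Python version
```

A trailing comma before a closing brace is always rejected by `json.loads`, so the flaky part would have to be whatever produces the text that gets parsed, not the parsing itself.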

juj commented

My guess is that it is caused by interaction with other tests in the parallel run. I haven't been able to reproduce this on the bot where it fails, at least not by running the test by itself repeatedly.
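Running a single test on repeat, as described above, could be scripted roughly as follows. This is a sketch under assumptions: `count_failures` is a hypothetical helper, not part of the Emscripten test runner, and the example invocation in the comment assumes the runner accepts a single test name such as `other.test_gen_struct_info`.

```python
import subprocess
import sys


def count_failures(cmd, runs):
    """Run cmd `runs` times in sequence; return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures


# Example use from the emscripten checkout (assumed invocation):
#   cmd = [sys.executable, "test/runner.py", "other.test_gen_struct_info"]
#   print(count_failures(cmd, 50), "failures out of 50 runs")
```

Because this runs the test in isolation and sequentially, it would not exercise any interference from other tests running in parallel, which is consistent with the failure not reproducing this way.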

juj commented

http://clbri.com:8010/#/builders/10 has multiple runs where it fails.

Some of the failures have different, quite nondescript failure logs: http://clbri.com:8010/api/v2/logs/54169/raw_inline

I wonder why we haven't seen this on our CI at all. Are you doing anything other than running test_other.py with the normal/default level of parallelism?

juj commented

Nothing special. You can see what's being run in the log.

E.g. it is running

source ./emsdk_env.sh; cd emscripten/main; python3 test/runner.py --failing-and-slow-first --failfast other \
skip:other.test_dlmalloc \
skip:other.test_bullet_cmake \
skip:other.test_dylink_zlib_reversed \
skip:other.test_dylink_zlib \
skip:other.test_zlib_configure \
skip:other.test_legacy_exported_runtime_numbers \
skip:other.test_sse2 skip:other.test_modularize_closure_pre \
skip:other.test_sse2_nontrapping \
skip:other.test_sse4_1 \
skip:other.test_iostream_and_determinism \
skip:other.test_openjpeg \
skip:other.test_zlib_cmake \
skip:other.test_avx_nontrapping \
skip:other.test_avx skip:other.test_freetype \
skip:other.test_bullet_autoconf \
skip:other.test_avx2 \
skip:other.test_avx2_nontrapping \
skip:other.test_printf_wasmfs \
skip:other.test_printf \
skip:other.test_poppler \
skip:other.test_cmake_compile_features \
skip:other.test_cmake_compile_features_noforce \
skip:other.test_safe_stack \
skip:other.test_wasm_sourcemap_relative_paths \
skip:other.test_codesize_cxx_mangle \
skip:other.test_codesize_hello_dylink_all \
skip:other.test_minimal_runtime_code_size_hello_webgl2_wasm \
skip:other.test_minimal_runtime_code_size_hello_webgl2_wasm2js \
skip:other.test_minimal_runtime_code_size_hello_webgl_wasm \
skip:other.test_minimal_runtime_code_size_hello_webgl_wasm2js

with the environment

environment:
  EMCC_SKIP_SANITY_CHECK=1
  EMTEST_BENCHMARKERS=clang,size,node,node-64
  EMTEST_BROWSER=/Applications/Firefox.app/Contents/MacOS/firefox
  EMTEST_RETRY_FLAKY=5
  EMTEST_SKIP_CCACHE=1
  EMTEST_SKIP_EH=1
  EMTEST_SKIP_JSPI=1
  EMTEST_SKIP_NINJA=1
  EMTEST_SKIP_NODE_CANARY=1
  EMTEST_SKIP_NODE_DEV_PACKAGES=1
  EMTEST_SKIP_RUST=1
  EMTEST_SKIP_SCONS=1
  EMTEST_SKIP_V8=1
  EMTEST_SKIP_WASM64=1
  EMTEST_SKIP_WASM_ENGINE=1

juj commented

(The skips exclude all the slow tests, to speed up iteration.)

juj commented

Python is 3.13.3 and Node.js is 22.16.0, as produced by the emsdk install step: http://clbri.com:8010/api/v2/logs/54178/raw_inline

[b'./emsdk', b'install', b'sdk-main-64bit', b'node-nightly-64bit', b'ninja-git-release-64bit']