xcube-dev/xcube

test_gen.py extremely slow on GitHub Actions


Describe the bug

GitHub Actions unit test runs are very slow (usually around 40 minutes, versus ~12 minutes per platform on AppVeyor and ~5 minutes on my local machine). Closer inspection of the logs reveals that around 35 of these 40 minutes are spent in one test module -- test/core/gen/test_gen.py. For instance, look at the logs for this test run:

Mon, 11 Mar 2024 17:19:43 GMT test/core/byoa/test_fileset.py ...........                               [  9%]
Mon, 11 Mar 2024 17:19:43 GMT test/core/gen/test_config.py ......                                      [  9%]
Mon, 11 Mar 2024 17:54:01 GMT test/core/gen/test_gen.py ...............                                [ 10%]
Mon, 11 Mar 2024 17:54:01 GMT test/core/gen/test_iproc.py .......                                      [ 11%]
Mon, 11 Mar 2024 17:54:07 GMT test/core/gen2/local/test_generator.py ......                            [ 11%]
Mon, 11 Mar 2024 17:54:07 GMT test/core/gen2/local/test_helpers.py ..                                  [ 11%]
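The time per module can be read off the timestamps mechanically. The following is a minimal sketch, assuming that each timestamp marks the moment the module's line was completed, so that the gap to the previous line approximates the module's runtime. The function name `seconds_per_module` is made up for this illustration, and the log lines are the ones quoted above with the padding shortened.

```python
from datetime import datetime

# Log lines copied from the run above (padding inside the lines shortened).
LOG = """\
Mon, 11 Mar 2024 17:19:43 GMT test/core/byoa/test_fileset.py ........... [  9%]
Mon, 11 Mar 2024 17:19:43 GMT test/core/gen/test_config.py ...... [  9%]
Mon, 11 Mar 2024 17:54:01 GMT test/core/gen/test_gen.py ............... [ 10%]
Mon, 11 Mar 2024 17:54:01 GMT test/core/gen/test_iproc.py ....... [ 11%]
Mon, 11 Mar 2024 17:54:07 GMT test/core/gen2/local/test_generator.py ...... [ 11%]
Mon, 11 Mar 2024 17:54:07 GMT test/core/gen2/local/test_helpers.py .. [ 11%]
"""


def seconds_per_module(log):
    """Return (module, seconds since the previous log line) pairs.

    Assumption: a line's timestamp is written when the module has finished,
    so the difference to the previous line approximates the module's runtime.
    The first line has no predecessor and is therefore skipped.
    """
    durations = []
    previous = None
    for line in log.splitlines():
        fields = line.split()
        if len(fields) < 7:
            continue  # not a timestamped pytest progress line
        # fields: weekday, day, month, year, time, "GMT", module path, ...
        stamp = datetime.strptime(" ".join(fields[1:5]), "%d %b %Y %H:%M:%S")
        if previous is not None:
            durations.append((fields[6], (stamp - previous).total_seconds()))
        previous = stamp
    return durations


for module, seconds in seconds_per_module(LOG):
    print(f"{seconds / 60:6.1f} min  {module}")
```

Under that assumption, the excerpt attributes roughly 34 minutes to test/core/gen/test_gen.py and only seconds to its neighbours.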

To Reproduce
Steps to reproduce the behavior:

  1. Run the xcube ‘Unittest and docker builds’ workflow in GitHub Actions, or look at one of the previous runs.
  2. Display the logs for the unittest step and activate timestamps.
  3. Observe that test_gen.py takes over half an hour.

Expected behavior
test_gen.py on GHA completes in seconds or tens of seconds (as it already does on other platforms), rather than in tens of minutes.

I profiled the module test/core/gen/test_gen.py and compared the results in CI pipeline and on the local machine.

GitHub pipeline profiling:
Screenshot from 2024-05-15 08-29-53

Local profiling:
Screenshot from 2024-05-15 08-51-18
Screenshot from 2024-05-15 08-51-43
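The thread does not state which profiler produced these screenshots. For anyone who wants a comparable function-level breakdown, here is a minimal sketch using only the standard library's `cProfile` and `pstats`. The helper `profile_call` and the stand-in workload `busy_loop` are hypothetical names for this illustration and are not part of xcube.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, text report).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


def busy_loop(n):
    """Stand-in workload; replace it with the code under investigation."""
    total = 0
    for i in range(n):
        total += i * i
    return total


result, report = profile_call(busy_loop, 100_000)
print(report)
```

Running the same script locally and in the CI job and comparing the two reports is one way to spot a function that dominates in only one environment.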

What I noticed so far:

  • The function _compute_ij_images_for_source_line() in xcube/core/resampling/rectify.py consumes a lot of time in the CI pipeline, but does not appear among the first 1000 entries of the profiling result on the local machine. Numba is used to parallelize the for loop in this function, so I assume that something goes wrong with numba in the CI pipeline.

In the next step, I removed numba from xcube/core/resampling/rectify.py.

GitHub pipeline profiling:
Screenshot from 2024-05-15 11-09-40

Local profiling:
Screenshot from 2024-05-15 11-10-05
Screenshot from 2024-05-15 11-10-22

What I noticed:

  • The CI pipeline did not get any slower.
  • Local testing got a lot slower and now spends a lot of time in _compute_ij_images_for_source_line().

Conclusion
Something goes wrong with the parallelization using numba in the CI pipeline. I will investigate that further.

Cause
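NUMBA_DISABLE_JIT is set to 1 in xcube_workflow.yaml#L17, which disables numba.jit. With JIT compilation disabled, the numba-decorated functions run as plain interpreted Python, which explains why the CI timings match the local run without numba.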

Solution

  • set NUMBA_DISABLE_JIT: 0
  • remove env: NUMBA_DISABLE_JIT: 1
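
To catch this kind of misconfiguration early, a test run could report whether the variable is active. The following is a minimal sketch using only the standard library. The function name `numba_jit_disabled` is made up for this illustration, and it assumes that any non-zero integer value disables the JIT and that an unparsable value leaves it enabled, which should be checked against the numba documentation.

```python
import os


def numba_jit_disabled(environ=None):
    """Return True if NUMBA_DISABLE_JIT requests that the JIT be disabled.

    Assumptions (not verified against the numba sources): the value is read
    as an integer, any non-zero value disables the JIT, and a missing or
    unparsable value leaves the JIT enabled.
    """
    env = os.environ if environ is None else environ
    raw = env.get("NUMBA_DISABLE_JIT", "0")
    try:
        return int(raw) != 0
    except ValueError:
        return False


# Check the current process environment, e.g. at the start of a CI test run.
if numba_jit_disabled():
    print("NUMBA_DISABLE_JIT is set: numba-decorated code runs uncompiled.")
else:
    print("numba JIT is not disabled via the environment.")
```

Passing a plain dict, for example `numba_jit_disabled({"NUMBA_DISABLE_JIT": "1"})`, makes it possible to try the check without touching the real environment.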