E3SM-Project/e3sm_diags

[DevOps]: Speed up integration tests on GitHub Actions by caching test data downloads

Overview

We run the CI/CD workflow for pull requests and pushes to main, which takes around 11-13 minutes.

The testing step accounts for ~75% of the build time, and most of that is spent downloading the test data from LCRC.

The Problem

This slow build time has a hidden cost that impacts overall productivity:

...developers could either be waiting the entire time a build runs or end up context-switching to work on something else while a build runs. Both of these impact overall productivity (more on this below).
-- https://github.blog/2022-12-08-experiment-the-hidden-costs-of-waiting-on-slow-build-times/

Possible Solutions

Look into caching the downloaded test data. We also need a scheme for refreshing the cache when the data changes.

https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows
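One possible refresh scheme, sketched below, is to derive the cache key from the list of test data files plus the current ISO week. The key changes when a file is added or removed, and it also rolls over weekly so stale data cannot linger indefinitely. The function name `cache_key`, the `test-data` prefix, and the weekly interval are all assumptions for illustration, not something the workflow currently does. Note that hashing file names does not detect a file whose contents change under the same name; the weekly rollover is the fallback for that case.

```python
import hashlib
from datetime import date


def cache_key(manifest: list[str], today: date, prefix: str = "test-data") -> str:
    """Build a cache key from a file manifest and the ISO week (hypothetical helper).

    The key changes when the set of file names changes or a new ISO week begins.
    """
    # Sort first so the order in which files are listed does not affect the key.
    joined = "\n".join(sorted(manifest))
    digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()[:12]

    # The ISO year/week pair forces a refresh at least once per week.
    year, week, _ = today.isocalendar()
    return f"{prefix}-{year}-W{week:02d}-{digest}"


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(cache_key(["model_output.nc", "expected_plot.png"], date.today()))
```

The resulting string could be passed as the `key` of the cache step in the workflow, so that a changed key causes a fresh download and a new cache entry.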

+1 on the productivity, and thanks for including the links 😃

Of course @mahf708! I'm always on board for efficiency and improvements.

In PR #747, I cleaned up integration tests and removed redundant ones. This cut the total build time to ~6 minutes. Here's a build run showing these improvements: https://github.com/E3SM-Project/e3sm_diags/actions/runs/7199737714

The remaining integration test performs an image diff check. It executes a diagnostic run using default sets and a list of parameters. This test diagnostic run is pretty heavy, which is why it takes ~3 minutes.

It actually only takes ~30 secs to download the integration test data and images. It is still a good idea to cache these resources because they don't change often.