lydell/elm-watch

Improve test flakiness


The tests are very comprehensive, and I’m very happy that they helped me find so many edge cases. They are written at a very high level, which gives a lot of confidence (and should make a potential rewrite in another language nice). However, the high level involves real time passing, real file system watching and real Web Sockets. While that did help me understand, for example, file watching better (like, how many watcher events do you get if the same file changes rapidly?), it does make the tests a bit flaky. I’ve used some pretty … clever … hacks to stabilize many tests, but not all.

Currently, the tests pass locally on Linux, Windows and macOS. In CI, jest.retryTimes(2) is needed, but even with that, one or two jobs usually fail; by manually restarting them it’s possible to get all green checkmarks. So the tests still give a lot of confidence – they’re just a little bit annoying.
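For reference, one way to apply retries everywhere is a Jest setup file. This is just a sketch, not how elm-watch is actually configured – the file name and the CI-only check are assumptions. Note that jest.retryTimes requires the jest-circus runner (the default since Jest 27).

```typescript
// jest.setup.ts (hypothetical file name), wired up in the Jest config via
// `setupFilesAfterEnv: ["<rootDir>/jest.setup.ts"]`.
if (process.env.CI === "true") {
  // Retry only in CI, so local runs still surface flakiness immediately.
  jest.retryTimes(2);
}
```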

It doesn’t help that I got tired of testing and “fixed” some tests with arbitrary sleeps (in the tests, not in the source code). The readme says “elm-watch is serious about stability” – so this is a bit embarrassing.
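One direction for cleaning up the arbitrary sleeps is polling for the condition the sleep was waiting for. A minimal sketch – the helper name and options are made up, and it assumes each condition can be expressed as a synchronous probe function:

```typescript
// Hypothetical helper: poll a condition instead of sleeping for a fixed,
// arbitrary duration. Resolves with the probe's value as soon as it returns
// one, or rejects after `timeout` milliseconds.
async function waitFor<T>(
  probe: () => T | undefined,
  { timeout = 5000, interval = 10 }: { timeout?: number; interval?: number } = {}
): Promise<T> {
  const deadline = Date.now() + timeout;
  for (;;) {
    const result = probe();
    if (result !== undefined) {
      return result;
    }
    if (Date.now() > deadline) {
      throw new Error(`waitFor: timed out after ${timeout} ms`);
    }
    // Sleep briefly between probes to avoid busy-waiting.
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
}
```

Compared to a fixed sleep, this waits only as long as actually needed, and a genuine failure shows up as a clear timeout error instead of a race that sometimes passes.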

As soon as I get some more energy I want to get back to this and clean the tests up.

Here is a fish script to see which tests are retried the most:

# GitHub personal access token, used for downloading workflow logs.
set token FILL_ME_IN

set dir (status dirname)/scrape
mkdir -p $dir

# List the latest 100 workflow runs (one JSON object per line).
set workflow_runs (curl -H "Accept: application/vnd.github+json" "https://api.github.com/repos/lydell/elm-watch/actions/runs?per_page=100&created=>=2022-07-16&exclude_pull_requests=true" | jq -c '.workflow_runs[]')

set count (count $workflow_runs)

# Download and unpack the logs of every run of the Test workflow.
for i in (seq $count)
    set workflow_run $workflow_runs[$i]
    set name (string join \n -- $workflow_run | jq -r '.name')
    if test $name != Test
        echo "### $i/$count: Skipping: $name"
        continue
    end
    set created_at (string join \n -- $workflow_run | jq -r '.created_at')
    set logs_url (string join \n -- $workflow_run | jq -r '.logs_url')
    set subdir $dir/$created_at
    set zip $subdir/logs.zip
    echo "### $i/$count: Download logs from $created_at to $subdir"
    rm -rf $subdir
    mkdir -p $subdir
    curl -L -H "Authorization: token $token" -H "Accept: application/vnd.github+json" $logs_url >$zip
    unzip -d $subdir $zip
end

# Extract one tab-separated row per retry from the logs, then count
# occurrences of the fourth field (the test name) to find the worst offenders.
set results_file $dir/results.tsv
rg 'RETRY ERRORS  (.+)' $dir -or '$1' | rg '([^/]+Z)[^(]+\(([^,]+), (\d+)\)[^:]+:(.+)' -or '$1'\t'$2'\t'$3'\t'$4' >$results_file
cut -f 4 $results_file | sort | uniq -c | sort -nr