[Bug]: race condition with cleanup and parallel
Closed this issue · 6 comments
Description
It seems there's a race condition with cleaning up snapshots when using parallel tests that means if a snapshot changes there's a chance any of the snapshots in the same file could be duplicates, moved around, etc.
Steps to Reproduce
package main
import (
"fmt"
"math/rand"
"os"
"strings"
"testing"
"time"
"github.com/gkampitakis/go-snaps/snaps"
)
var runCount = 2
var testsCount = 10
func init() {
rand.Seed(time.Now().UnixNano())
}
var letters = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
func randSeq(n int) string {
b := make([]rune, n)
for i := range b {
b[i] = letters[rand.Intn(len(letters))]
}
return string(b)
}
func TestMain(m *testing.M) {
for i := 0; i < runCount; i++ {
code := m.Run()
snaps.Clean(m, snaps.CleanOpts{Sort: true})
b, err := os.ReadFile("__snapshots__/main_test.snap")
if err != nil {
if i == 0 {
// ignore on the first run, since it should generate the snapshot
continue
}
panic(err)
}
content := string(b)
for i := 0; i < testsCount; i++ {
header := fmt.Sprintf("[TestSnaps/#%02d - 1]", i)
count := strings.Count(content, header)
if count > 1 {
code = 1
fmt.Printf("Saw '%s' %d times\n", header, count)
}
}
if code != 0 {
os.Exit(code)
}
}
os.Exit(0)
}
func TestSnaps(t *testing.T) {
t.Parallel()
type args struct {
str string
}
type testCase struct {
name string
args args
}
tests := make([]testCase, 0, testsCount)
for i := 0; i < testsCount; i++ {
tests = append(tests, testCase{args: args{str: randSeq(10)}})
}
for _, tt := range tests {
tt := tt
t.Run(tt.name, func(t *testing.T) {
t.Parallel()
snaps.MatchSnapshot(t, tt.args.str)
})
}
}
Run with UPDATE_SNAPS=true go test ./...
:
❯ UPDATE_SNAPS=true go test ./...
PASS
Snapshot Summary
✎ 10 snapshots added
PASS
Snapshot Summary
✎ 12 snapshots added
✎ 8 snapshots updated
Saw '[TestSnaps/#05 - 1]' 2 times
Saw '[TestSnaps/#08 - 1]' 2 times
FAIL github.com/g-rath/go-snaps-testing 0.003s
FAIL
Running it multiple times will compound the result:
❯ UPDATE_SNAPS=true go test ./...
PASS
Snapshot Summary
✎ 7 snapshots added
✎ 3 snapshots updated
Saw '[TestSnaps/#01 - 1]' 3 times
Saw '[TestSnaps/#03 - 1]' 3 times
Saw '[TestSnaps/#04 - 1]' 2 times
Saw '[TestSnaps/#05 - 1]' 2 times
Saw '[TestSnaps/#06 - 1]' 2 times
Saw '[TestSnaps/#07 - 1]' 3 times
Saw '[TestSnaps/#08 - 1]' 4 times
Saw '[TestSnaps/#09 - 1]' 2 times
FAIL github.com/g-rath/go-snaps-testing 0.004s
FAIL
Note that you can reproduce this without the looping, it just makes it a lot easier - but importantly my TestMain
here is not interfering with the cleanup function.
Commenting out the t.Parallel
calls resolves the issue
Expected Behavior
The snapshot order is stable, and snapshots are not duplicated.
it seems like this also might impact the general writing/updating of snapshots - I've been finding while working on google/osv-scanner#889 that I have to run the suite twice to ensure all snapshots are updated
Hey 👋 sorry I have been off sick. But will have a look asap when I am back. Thanks for the investigation and creating this issue.
Hey no worries - you focus on getting better :)
For now I've written a python script that ill post shortly to do the sorting and cleanup which seems to be working well
Hey I am partially back 😄 . I have rewritten this message three times thinking i knew what the problem is 😅 This time I think I know. I think the problem lies at https://github.com/gkampitakis/go-snaps/blob/main/snaps/snapshot.go#L131 not being protected with a lock. Will try to verify my theory the coming days and create a pr if you want to test it with. Needs some thinking the way I am handling locking there.
Again thank you for opening this issue and sorry for the inconvenience.
awesome stuff! fwiw this is the Python script I wrote:
#!/usr/bin/env python
import glob
import re
def deduplicate_snapshots(snapshots):
by_name = {}
for snapshot in snapshots:
if snapshot[1] in by_name:
are_identical = len(snapshot) == len(by_name[snapshot[1]]) and [
i for i, j in zip(snapshot, by_name[snapshot[1]]) if i == j
]
print(
" removed duplicate of",
snapshot[1].strip(),
"(were identical)" if are_identical else "",
)
continue
by_name[snapshot[1]] = snapshot
return by_name.values()
def sort_snapshots(snapshots):
return sorted(
snapshots,
key=lambda lines: [
int(x) if x.isdigit() else x for x in re.split(r"(\d+)", lines[1])
],
)
def sort_snapshots_in_file(filename):
snapshots = []
snapshot = []
with open(filename, "r") as f:
for line in f.readlines():
if line == "---\n": # marks the start of a new snapshot
snapshots.append(snapshot)
snapshot = []
else:
snapshot.append(line)
print("found", len(snapshots), "snapshots in", filename)
snapshots = deduplicate_snapshots(snapshots)
snapshots = sort_snapshots(snapshots)
if filename.endswith("v_test.snap") or filename.endswith("v_test2.snap"):
return
with open(filename, "w") as f:
for snapshot in snapshots:
f.writelines(snapshot)
f.writelines("---\n")
for snapshot_file in glob.iglob("**/__snapshots__/*.snap", recursive=True):
sort_snapshots_in_file(snapshot_file)
(I want to share it somewhere since it won't even need to go into a PR now 😅)
Sorry for the need of the script. Hope the change solves this problem 😄 Feel free to reopen the issue if you still notice an error.