bagit-gython is not thread safe
djjuhasz opened this issue · 1 comments
djjuhasz commented
When (*bagit.Bag).Validate()
is run in two separate threads at the same time (e.g. via parallel tests) validation fails with a fatal Python error.
Test file
func TestValidateBag(t *testing.T) {
t.Parallel()
t.Run("Fails validation", func(t *testing.T) {
t.Parallel()
b, err := bagit.NewBagIt()
assert.NilError(t, err)
err = b.Validate("/tmp/691b8e7f-e6b7-41dd-bc47-868e2ff69333")
assert.Error(t, err, "invalid: Expected bagit.txt does not exist: /tmp/691b8e7f-e6b7-41dd-bc47-868e2ff69333/bagit.txt")
err = b.Cleanup()
assert.NilError(t, err)
})
t.Run("Validates bag", func(t *testing.T) {
t.Parallel()
b, err := bagit.NewBagIt()
assert.NilError(t, err)
err = b.Validate("internal/testdata/valid-bag")
assert.NilError(t, err)
err = b.Cleanup()
assert.NilError(t, err)
})
}
Results
❯ go test ./...
? github.com/artefactual-labs/bagit-gython/internal/dist [no test files]
? github.com/artefactual-labs/bagit-gython/internal/dist/data [no test files]
? github.com/artefactual-labs/bagit-gython/internal/dist/generate [no test files]
? github.com/artefactual-labs/bagit-gython/internal/runner [no test files]
Unable to calculate file hashes for /home/djuhasz/Artefactual/artefactual-labs/bagit-gython/internal/testdata/valid-bag
Traceback (most recent call last):
File "/tmp/python/bagit-bagit-lib-49db2197764d6aa1/bagit.py", line 890, in _validate_entries
pool = multiprocessing.Pool(
^^^^^^^^^^^^^^^^^^^^^
File "/tmp/python/bagit-038d476ec7f987d3/lib/python3.12/multiprocessing/context.py", line 118, in Pool
from .pool import Pool
File "/tmp/python/bagit-038d476ec7f987d3/lib/python3.12/multiprocessing/pool.py", line 19, in <module>
import queue
ModuleNotFoundError: No module named 'queue'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/python/bagit-bagit-lib-49db2197764d6aa1/bagit.py", line 895, in _validate_entries
pool.terminate()
^^^^
UnboundLocalError: cannot access local variable 'pool' where it is not associated with a value
--- FAIL: TestValidateBag (0.00s)
--- FAIL: TestValidateBag/Validates_bag (1.75s)
bagit_test.go:34: assertion failed: error is not nil: invalid: cannot access local variable 'pool' where it is not associated with a value
FAIL
FAIL github.com/artefactual-labs/bagit-gython 1.760s
FAIL
Additional information
If I remove one or more of the t.Parallel()
calls from the test file, then the tests pass without error
djjuhasz commented
I also tried treating bagit.NewBagIt()
as a singleton, by moving the constructor and cleanup to the main test function:
func TestValidateBag(t *testing.T) {
t.Parallel()
b, err := bagit.NewBagIt()
assert.NilError(t, err)
t.Cleanup(func() {
err = b.Cleanup()
assert.NilError(t, err)
})
t.Run("Fails validation", func(t *testing.T) {
t.Parallel()
err = b.Validate("/tmp/691b8e7f-e6b7-41dd-bc47-868e2ff69333")
assert.Error(t, err, "invalid: Expected bagit.txt does not exist: /tmp/691b8e7f-e6b7-41dd-bc47-868e2ff69333/bagit.txt")
})
t.Run("Validates bag", func(t *testing.T) {
t.Parallel()
err = b.Validate("internal/testdata/valid-bag")
assert.NilError(t, err)
})
}
The tests passed in this case:
❯ go test ./...
? github.com/artefactual-labs/bagit-gython/internal/dist [no test files]
? github.com/artefactual-labs/bagit-gython/internal/dist/data [no test files]
? github.com/artefactual-labs/bagit-gython/internal/dist/generate [no test files]
? github.com/artefactual-labs/bagit-gython/internal/runner [no test files]
ok github.com/artefactual-labs/bagit-gython 1.819s