Tests fail when run with pytest --forked
segyges opened this issue · 1 comments
Describe the bug
When tests are run with pytest --forked per the instructions in /test/README.md, a large number of tests fail with the error:
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
This appears to be a problem with the way tests are run in subprocesses. It makes testing, and therefore development on the library, rather difficult.
To Reproduce
Steps to reproduce the behavior:
- Install neox in your environment however you normally do
- Probably initialize a training run to make sure your environment is clean
- Exit out of that run
cd /tests
pytest --forked
Expected behavior
Tests pass, or fail for reasons to do with the code in the tests themselves.
Proposed solution
I have no idea.
Environment (please complete the following information):
- GPUs: 2x 3090s
- Configs: N/A
Additional context
forked-report.zip
Attached html report of the failures on the tests
Currently sidestepping this with #1149 until we have time to more properly resolve the issue with launching CUDA in forked processes.
Some tests are back, all are cleaned a bit, and model training tests are skipped for now.