pytorch/tnt

_validate_snapshot_available() fails even though torchsnapshot is available


๐Ÿ› Describe the bug

When running my code with torchtnt and the TorchSnapshotSaver (torchsnapshot_saver.py), I get the following error as soon as the class is constructed:

RuntimeError: TorchSnapshotSaver support requires torchsnapshot. Please make sure ``torchsnapshot`` is installed. Installation: https://github.com/pytorch/torchsnapshot#install

The error is raised by _validate_snapshot_available() in torchsnapshot_saver.py.
However, torchsnapshot itself imports without any problem.

Versions

I tried installing torchsnapshot and torchtnt from conda, from PyPI, and directly from the GitHub repositories, and I always get the same result.

I also ran into this.
It seems that torchsnapshot_saver.py imports override_max_per_rank_io_concurrency from torchsnapshot.knobs, which exists only on the main branch and not in the 0.1.0 release.
Perhaps the simplest fix is to release a new version of torchsnapshot and constrain torchtnt to depend on it.
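A possible explanation for the confusing message: in Python, `from pkg.mod import name` raises ImportError both when the package is missing and when the package is present but lacks `name`. If the availability check catches ImportError around such an import, a 0.1.0 install would be reported as "not installed". The sketch below does not reproduce torchtnt's actual code; it simulates the situation with hypothetical `fake_torchsnapshot` modules, and `check_symbol` / `is_available` are illustrative helper names.

```python
import importlib
import sys
import types


def check_symbol(module_name: str, symbol: str) -> str:
    """Tell 'module missing' apart from 'module present but symbol missing'."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return "module missing"
    if not hasattr(module, symbol):
        return "symbol missing"
    return "ok"


# Hypothetical stand-ins for a release-style install: the package and its
# knobs submodule import fine, but knobs lacks the name newer code expects.
fake_pkg = types.ModuleType("fake_torchsnapshot")
fake_knobs = types.ModuleType("fake_torchsnapshot.knobs")
fake_pkg.knobs = fake_knobs
sys.modules["fake_torchsnapshot"] = fake_pkg
sys.modules["fake_torchsnapshot.knobs"] = fake_knobs


def is_available() -> bool:
    # Hypothetical guarded import: a missing *name* raises ImportError just
    # like a missing *package*, so both end up reported as "not installed".
    try:
        from fake_torchsnapshot.knobs import override_max_per_rank_io_concurrency  # noqa: F401
    except ImportError:
        return False
    return True


print(check_symbol("fake_torchsnapshot", "knobs"))  # ok
print(check_symbol("fake_torchsnapshot.knobs", "override_max_per_rank_io_concurrency"))  # symbol missing
print(is_available())  # False

# Simulate a build that does provide the knob (per the reply, the main branch).
fake_knobs.override_max_per_rank_io_concurrency = lambda *args, **kwargs: None
print(is_available())  # True
```

Running `check_symbol("torchsnapshot.knobs", "override_max_per_rank_io_concurrency")` against a real install should show whether the package is missing or just too old.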

Edit: As a short-term workaround, installing the nightly build with pip install --pre torchsnapshot-nightly worked for me.
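To confirm which build is actually installed after applying the workaround, a small check like the following may help. It is a sketch, not part of either project: `installed_versions` and `knob_importable` are hypothetical helper names, the distribution names are the ones used with pip in this thread, and it relies only on the standard `importlib.metadata` module (Python 3.8+).

```python
from importlib import metadata


def installed_versions(dist_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in dist_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def knob_importable() -> bool:
    """True if the installed torchsnapshot provides the knob torchtnt imports."""
    try:
        from torchsnapshot.knobs import override_max_per_rank_io_concurrency  # noqa: F401
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Distribution names as used in the pip commands in this thread.
    for dist, version in installed_versions(
        ["torchsnapshot", "torchsnapshot-nightly", "torchtnt"]
    ).items():
        print(f"{dist}: {version or 'not installed'}")
    print("override_max_per_rank_io_concurrency importable:", knob_importable())
```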