pytorch/tensordict

[BUG] Tensordict-0.2.1 breaks MarioRL tutorial

malfet opened this issue · 3 comments

malfet commented

Describe the bug

Attempting to run https://github.com/pytorch/tutorials/blob/main/intermediate_source/mario_rl_tutorial.py with tensordict-0.2.1 quickly results in

Traceback (most recent call last):
  File "/home/nshulga/git/pytorch/tutorials/intermediate_source/mario_rl_tutorial.py", line 757, in <module>
    mario.cache(state, next_state, action, reward, done)
  File "/home/nshulga/git/pytorch/tutorials/intermediate_source/mario_rl_tutorial.py", line 379, in cache
    self.memory.add(TensorDict({"state": state, "next_state": next_state, "action": action, "reward": reward, "done": done}, batch_size=[]))
  File "/home/nshulga/miniconda3/envs/py310-torch210/lib/python3.10/site-packages/torchrl/data/replay_buffers/replay_buffers.py", line 720, in add
    index = super()._add(data_add)
  File "/home/nshulga/miniconda3/envs/py310-torch210/lib/python3.10/site-packages/torchrl/data/replay_buffers/replay_buffers.py", line 259, in _add
    index = self._writer.add(data)
  File "/home/nshulga/miniconda3/envs/py310-torch210/lib/python3.10/site-packages/torchrl/data/replay_buffers/writers.py", line 82, in add
    self._storage[self._cursor] = data
  File "/home/nshulga/miniconda3/envs/py310-torch210/lib/python3.10/site-packages/torchrl/data/replay_buffers/storages.py", line 72, in __setitem__
    ret = self.set(index, value)
  File "/home/nshulga/miniconda3/envs/py310-torch210/lib/python3.10/site-packages/torchrl/data/replay_buffers/storages.py", line 321, in set
    self._init(data)
  File "/home/nshulga/miniconda3/envs/py310-torch210/lib/python3.10/site-packages/torchrl/data/replay_buffers/storages.py", line 611, in _init
    filesize = os.path.getsize(tensor.filename) / 1024 / 1024
AttributeError: 'Tensor' object has no attribute 'filename'

To Reproduce

Run https://github.com/pytorch/tutorials/blob/main/intermediate_source/mario_rl_tutorial.py with tensordict-0.2.1 and it will fail to complete. Roll back to tensordict==0.2.0 and everything works fine.
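
A minimal sketch of the failing pattern, assuming the tutorial's replay-buffer setup (a TensorDictReplayBuffer backed by LazyMemmapStorage); the shapes and capacity below are illustrative, not the tutorial's exact values:

import torch
from tensordict import TensorDict
from torchrl.data import TensorDictReplayBuffer, LazyMemmapStorage

# Same storage choice as the tutorial; any capacity reproduces the failure.
memory = TensorDictReplayBuffer(storage=LazyMemmapStorage(100_000))

transition = TensorDict(
    {
        "state": torch.zeros(4, 84, 84),
        "next_state": torch.zeros(4, 84, 84),
        "action": torch.tensor([0]),
        "reward": torch.tensor([0.0]),
        "done": torch.tensor([False]),
    },
    batch_size=[],
)

# The first add() triggers LazyMemmapStorage's lazy allocation, which with
# tensordict-0.2.1 raises: AttributeError: 'Tensor' object has no attribute 'filename'
memory.add(transition)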

Expected behavior

The tutorial should run to completion, as it does with tensordict==0.2.0.

System info

Describe the characteristics of your environment:

  • pip
  • Python-3.10
  • torch-2.1.0, numpy-1.26

import tensordict, numpy, sys, torch
print(tensordict.__version__, numpy.__version__, sys.version, sys.platform, torch.__version__)

output:

0.2.1 1.26.0 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] linux 2.1.0+cu121

Additional context

See the Tutorials CI run for an example: https://github.com/pytorch/tutorials/actions/runs/6659680370/job/18099294856

Reason and Possible fixes

Judging from the traceback, LazyMemmapStorage._init in torchrl's storages.py sizes each stored leaf via its .filename attribute, and with tensordict-0.2.1 the leaves reaching it appear to be plain torch.Tensors, so the attribute lookup fails. A possible, untested guard is sketched below.
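
import os
import torch

def approx_size_mb(tensor: torch.Tensor) -> float:
    # Hypothetical guard for the failing size probe in storages.py:_init():
    # fall back to the in-memory size when the leaf is not a memory-mapped
    # tensor exposing a .filename attribute (the case hit with tensordict-0.2.1).
    if hasattr(tensor, "filename"):
        return os.path.getsize(tensor.filename) / 1024 / 1024
    return tensor.element_size() * tensor.numel() / 1024 / 1024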

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have read the documentation (required)
  • I have provided a minimal working example to reproduce the bug (required)
vmoens commented

We had an issue in torchrl with the M1 wheels, as well as a couple of environment/transform-related problems (which cause this issue), and bumped a 0.2.1 version for rl and tensordict. I see we pinned rl (and now tensordict) to 0.2.0 in the tutorials. We'll stick with that until torch 2.1.1 (when we'll bump a 0.2.2 for both), or pin the new versions (0.2.1) in the tutorials. Sorry it isn't super clean, will do better in the future.

malfet commented

@vmoens generic question: does either tensordict or rl depend on a particular version of torch? If that is the case, I wonder whether it would be better to specify this dependency more explicitly in the wheel file. Right now both tensordict==0.2.0 and tensordict==0.2.1 can be installed against torch-2.1.0, but it looks like that is not a compatibility matrix you want to support.
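
For example, a constraint like the following hypothetical sketch (not tensordict's actual packaging) would let pip refuse mismatched combinations:

from setuptools import setup

setup(
    name="tensordict",
    version="0.2.1",
    # Hypothetical bounds on torch; the real supported range would have to
    # come from the tensordict release notes.
    install_requires=["torch>=2.1.0,<2.2.0"],
)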

Re pinning: I think it is a very sound strategy for CI (in PyTorch CI all dependencies are pinned by default, but there are bots that issue update PRs on a regular basis).

vmoens commented

Yes I agree, the dependency should have been more clearly stated!