Prioritized experience replay buffer error.
nsidn98 opened this issue · 1 comments
nsidn98 commented
I think this should be `loss = (td_error.pow(2)*weights).mean().to(self.device)` instead of `loss = td_error.pow(2)*weights.mean().to(self.device)`. Without those parentheses, `.mean()` applies only to `weights`, so `loss` is a tensor of shape `[batch_size, 1]` instead of a scalar, and `loss.backward()` fails.
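A minimal sketch of the precedence issue (the `[4, 1]` shapes here are just illustrative, not the repo's actual batch size):

```python
import torch

# Toy batch: TD errors and importance-sampling weights,
# both shaped [batch_size, 1] as in the issue.
td_error = torch.randn(4, 1, requires_grad=True)
weights = torch.rand(4, 1)

# Buggy version: .mean() binds to `weights` only, so the product
# keeps shape [batch_size, 1]; calling .backward() on this raises
# "grad can be implicitly created only for scalar outputs".
bad_loss = td_error.pow(2) * weights.mean()
print(bad_loss.shape)  # torch.Size([4, 1])

# Fixed version: the parentheses make .mean() reduce the whole
# weighted squared error to a single scalar.
good_loss = (td_error.pow(2) * weights).mean()
print(good_loss.shape)  # torch.Size([])
good_loss.backward()  # works, since good_loss is a scalar
```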
Command to reproduce the error:
python run_atari_dqn.py -env CartPole-v0 -agent dqn+per -frames 30000 -m 500000 --fill_buffer 50000 -eps_frames 1000 -seed 42 -info testCP
The error:
Traceback (most recent call last):
File "run_atari_dqn.py", line 220, in <module>
final_average100 = run(frames = args.frames//args.worker, eps_fixed=eps_fixed, eps_frames=args.eps_frames//args.worker, min_eps=args.min_eps, eval_every=args.eval_every//args.worker, eval_runs=args.eval_runs, worker=args.worker)
File "run_atari_dqn.py", line 77, in run
agent.step(s, a, r, ns, d, writer)
File "/Users/user/Downloads/DQN-Atari-Agents/Agents/dqn_agent.py", line 122, in step
loss = self.learn_per(experiences)
File "/Users/user/Downloads/DQN-Atari-Agents/Agents/dqn_agent.py", line 239, in learn_per
loss.backward()
File "/Users/user/opt/miniconda3/envs/py36/lib/python3.6/site-packages/torch/tensor.py", line 185, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/Users/user/opt/miniconda3/envs/py36/lib/python3.6/site-packages/torch/autograd/__init__.py", line 121, in backward
grad_tensors = _make_grads(tensors, grad_tensors)
File "/Users/user/opt/miniconda3/envs/py36/lib/python3.6/site-packages/torch/autograd/__init__.py", line 47, in _make_grads
raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs
If you want, I can submit a PR for that (although the change is a minor one :P).
BY571 commented
Hey, sorry for the late reply, I had some problems with GitHub. And yes, you are totally right! I'll change it, thanks a lot!