Prioritized experience replay buffer error.
nsidn98 opened this issue · 1 comments
nsidn98 commented
I think this should be `loss = (td_error.pow(2)*weights).mean().to(self.device)` instead of `loss = td_error.pow(2)*weights.mean().to(self.device)`. Without those parentheses, `.mean()` applies only to `weights`, so `loss` is a tensor of shape `[batch_size, 1]` instead of a scalar, and `loss.backward()` fails.
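A minimal sketch of the precedence issue (the `[4, 1]` shapes here are just illustrative, not the repo's actual batch size):

```python
import torch

# Toy batch: TD errors and importance-sampling weights,
# both shaped [batch_size, 1] as in the issue.
td_error = torch.randn(4, 1, requires_grad=True)
weights = torch.rand(4, 1)

# Buggy version: .mean() binds to `weights` only, so the product
# keeps shape [batch_size, 1]; calling .backward() on this raises
# "grad can be implicitly created only for scalar outputs".
bad_loss = td_error.pow(2) * weights.mean()
print(bad_loss.shape)  # torch.Size([4, 1])

# Fixed version: the parentheses make .mean() reduce the whole
# weighted squared error to a single scalar.
good_loss = (td_error.pow(2) * weights).mean()
print(good_loss.shape)  # torch.Size([])
good_loss.backward()  # works, since good_loss is a scalar
```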
Command to reproduce the error:
python run_atari_dqn.py -env CartPole-v0 -agent dqn+per -frames 30000 -m 500000 --fill_buffer 50000 -eps_frames 1000 -seed 42 -info testCP
The error:
Traceback (most recent call last):
File "run_atari_dqn.py", line 220, in <module>
final_average100 = run(frames = args.frames//args.worker, eps_fixed=eps_fixed, eps_frames=args.eps_frames//args.worker, min_eps=args.min_eps, eval_every=args.eval_every//args.worker, eval_runs=args.eval_runs, worker=args.worker)
File "run_atari_dqn.py", line 77, in run
agent.step(s, a, r, ns, d, writer)
File "/Users/user/Downloads/DQN-Atari-Agents/Agents/dqn_agent.py", line 122, in step
loss = self.learn_per(experiences)
File "/Users/user/Downloads/DQN-Atari-Agents/Agents/dqn_agent.py", line 239, in learn_per
loss.backward()
File "/Users/user/opt/miniconda3/envs/py36/lib/python3.6/site-packages/torch/tensor.py", line 185, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/Users/user/opt/miniconda3/envs/py36/lib/python3.6/site-packages/torch/autograd/__init__.py", line 121, in backward
grad_tensors = _make_grads(tensors, grad_tensors)
File "/Users/user/opt/miniconda3/envs/py36/lib/python3.6/site-packages/torch/autograd/__init__.py", line 47, in _make_grads
raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs
If you want, I can submit a PR for that (although the change is a minor one :P).
BY571 commented
Hey, sorry for the late reply, I had some problems with GitHub. And yes, you are totally right! I'll change it, thanks a lot!