quantumiracle/Popular-RL-Algorithms

ValueError on SAC v2 LSTM

sarmientoj24 opened this issue · 3 comments

I am getting these errors. Any idea why?

    done = T.tensor(np.float32(done)).unsqueeze(-1).to(self.policy_net.device)
ValueError: setting an array element with a sequence.
    reward = T.tensor(reward, dtype=T.float).unsqueeze(-1).to(self.policy_net.device)
ValueError: expected sequence of length 23 at dim 1 (got 48)

Hi,
Please provide a reference to the relevant code lines or the whole program you ran. These look like simple type and shape errors.

Hi,
In sac_v2_lstm.py, when I change the environment to "BipedalWalker-v3", I get the same ValueError that @sarmientoj24 got. I changed only the environment and could not find any mistake in the code, yet it still fails. Do you have any idea why?

Hi,
I realized the problem is caused by differing episode lengths. The LSTM versions of the algorithms I implemented require every episode to have the same length, and samples are saved per episode (here). This is fine for the environments I tested (e.g. 'Reacher', 'Pendulum-v0', 'HalfCheetah-v2'), which have no failure states. 'BipedalWalker', however, has a failure state (the bot falling onto the ground) that terminates the episode early, so episode lengths differ from one episode to the next. As a result, the data sampled here cannot be stacked into a valid tensor, and this line raises the errors above.
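To make the cause concrete, here is a minimal sketch of a check that could be run on a sampled batch before converting it to a tensor. The helper names `episode_lengths` and `is_rectangular` and the example batch are hypothetical and not part of the repo; the lengths 23 and 48 are taken from the error message above.

```python
def episode_lengths(batch):
    """Return the number of steps stored for each sampled episode."""
    return [len(episode) for episode in batch]


def is_rectangular(batch):
    """True if all episodes have the same length, i.e. the batch can be stacked."""
    return len(set(episode_lengths(batch))) <= 1


# Hypothetical reward batch: one episode ended after 23 steps, another after 48.
rewards = [[0.1] * 23, [0.1] * 48]

print(episode_lengths(rewards))  # [23, 48]
print(is_rectangular(rewards))   # False
```

A batch for which `is_rectangular` returns False is the kind of nested list that a tensor constructor rejects with a length-mismatch error.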

See the earlier discussion and proposed solutions in #20. They are not implemented in the current repo, but it should not be hard to do.
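One common way to handle variable-length episodes is to pad each one to the longest length in the batch and keep a mask marking the real steps. The sketch below illustrates that idea only; it is not necessarily the approach discussed in #20, and the function `pad_episodes` is hypothetical rather than part of the repo. It assumes a non-empty batch with one scalar per step (such as rewards or done flags); vector-valued entries such as states would need a vector-valued pad value.

```python
def pad_episodes(batch, pad_value=0.0):
    """Pad every episode to the longest length; return (padded, mask).

    mask holds 1.0 for real steps and 0.0 for padded steps.
    """
    max_len = max(len(episode) for episode in batch)
    padded, mask = [], []
    for episode in batch:
        n_pad = max_len - len(episode)
        padded.append(list(episode) + [pad_value] * n_pad)
        mask.append([1.0] * len(episode) + [0.0] * n_pad)
    return padded, mask


# Hypothetical per-step rewards for two episodes of different length.
rewards = [[1.0, 2.0, 3.0], [4.0]]
padded, mask = pad_episodes(rewards)

print(padded)  # [[1.0, 2.0, 3.0], [4.0, 0.0, 0.0]]
print(mask)    # [[1.0, 1.0, 1.0], [1.0, 0.0, 0.0]]
```

The padded lists are rectangular, so they could be passed to `T.tensor(..., dtype=T.float)` as in the failing lines above. The mask would then have to be applied wherever losses are averaged, so that padded steps do not contribute to the update.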