BY571/Soft-Actor-Critic-and-Extensions

UserWarning

Closed this issue · 6 comments

Hi, when I run your code, I get the following warning:

UserWarning: Using a target size (torch.Size([256, 3])) that is different to the input size (torch.Size([256, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  critic2_loss = 0.5*F.mse_loss(Q_2, Q_targets.detach())

My Python version is 3.6, with PyTorch 1.4.0 and NumPy 1.19.2.
The environment is Hopper-v2.
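
The warning above can be reproduced in isolation. The following is a minimal sketch, not the code from SAC.py: it assumes the target picked up an extra dimension of size 3 (the Hopper-v2 action dimension), for example from per-dimension log-probabilities that were never summed. The names `q_pred`, `q_target_bad`, `log_prob`, and the coefficient `alpha` are hypothetical and chosen only for illustration.

```python
import warnings

import torch
import torch.nn.functional as F

batch_size = 256
action_dim = 3  # Hopper-v2 has a 3-dimensional action space

# Critic output: one Q-value per sample, shape [256, 1].
q_pred = torch.zeros(batch_size, 1)

# Hypothetical faulty target that kept a per-action dimension, shape [256, 3].
q_target_bad = torch.ones(batch_size, action_dim)

# mse_loss broadcasts [256, 1] against [256, 3] and emits a UserWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    loss_bad = F.mse_loss(q_pred, q_target_bad)
mismatch_warned = any("target size" in str(w.message) for w in caught)

# Assumed fix: reduce the per-dimension term to one value per sample
# before it enters the target, so the target has shape [256, 1].
alpha = 0.2  # hypothetical entropy coefficient
log_prob = torch.zeros(batch_size, action_dim)       # per-dimension log-probs
log_prob_joint = log_prob.sum(dim=1, keepdim=True)   # shape [256, 1]
q_target_ok = torch.ones(batch_size, 1) - alpha * log_prob_joint

loss_ok = F.mse_loss(q_pred, q_target_ok)
```

Checking `Q_targets.shape` against `Q_2.shape` just before the loss call is a quick way to see which term in the target introduces the extra dimension.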

BY571 commented

Hey @ChenyangRan, sorry for the late reply, I've been having some problems with GitHub recently. Can you tell me which script you were running?

Hi, it happened when I tested Hopper-v2 with the following command:

python SAC.py -env Hopper-v2 -ep 200 -info sac

BY571 commented

@ChenyangRan the SAC.py file is kind of old and outdated. Could you try to use the run.py file? If it does not work please let me know :)

I'm sorry, I hadn't read the README carefully.
I have now used run.py with the command:

python run.py -env Hopper-v2 -seed 0 -info sac_test -w 2

Is this the original SAC agent? It runs without the warning.
Also, I noticed that the run length is given in frames instead of epochs. Does that mean the code counts total environment steps?

BY571 commented

@ChenyangRan Yes, you are right: it records the total number of interactions with the environment.

If you don't use any of the extensions, it's plain SAC. If you set a flag like -per 1, it uses prioritized experience replay; setting -munchausen 1 adds a Munchausen reward add-on, and so on.

Good to hear it runs without warning :)

Thanks, I'll close this issue.