UserWarning

Question

Closed this issue 4 years ago · 6 comments

Hi, when I run your code, I get the following warning:

UserWarning: Using a target size (torch.Size([256, 3])) that is different to the input size (torch.Size([256, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  critic2_loss = 0.5*F.mse_loss(Q_2, Q_targets.detach())

I'm using Python 3.6 and PyTorch 1.4.0 with NumPy 1.19.2.
The environment is Hopper-v2.
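
For reference, the warning can be reproduced in isolation. The sketch below is only an illustration under stated assumptions: the tensor names and shapes are taken from the warning text, the values are random placeholders, and `checked_mse` is a hypothetical helper that is not part of the repository. It does not identify why `Q_targets` ends up with three columns in SAC.py.

```python
import warnings

import torch
import torch.nn.functional as F

# Shapes copied from the warning; the values are random placeholders.
Q_2 = torch.randn(256, 1)        # input: one Q-value per sample
Q_targets = torch.randn(256, 3)  # target: three columns per sample

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The (256, 1) input is silently broadcast across the 3 target columns,
    # so the loss averages over 256 * 3 element pairs instead of 256.
    broadcast_loss = F.mse_loss(Q_2, Q_targets)

# Collect the size-mismatch warnings emitted by mse_loss.
shape_warnings = [w for w in caught if "target size" in str(w.message)]


def checked_mse(pred, target):
    """Hypothetical helper: fail early instead of broadcasting."""
    if pred.shape != target.shape:
        raise ValueError(
            f"shape mismatch: {tuple(pred.shape)} vs {tuple(target.shape)}"
        )
    return F.mse_loss(pred, target)
```

Calling `checked_mse(Q_2, Q_targets)` raises a `ValueError` rather than returning a loss computed over broadcast shapes, which makes it easier to spot where the shapes diverge.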

Answer 1 · 2020-11-10T15:37:01.000Z

Hey @ChenyangRan, sorry for the late reply, I've been having some problems with GitHub recently. Can you tell me which script you were running?

Answer 2 · 2020-11-11T07:07:52.000Z

Hi, it happened when I tested Hopper-v2 with the following command:

python SAC.py -env Hopper-v2 -ep 200 -info sac

Answer 3 · 2020-11-11T18:44:12.000Z

@ChenyangRan the SAC.py file is kind of old and outdated. Could you try to use the run.py file? If it does not work please let me know :)

Answer 4 · 2020-11-12T01:22:54.000Z

I'm sorry, I hadn't read the README carefully.
I have now used run.py with the command:

python run.py -env Hopper-v2 -seed 0 -info sac_test -w 2

Is this the original SAC agent? It runs without the warning.
Also, I noticed that the run length is given in frames instead of a number of epochs. Does that mean the code counts the total number of steps?

Answer 5 · 2020-11-13T11:02:50.000Z

@ChenyangRan Yes, you are right, it records the total interactions with the environment.

If you don't use any of the extensions, it's plain SAC. If you set a flag like -per 1, it uses prioritized experience replay. Setting -munchausen 1 adds a Munchausen reward add-on, etc.

Good to hear it runs without warning :)

Answer 6 · 2020-11-13T11:43:01.000Z

Thanks, I'll close this issue.