trackmania-rl/tmrl

Error with "Compute logprob from Gaussian, and then apply correction for Tanh squashing"

coco875 opened this issue · 6 comments

It's my first AI, so I don't know what's going on, but it raises the error IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1), and when I try to fix it I get more errors. How can I fix this?

The original code:

```python
# Compute logprob from Gaussian, and then apply correction for Tanh squashing.
# NOTE: The correction formula is a little bit magic. To get an understanding
# of where it comes from, check out the original SAC paper (arXiv 1801.01290)
# and look in appendix C. This is a more numerically-stable equivalent to Eq 21.
# Try deriving it yourself as a (very difficult) exercise. :)
logp_pi = pi_distribution.log_prob(pi_action).sum(axis=-1)
logp_pi -= (2 * (np.log(2) - pi_action - F.softplus(-2 * pi_action))).sum(axis=1)
```
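For reference, the "magic" correction in that last line is just a numerically stable way of computing log(1 - tanh(u)^2) summed over action dimensions. A quick standalone check of the identity (my own sketch, not tmrl code):

```python
import math

import torch
import torch.nn.functional as F

# Identity behind the correction: log(1 - tanh(u)^2) = 2 * (log 2 - u - softplus(-2u)).
u = torch.linspace(-5.0, 5.0, steps=11, dtype=torch.float64)
naive = torch.log(1.0 - torch.tanh(u) ** 2)                # direct form, loses precision for large |u|
stable = 2.0 * (math.log(2.0) - u - F.softplus(-2.0 * u))  # the form used in the code above
print(torch.allclose(naive, stable))  # True
```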

pi_action has the same format as in the default code.

You can find the full code here.

Probably an unsqueeze or reshape is missing somewhere, or something is wrong with the batch dimension.
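To make that concrete, here is a minimal standalone reproduction (not tmrl code) of this IndexError: .sum(axis=1) assumes a batch dimension, so it fails on a 1-D action tensor, while unsqueezing a batch dimension (or summing over axis=-1) works:

```python
import torch

pi_action = torch.randn(3)        # a single 3-dimensional action, no batch dimension
try:
    pi_action.sum(axis=1)         # a 1-D tensor only has dims -1 and 0
except IndexError as e:
    print(e)                      # Dimension out of range (expected to be in range of [-1, 0], but got 1)

batched = pi_action.unsqueeze(0)  # shape (1, 3): add the missing batch dimension
print(batched.sum(axis=1))        # works: sums over the action dimension
print(pi_action.sum(axis=-1))     # also works on the unbatched tensor
```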

If this is your first AI, I strongly advise following the tutorial rather than reorganizing the whole library into your own files; you will be much less likely to break random things. Then you can simply copy-paste or import the relevant classes from the library where you need them.

The class is this:

```python
class SquashedGaussianMLPActor(ActorModule):
    def __init__(self, observation_space, action_space, hidden_sizes=(1024, 1024), activation=nn.ReLU, act_buf_len=0):
        super().__init__(observation_space, action_space)
        dim_obs = sum(prod(s for s in space.shape) for space in observation_space)
        print(dim_obs)
        # dim_obs = 499
        dim_act = action_space.shape[0]
        act_limit = action_space.high[0]
        self.net = mlp([dim_obs] + list(hidden_sizes), activation, activation)
        self.mu_layer = nn.Linear(hidden_sizes[-1], dim_act)
        self.log_std_layer = nn.Linear(hidden_sizes[-1], dim_act)
        self.act_limit = act_limit

    def forward(self, obs, test=False, with_logprob=True):
        data = torch.cat(obs, -1)
        np.ma.masked_array(data, ~np.isfinite(data)).filled(0)
        net_out = self.net(data)

        mu = self.mu_layer(net_out)
        log_std = self.log_std_layer(net_out)
        log_std = torch.clamp(log_std, LOG_STD_MIN, LOG_STD_MAX)
        std = torch.exp(log_std)

        # Pre-squash distribution and sample
        pi_distribution = Normal(mu, std)
        if test:
            # Only used for evaluating policy at test time.
            pi_action = mu
        else:
            pi_action = pi_distribution.rsample()

        if with_logprob:
            # Compute logprob from Gaussian, and then apply correction for Tanh squashing.
            # NOTE: The correction formula is a little bit magic. To get an understanding
            # of where it comes from, check out the original SAC paper (arXiv 1801.01290)
            # and look in appendix C. This is a more numerically-stable equivalent to Eq 21.
            # Try deriving it yourself as a (very difficult) exercise. :)
            print(pi_action)
            logp_pi = pi_distribution.log_prob(pi_action).sum(axis=-1)
            logp_pi -= (2 * (np.log(2) - pi_action - F.softplus(-2 * pi_action))).sum(axis=1)
        else:
            logp_pi = None

        pi_action = torch.tanh(pi_action)
        pi_action = self.act_limit * pi_action

        pi_action = pi_action.squeeze()

        return pi_action, logp_pi

    def act(self, obs, test=False):
        with torch.no_grad():
            a, _ = self.forward(obs, test, False)
            return a.numpy()
```

I looked at the tutorial and my code is very close to the same (the print shows: tensor([-2.2753e+27, 1.1778e+27, -2.6203e+26], grad_fn=<AddBackward0>)).
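As a side note, magnitudes around 1e27 suggest the inputs are not actually being sanitized: the np.ma.masked_array(data, ~np.isfinite(data)).filled(0) call in forward builds a new array and discards it, so data is left unchanged. If the intent was to zero out non-finite entries before the MLP, a torch-native sketch (my guess at the intent; requires a PyTorch version that has torch.nan_to_num) would be:

```python
import torch

# torch.nan_to_num returns a new tensor, so the result must be assigned back.
data = torch.tensor([1.0, float("nan"), float("inf"), -float("inf"), 2.0])
data = torch.nan_to_num(data, nan=0.0, posinf=0.0, neginf=0.0)
print(data)  # tensor([1., 0., 0., 0., 2.])
```

Even so, whatever produces the non-finite or huge values upstream is worth tracking down first.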

I use a dict observation, so I reorganized the code because dicts are not supported by default. For example:

[three screenshots attached]
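For what it is worth, one lightweight way to keep a dict observation without reorganizing the actor is to flatten it into a tuple of tensors before it reaches forward, so torch.cat(obs, -1) still works. A sketch with made-up keys (your observation fields will differ):

```python
import numpy as np
import torch

def dict_obs_to_tuple(obs_dict, keys):
    # Flatten each field to a 1-D float tensor, in a fixed key order,
    # so the result can be passed to torch.cat(obs, -1) in forward().
    return tuple(torch.as_tensor(np.asarray(obs_dict[k]), dtype=torch.float32).flatten()
                 for k in keys)

# Hypothetical observation with made-up keys:
obs = {"speed": np.array([123.0]), "lidar": np.zeros(19, dtype=np.float32)}
flat = dict_obs_to_tuple(obs, keys=("speed", "lidar"))
print([t.shape for t in flat])  # [torch.Size([1]), torch.Size([19])]
```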

Which line does the error come from?

(I am closing this issue as it is not an issue with tmrl, but feel free to continue the discussion here. Please use social media or Stack Overflow for this kind of help request in the future; GitHub issues are for tracking bugs in the library. Perhaps we could open a Discord server for tmrl at some point, @edigeze?)

OK, if you want we can continue on Discord (coco#8012). And yes, it's not a bug in tmrl, it's just me not knowing how it works x)