trackmania-rl/tmrl

Error with "Compute logprob from Gaussian, and then apply correction for Tanh squashing"

coco875 opened this issue · 6 comments

It's my first AI, so I don't know what's going on, but it raises the error IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1), and when I try to fix it I get more errors. How can I fix this?

The original code:

```python
# Compute logprob from Gaussian, and then apply correction for Tanh squashing.
# NOTE: The correction formula is a little bit magic. To get an understanding
# of where it comes from, check out the original SAC paper (arXiv 1801.01290)
# and look in appendix C. This is a more numerically-stable equivalent to Eq 21.
# Try deriving it yourself as a (very difficult) exercise. :)
logp_pi = pi_distribution.log_prob(pi_action).sum(axis=-1)
logp_pi -= (2 * (np.log(2) - pi_action - F.softplus(-2 * pi_action))).sum(axis=1)
```
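For reference, the "magic" correction in that last line is just a numerically stable way of computing log(1 - tanh(u)^2) summed over action dimensions. A quick standalone check of the identity (my own sketch, not tmrl code):

```python
import math

import torch
import torch.nn.functional as F

# Identity behind the correction: log(1 - tanh(u)^2) = 2 * (log 2 - u - softplus(-2u)).
u = torch.linspace(-5.0, 5.0, steps=11, dtype=torch.float64)
naive = torch.log(1.0 - torch.tanh(u) ** 2)                # direct form, loses precision for large |u|
stable = 2.0 * (math.log(2.0) - u - F.softplus(-2.0 * u))  # the form used in the code above
print(torch.allclose(naive, stable))  # True
```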

pi_action has the same format as in the default code.

You can find the full code here.

Probably an unsqueeze or reshape is missing somewhere, or something is wrong with the batch dimension.
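To make that concrete, here is a minimal standalone reproduction (not tmrl code) of this IndexError: .sum(axis=1) assumes a batch dimension, so it fails on a 1-D action tensor, while unsqueezing a batch dimension (or summing over axis=-1) works:

```python
import torch

pi_action = torch.randn(3)        # a single 3-dimensional action, no batch dimension
try:
    pi_action.sum(axis=1)         # a 1-D tensor only has dims -1 and 0
except IndexError as e:
    print(e)                      # Dimension out of range (expected to be in range of [-1, 0], but got 1)

batched = pi_action.unsqueeze(0)  # shape (1, 3): add the missing batch dimension
print(batched.sum(axis=1))        # works: sums over the action dimension
print(pi_action.sum(axis=-1))     # also works on the unbatched tensor
```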

If this is your first AI, I strongly advise following the tutorial rather than reorganizing the whole library into your own files; you will be much less likely to break random things. Then you can simply copy-paste or import the relevant classes from the library where you need them.

The class is this:

```python
class SquashedGaussianMLPActor(ActorModule):
    def __init__(self, observation_space, action_space, hidden_sizes=(1024, 1024), activation=nn.ReLU, act_buf_len=0):
        super().__init__(observation_space, action_space)
        dim_obs = sum(prod(s for s in space.shape) for space in observation_space)
        print(dim_obs)
        # dim_obs = 499
        dim_act = action_space.shape[0]
        act_limit = action_space.high[0]
        self.net = mlp([dim_obs] + list(hidden_sizes), activation, activation)
        self.mu_layer = nn.Linear(hidden_sizes[-1], dim_act)
        self.log_std_layer = nn.Linear(hidden_sizes[-1], dim_act)
        self.act_limit = act_limit

    def forward(self, obs, test=False, with_logprob=True):
        data = torch.cat(obs, -1)
        np.ma.masked_array(data, ~np.isfinite(data)).filled(0)
        net_out = self.net(data)

        mu = self.mu_layer(net_out)
        log_std = self.log_std_layer(net_out)
        log_std = torch.clamp(log_std, LOG_STD_MIN, LOG_STD_MAX)
        std = torch.exp(log_std)

        # Pre-squash distribution and sample
        pi_distribution = Normal(mu, std)
        if test:
            # Only used for evaluating policy at test time.
            pi_action = mu
        else:
            pi_action = pi_distribution.rsample()

        if with_logprob:
            # Compute logprob from Gaussian, and then apply correction for Tanh squashing.
            # NOTE: The correction formula is a little bit magic. To get an understanding
            # of where it comes from, check out the original SAC paper (arXiv 1801.01290)
            # and look in appendix C. This is a more numerically-stable equivalent to Eq 21.
            # Try deriving it yourself as a (very difficult) exercise. :)
            print(pi_action)
            logp_pi = pi_distribution.log_prob(pi_action).sum(axis=-1)
            logp_pi -= (2 * (np.log(2) - pi_action - F.softplus(-2 * pi_action))).sum(axis=1)
        else:
            logp_pi = None

        pi_action = torch.tanh(pi_action)
        pi_action = self.act_limit * pi_action

        pi_action = pi_action.squeeze()

        return pi_action, logp_pi

    def act(self, obs, test=False):
        with torch.no_grad():
            a, _ = self.forward(obs, test, False)
            return a.numpy()
```

I looked at the tutorial and my code is very close to the same (the print shows: tensor([-2.2753e+27, 1.1778e+27, -2.6203e+26], grad_fn=<AddBackward0>)).
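As a side note, magnitudes around 1e27 suggest the inputs are not actually being sanitized: the np.ma.masked_array(data, ~np.isfinite(data)).filled(0) call in forward builds a new array and discards it, so data is left unchanged. If the intent was to zero out non-finite entries before the MLP, a torch-native sketch (my guess at the intent; requires a PyTorch version that has torch.nan_to_num) would be:

```python
import torch

# torch.nan_to_num returns a new tensor, so the result must be assigned back.
data = torch.tensor([1.0, float("nan"), float("inf"), -float("inf"), 2.0])
data = torch.nan_to_num(data, nan=0.0, posinf=0.0, neginf=0.0)
print(data)  # tensor([1., 0., 0., 0., 2.])
```

Even so, whatever produces the non-finite or huge values upstream is worth tracking down first.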

I use a dict observation, so I reorganized the code because dicts are not supported by default. For example:

[three screenshots attached]
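For what it is worth, one lightweight way to keep a dict observation without reorganizing the actor is to flatten it into a tuple of tensors before it reaches forward, so torch.cat(obs, -1) still works. A sketch with made-up keys (your observation fields will differ):

```python
import numpy as np
import torch

def dict_obs_to_tuple(obs_dict, keys):
    # Flatten each field to a 1-D float tensor, in a fixed key order,
    # so the result can be passed to torch.cat(obs, -1) in forward().
    return tuple(torch.as_tensor(np.asarray(obs_dict[k]), dtype=torch.float32).flatten()
                 for k in keys)

# Hypothetical observation with made-up keys:
obs = {"speed": np.array([123.0]), "lidar": np.zeros(19, dtype=np.float32)}
flat = dict_obs_to_tuple(obs, keys=("speed", "lidar"))
print([t.shape for t in flat])  # [torch.Size([1]), torch.Size([19])]
```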

Which line does the error come from?

(I am closing this issue as it is not an issue with tmrl, but feel free to continue the discussion here. Please use social media or Stack Overflow for this kind of help request in the future; GitHub issues are for tracking bugs in the library. Perhaps we could open a Discord server for tmrl at some point, @edigeze?)

OK, if you want we can continue on Discord (coco#8012). And yes, it's not a bug in tmrl, it's just me not knowing how it works x)