thu-ml/tianshou

two dimensional input action in DDPG

chenyi8920 opened this issue · 3 comments

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
    • design request (i.e. "X should be changed to Y.")
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, gymnasium as gym, torch, numpy, sys
    print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)
    0.5.1 0.28.1 2.1.1+cu121 1.24.4 3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23) [MSC v.1916 64 bit (AMD64)] win32

Hi, I am new to RL and currently using Tianshou for my graduation project. Unlike the PPO example, the DDPG implementation does not show how the action is chosen (i.e. how it is fed into the environment); the only clue I found is the action_scaling argument of DDPGPolicy. My own env requires a two-dimensional continuous input whose components have different ranges, so I am stuck transforming Tianshou's default [-1, 1] action into my non-uniform action input. Should I write my own DDPG, or is there a way to resolve this within the DDPGPolicy implementation?

Also, the PyPI & GitHub versions of Tianshou are up to date, but the docs & releases still sit at 0.5.0. It took some effort to follow the docs' outdated API. Tianshou is awesome, and I hope you update them soon :)

For further information, my actor network's action output in plain PyTorch would look like this:

import torch
import torch.nn as nn


class Actor(nn.Module):
    def __init__(self, state_dim, action_dim, hidden_dim=64):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.fc3 = nn.Linear(hidden_dim, action_dim)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        # tanh squashes both action dimensions to [-1, 1]
        x = torch.tanh(self.fc3(x))
        # rescale each dimension to its own range:
        # speed in [0, 15], angle in [-25, 25]
        speed = 7.5 * x[:, 0] + 7.5
        angle = 25 * x[:, 1]
        return torch.stack([speed, angle], dim=-1)
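The corresponding action space of my env would be declared roughly like this (bounds are illustrative and simply match the actor above):

import numpy as np
from gymnasium import spaces

# hypothetical 2D action space: speed in [0, 15], angle in [-25, 25]
action_space = spaces.Box(
    low=np.array([0.0, -25.0], dtype=np.float32),
    high=np.array([15.0, 25.0], dtype=np.float32),
)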

Hi @chenyi8920, glad you are using Tianshou for your research; being helpful with that is one of our explicit goals! Please use the master version and the documentation built from master (not stable), i.e. the one at https://tianshou.org/en/master/. Ignore the warning about the outdated docs; it is only there because we haven't released in a long time. The next release, 1.0.0, should happen next week.

The action (de)normalization happens in the map_action and map_action_inverse methods of BasePolicy, from which DDPGPolicy inherits. If you want to customize this mechanism beyond what BasePolicy permits, you can define a subclass MyDDPGPolicy(DDPGPolicy) and override these methods, as sketched below.
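For illustration, here is a rough, untested sketch of such a subclass (exact method signatures may differ slightly between Tianshou versions), assuming your env expects speed in [0, 15] and angle in [-25, 25]:

import numpy as np
from tianshou.policy import DDPGPolicy


class MyDDPGPolicy(DDPGPolicy):
    # map the network output from [-1, 1]^2 to the env's action ranges
    def map_action(self, act):
        act = np.asarray(act)
        speed = 7.5 * act[..., 0] + 7.5   # [-1, 1] -> [0, 15]
        angle = 25.0 * act[..., 1]        # [-1, 1] -> [-25, 25]
        return np.stack([speed, angle], axis=-1)

    # inverse mapping, e.g. for actions coming back from the env
    def map_action_inverse(self, act):
        act = np.asarray(act)
        return np.stack(
            [(act[..., 0] - 7.5) / 7.5, act[..., 1] / 25.0], axis=-1
        )

Note also that if your env declares a Box action space with per-dimension low/high bounds, the default map_action with action_scaling=True should already perform exactly this kind of per-dimension linear rescaling, so overriding is mainly needed for non-linear mappings.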

Let me know if this answered your question.

Hi Mischa, thank you for your reply. I thought the same, and yes, I was also misled by the warning in the master docs. I'll read the new docs and try making my own DDPG now. The BasePolicy methods you mentioned should be helpful. Thanks again & have a good day!