two dimensional input action in DDPG
chenyi8920 opened this issue · 3 comments
- I have marked all applicable categories:
- exception-raising bug
- RL algorithm bug
- documentation request (i.e. "X is missing from the documentation.")
- new feature request
- design request (i.e. "X should be changed to Y.")
- I have visited the source website
- I have searched through the issue tracker for duplicates
- I have mentioned version numbers, operating system and environment, where applicable:
import tianshou, gymnasium as gym, torch, numpy, sys
print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)

0.5.1 0.28.1 2.1.1+cu121 1.24.4 3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23) [MSC v.1916 64 bit (AMD64)] win32
Hi, I am new to RL and currently using Tianshou for my graduation project. Unlike PPO, the DDPG implementation does not show how the action is chosen (fed into the env); the only clue I found is the positional argument action_scaling in DDPGPolicy. My own env requires a 2-dimensional continuous input, so I am stuck transforming Tianshou's default [-1, 1] action into my asymmetric action input. Should I create my own DDPG, or is there a way to resolve this in the DDPGPolicy implementation?
Also, the PyPI and GitHub pages show an up-to-date Tianshou version, but the docs and releases are still at 0.5.0. It took some effort to follow the docs' outdated API. Tianshou is awesome, and I hope you update soon :)
For further information, my action mapping in the actor network, using torch, looks like this:

import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim, hidden_dim=64):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.fc3 = nn.Linear(hidden_dim, action_dim)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.tanh(self.fc3(x))
        # rescale tanh output: speed into [0, 15], angle into [-25, 25]
        speed = 7.5 * x[:, 0] + 7.5
        angle = 25 * x[:, 1]
        return torch.stack([speed, angle], dim=-1)
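For reference, the scaling above is just the generic linear map from [-1, 1] to a box [low, high]; the bounds in this standalone numpy sketch are my assumption, read off the constants in forward (speed in [0, 15], angle in [-25, 25]):

import numpy as np

# Assumed bounds, derived from the constants in forward() above.
low = np.array([0.0, -25.0])
high = np.array([15.0, 25.0])

def rescale(a):
    """Linearly map a in [-1, 1]^2 to the box [low, high]."""
    return low + (a + 1.0) * 0.5 * (high - low)

print(rescale(np.array([0.0, 0.0])))   # midpoint of each interval: speed 7.5, angle 0
print(rescale(np.array([1.0, -1.0])))  # max speed, min angle: speed 15, angle -25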
Hi @chenyi8920, glad you are using Tianshou for your research; supporting that is one of our explicit goals! Please use the master version and the documentation from master (not stable), i.e. the one at https://tianshou.org/en/master/. Ignore the warning about the outdated docs; it's only there because we haven't released in a long time. The next release, 1.0.0, should happen next week.
The action (de)normalization happens in the methods map_action and map_action_inverse in BasePolicy, from which DDPGPolicy inherits. If you want to customize this mechanism beyond what BasePolicy permits, you can create a subclass, class MyDDPGPolicy(DDPGPolicy), and override these methods.
Let me know if this answered your question.
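To illustrate the override pattern, here is a minimal standalone sketch. It deliberately does not import tianshou, so the base class is stubbed out and the bounds are passed in by hand; in a real project the class would inherit from DDPGPolicy and read the bounds from the env's action space.

import numpy as np

class MyDDPGPolicy:  # in a real project: class MyDDPGPolicy(DDPGPolicy)
    """Sketch of overriding map_action / map_action_inverse.

    The network emits actions in [-1, 1]; these methods translate
    to and from the env's own box [low, high].
    """

    def __init__(self, low, high):
        self.low = np.asarray(low, dtype=np.float64)
        self.high = np.asarray(high, dtype=np.float64)

    def map_action(self, act):
        # network output in [-1, 1] -> env action in [low, high]
        act = np.clip(act, -1.0, 1.0)
        return self.low + (act + 1.0) * 0.5 * (self.high - self.low)

    def map_action_inverse(self, act):
        # env action in [low, high] -> normalized action in [-1, 1]
        return 2.0 * (act - self.low) / (self.high - self.low) - 1.0

policy = MyDDPGPolicy(low=[0.0, -25.0], high=[15.0, 25.0])
print(policy.map_action(np.array([0.0, 0.0])))            # midpoints of each interval
print(policy.map_action_inverse(np.array([15.0, 25.0])))  # upper corner -> [1, 1]

The two methods are exact inverses of each other on the interior of the box, so a normalized action survives a round trip unchanged.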
Hi Mischa, thanks for your reply. I thought the same, and yes, I was also tricked by the warning in master's docs. I'll read the new docs and try making my own DDPG now. The methods you mentioned in BasePolicy should be helpful. Thanks again & good day to you!