ChrisZonghaoLi/sky130_ldo_rl

Code confusion

Closed this issue · 6 comments

Hello, the MLP_DDPG in the paper and in the code is not standard DDPG. I would like to ask whether there are any results for the standard DDPG algorithm, i.e., DDPG where the agent's state and action are not preprocessed by an MLP?

Hi,

I'm not too sure exactly what you mean by "standard DDPG". The MLP is used only as a function approximator, which is quite "standard"; the function approximator could also be some other type of NN.

Why not remove the MLP, so that the input to the network in DDPG is simply the concatenation of states and actions, without any MLP preprocessing?

import torch
import torch.nn.functional as F
from torch.nn import Linear, LazyLinear


class Critic(torch.nn.Module):
    def __init__(self, CktGraph):
        super().__init__()
        self.num_node_features = CktGraph.num_node_features
        self.action_dim = CktGraph.action_dim
        self.device = CktGraph.device
        self.edge_index = CktGraph.edge_index
        self.num_nodes = CktGraph.num_nodes

        self.in_channels = self.num_node_features + self.action_dim
        self.out_channels = 1

        # Could the input dimension of mlp1 instead be
        # self.num_nodes * (self.num_node_features + self.action_dim),
        # and the later x = self.lin1(torch.flatten(x)).reshape(1, -1) be removed?
        # That step is not in standard DDPG. That is, states and actions would not be
        # processed per node before being fed into the critic network.

        self.mlp1 = Linear(self.in_channels, 32)
        self.mlp2 = Linear(32, 32)
        self.mlp3 = Linear(32, 16)
        self.mlp4 = Linear(16, 16)
        self.lin1 = LazyLinear(self.out_channels)

    def forward(self, state, action):
        batch_size = state.shape[0]
        device = self.device

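        # Broadcast the action to every node so each row of the state matrix
        # is paired with the same action vector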
        action = action.repeat_interleave(self.num_nodes, 0).reshape(
            batch_size, self.num_nodes, -1)
        data = torch.cat((state, action), axis=2)

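        # For each sample, pass every node's (features + action) row through the
        # shared MLPs, then flatten and map the result to a single Q-value with lin1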
        values = torch.tensor(()).to(device)
        for i in range(batch_size):
            x = data[i]
            x = F.relu(self.mlp1(x))
            x = F.relu(self.mlp2(x))
            x = F.relu(self.mlp3(x))
            x = F.relu(self.mlp4(x))
            x = self.lin1(torch.flatten(x)).reshape(1, -1)
            values = torch.cat((values, x), axis=0)

        return values
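
Concretely, what I have in mind is something like the sketch below. The class name FlatCritic and the layer sizes are just my own illustration of the idea, not code from this repo:

import torch
import torch.nn.functional as F
from torch.nn import Linear


class FlatCritic(torch.nn.Module):
    def __init__(self, CktGraph):
        super().__init__()
        self.num_node_features = CktGraph.num_node_features
        self.action_dim = CktGraph.action_dim
        self.num_nodes = CktGraph.num_nodes

        # Input dimension is num_nodes * (num_node_features + action_dim),
        # i.e. the flattened state-action matrix
        in_dim = self.num_nodes * (self.num_node_features + self.action_dim)
        self.fc1 = Linear(in_dim, 64)
        self.fc2 = Linear(64, 64)
        self.fc3 = Linear(64, 1)

    def forward(self, state, action):
        batch_size = state.shape[0]
        # Broadcast the action to every node, as in the original Critic,
        # then flatten each sample's state-action matrix into one vector
        action = action.repeat_interleave(self.num_nodes, 0).reshape(
            batch_size, self.num_nodes, -1)
        x = torch.cat((state, action), dim=2).reshape(batch_size, -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)  # Q-values, shape (batch_size, 1)

This treats the whole flattened state-action matrix as one input vector, which is how the critic in a plain DDPG implementation usually looks.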

I'm not sure if my recollection is accurate (or if I fully understand your question), but I remember this might have something to do with the Gym environment requirements. The state from the environment is a matrix, not a vector as in most environments, so I had to do some reshaping (something like this) to get the dimensions right for the MLP.

Otherwise, you might try it and see whether your version works. I don't think the end result will differ much either way.

Thank you. I have tried it on other simulation targets, and there is no difference between standard DDPG and MLP_DDPG.

Thanks for the confirmation.