Khrylx/PyTorch-RL

CudnnRNN is not differentiable twice

erschmidt opened this issue · 4 comments

Hi,

First of all thank you very much for providing this great repository!

I am currently implementing GAIL with both an MLP and an RNN policy net (two separate experiments). The MLP network works as intended, but if I switch to the RNN policy I get RuntimeError: CudnnRNN is not differentiable twice during execution of this line in core.trpo.Fvp_fim:

Jv = torch.autograd.grad(Jtv, t, retain_graph=True)[0]
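For context, this call differentiates through the policy's graph a second time (a Fisher/Hessian-vector product), which cuDNN's fused RNN kernels did not support on the affected versions. A rough standalone sketch of that pattern (modern-style PyTorch, illustrative sizes, not the repo's exact Fvp_fim code):

import torch
import torch.nn as nn

# Rough sketch of the double-backward pattern behind the error (illustrative sizes,
# not the repo's exact Fvp_fim code). The first grad call builds a differentiable
# graph (create_graph=True); the second grad call then differentiates through the
# RNN again, which cuDNN's fused kernels do not support on the affected versions.
rnn = nn.GRU(input_size=4, hidden_size=8, batch_first=True).cuda()
x = torch.randn(2, 5, 4, device='cuda')
out, _ = rnn(x)
loss = out.pow(2).mean()

params = list(rnn.parameters())
grads = torch.autograd.grad(loss, params, create_graph=True)   # first backward
flat_grad = torch.cat([g.reshape(-1) for g in grads])
v = torch.randn_like(flat_grad)
gv = (flat_grad * v).sum()
hvp = torch.autograd.grad(gv, params)   # second backward -> "not differentiable twice"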

The only difference between my MLP and RNN policy implementation is the initialization of the hidden state during get_log_prob and get_fim within my policy class.

Given your recent commit (d66765eecad38ddc3f6e0f33d35ef70a7ed11892), I thought the network would only be differentiated once during TRPO.

Am I doing something wrong, or is the network still being differentiated twice?

Thank you very much!

Hi,

I solved this problem by switching to CUDA 9.0 and reinstalling PyTorch. Another thing is to use LSTMCell instead of LSTM.
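For example, replacing nn.LSTM with a manually unrolled nn.LSTMCell sidesteps the monolithic CudnnRNN op; a rough sketch with placeholder sizes:

import torch
import torch.nn as nn

# Rough sketch: nn.LSTMCell unrolled manually over time instead of nn.LSTM,
# which avoids the single CudnnRNN op over the whole sequence (sizes are placeholders).
cell = nn.LSTMCell(input_size=4, hidden_size=8)
x = torch.randn(2, 5, 4)            # (batch, seq_len, features)
h = torch.zeros(2, 8)
c = torch.zeros(2, 8)
outputs = []
for t in range(x.size(1)):
    h, c = cell(x[:, t], (h, c))
    outputs.append(h)
out = torch.stack(outputs, 1)       # (batch, seq_len, hidden_size)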

Best,
Ye

Thanks for the fast feedback.

Sadly, still no luck. I switched my GRU layer to a GRUCell, which only changed the error to RuntimeError: GRUFused is not differentiable twice.

Since I'm working in an environment where I can't easily change the CUDA version (currently 7.5), using a different CUDA version is not an option. Are you sure this would solve the problem?
The relevant functions of my policy look like this:

def forward(self, inputs):
        x = self.hidden_activation(self.input_layer(inputs))

        # Hidden Layers
        for hidden_layer in self.hidden_layers:
            x = self.hidden_activation(hidden_layer(x))
            
        # GRUCell
        outputs = []
        for seq in range(x.size(1)):
            self.hidden = self.gru(x[:, seq], self.hidden)
            outputs.append(self.hidden)
        x = torch.stack(outputs, 1)
        
        # Output Layer
        action_mean = self.output_layer(x)
        action_log_std = self.a_logstd.expand_as(action_mean)
        action_std = torch.exp(action_log_std)

        return action_mean, action_log_std, action_std

def get_log_prob(self, x, actions):
        self.hidden = self.init_hidden(x.size(0))
        action_mean, action_log_std, action_std = self.forward(x)
        return normal_log_density(actions, action_mean, action_log_std, action_std, is_recurrent=True)

def get_fim(self, x):
        self.hidden = self.init_hidden(x.size(0))
        mean, _, _ = self.forward(x)
        # Diagonal inverse covariance of the Gaussian policy, repeated over the batch
        cov_inv = self.a_logstd.data.exp().pow(-2).squeeze(0).repeat(x.size(0))
        # Locate a_logstd's parameter index and flat offset among the parameters
        param_count = 0
        std_index = 0
        id = 0
        for name, param in self.named_parameters():
            if name == "a_logstd":
                std_id = id
                std_index = param_count
            param_count += param.data.view(-1).shape[0]
            id += 1
        return cov_inv, mean, {'std_id': std_id, 'std_index': std_index}

I'm really hoping to solve this issue, since I need to implement this with an RNN policy.

I'm pretty positive that changing the CUDA version will solve the problem if you are using GRUCell, since that was my case and I didn't change a single line of code. Alternatively, you can use PPO instead of TRPO, which should give you similar performance.
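PPO only needs first-order gradients of its clipped surrogate objective, so the second backward pass never happens. A rough sketch of that loss (argument names and clip_eps are placeholders):

import torch

# Rough sketch of PPO's clipped surrogate loss (argument names and clip_eps are
# placeholders). Only one backward pass through the policy is needed, so the
# cuDNN double-backward limitation never comes into play.
def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    ratio = torch.exp(log_probs - old_log_probs)
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(surr1, surr2).mean()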

I did change from TRPO to PPO. I will compare the results, but it seems to train fine. Thank you!