CudnnRNN is not differentiable twice
erschmidt opened this issue · 4 comments
Hi,
First of all thank you very much for providing this great repository!
I am currently implementing GAIL using an MLP as well as an RNN policy net (two separate experiments). The MLP network works as intended, but if I switch to the RNN policy I get `RuntimeError: CudnnRNN is not differentiable twice` during execution of this line in `core.trpo.Fvp_fim`:
```python
Jv = torch.autograd.grad(Jtv, t, retain_graph=True)[0]
```
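For context, the Fisher-vector product requires differentiating a gradient a second time, which is exactly what cuDNN's fused RNN kernels did not support. A minimal sketch of that double-backward pattern (with hypothetical tensors, not the repository's actual variables):

```python
import torch

# Toy scalar function of x; the policy network plays this role in Fvp_fim.
x = torch.randn(3, requires_grad=True)
y = (x ** 3).sum()

# First derivative: create_graph=True keeps the graph attached so the
# gradient itself can be differentiated again.
g = torch.autograd.grad(y, x, create_graph=True)[0]  # g = 3 * x**2

# Second derivative: this is the step that raises
# "CudnnRNN is not differentiable twice" when the first backward pass
# went through a cuDNN fused RNN kernel.
h = torch.autograd.grad(g.sum(), x, retain_graph=True)[0]  # h = 6 * x
```

If the first `grad` call omits `create_graph=True`, the second call fails for any network, so it is worth checking that flag before blaming the RNN kernel.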
The only difference between my MLP and RNN policy implementations is the initialization of the hidden state during `get_log_prob` and `get_fim` within my policy class.
Given your recent commit (d66765eecad38ddc3f6e0f33d35ef70a7ed11892), I thought the network only differentiates once during TRPO. Am I doing something wrong, or is the network still differentiating twice?
Thank you very much!
Hi,
I solved this problem by switching to CUDA 9.0 and reinstalling PyTorch. Another option is to use LSTMCell instead of LSTM.
Best,
Ye
Thanks for the fast feedback.
Sadly, still no luck. I switched my GRU layer to a GRUCell, which only changed the error to `RuntimeError: GRUFused is not differentiable twice`.
Since I'm working in an environment where I can't easily change the CUDA version (currently 7.5), using a different CUDA version is not an option. Are you sure that would solve the problem?
The relevant functions of my policy look like this:
```python
def forward(self, inputs):
    x = self.hidden_activation(self.input_layer(inputs))
    # Hidden layers
    for hidden_layer in self.hidden_layers:
        x = self.hidden_activation(hidden_layer(x))
    # GRUCell: unroll manually over the sequence dimension
    outputs = []
    for seq in range(x.size(1)):
        self.hidden = self.gru(x[:, seq], self.hidden)
        outputs.append(self.hidden)
    x = torch.stack(outputs, 1)
    # Output layer
    action_mean = self.output_layer(x)
    action_log_std = self.a_logstd.expand_as(action_mean)
    action_std = torch.exp(action_log_std)
    return action_mean, action_log_std, action_std

def get_log_prob(self, x, actions):
    self.hidden = self.init_hidden(x.size(0))
    action_mean, action_log_std, action_std = self.forward(x)
    return normal_log_density(actions, action_mean, action_log_std, action_std, is_recurrent=True)

def get_fim(self, x):
    self.hidden = self.init_hidden(x.size(0))
    mean, _, _ = self.forward(x)
    cov_inv = self.a_logstd.data.exp().pow(-2).squeeze(0).repeat(x.size(0))
    param_count = 0
    std_index = 0
    id = 0
    for name, param in self.named_parameters():
        if name == "a_logstd":
            std_id = id
            std_index = param_count
        param_count += param.data.view(-1).shape[0]
        id += 1
    return cov_inv, mean, {'std_id': std_id, 'std_index': std_index}
```
I'm really hoping to solve this issue, since I need to implement this with an RNN policy.
I'm pretty positive that changing the CUDA version will solve the problem if you are using GRUCell, since that was my case and I didn't change a single line of code. Alternatively, you can use PPO instead of TRPO, which should give you similar performance.
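PPO sidesteps the issue because its clipped surrogate objective needs only first-order gradients, with no Fisher-vector products and hence no double backward through the RNN. A hedged sketch of that objective (a generic illustration, not this repository's exact implementation):

```python
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective from PPO.

    Only first derivatives of log_probs are needed, so a cuDNN RNN
    policy works without double-backward support.
    """
    # Probability ratio between the new and old policy.
    ratio = torch.exp(log_probs - old_log_probs)
    surr1 = ratio * advantages
    # Clip the ratio to [1 - eps, 1 + eps] to limit the policy update.
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic bound: take the smaller surrogate, negate for minimization.
    return -torch.min(surr1, surr2).mean()
```

When the new and old log-probabilities coincide, the ratio is 1 and the loss reduces to minus the mean advantage, which is a quick sanity check after a refactor.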
I switched from TRPO to PPO. I will compare results, but it seems to train fine. Thank you!