After adding LoRA, the first few layers have a gradient of 0
I am a beginner in deep learning, and I would like to know whether the gradients are 0 because of vanishing gradients or because my batch is too small (batch_size=32).
I tried to add LoRA to a three-layer neural network, but only the lora_A and lora_B matrices in the last layer had nonzero gradients (below 1e-2); the gradients in all the other layers were exactly 0.
My lora.Linear layers are defined as follows:
self.prednet_full1_lora = lora.Linear(self.prednet_input_len, self.prednet_len1, r=4)
self.prednet_full2_lora = lora.Linear(self.prednet_len1, self.prednet_len2, r=4)
self.prednet_full3_lora = lora.Linear(self.prednet_len2, 1, r=4)
The forward pass of the model is shown below (assuming input_x is the input):
input_x = torch.sigmoid(self.prednet_full1_lora(input_x))
input_x = torch.sigmoid(self.prednet_full2_lora(input_x))
output = torch.sigmoid(self.prednet_full3_lora(input_x))
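For anyone who wants to reproduce the setup without loralib installed, here is a minimal stand-in for lora.Linear in plain PyTorch. It is a sketch of my understanding of how LoRA works (frozen base weight plus a trainable rank-r update B @ A), not loralib's actual implementation, and the class name and scaling are my own choices:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Sketch of a LoRA-adapted linear layer: frozen base weight plus B @ A."""
    def __init__(self, in_features: int, out_features: int, r: int = 4):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        # In LoRA, the pretrained weights are frozen; only A and B train.
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        # lora_A gets a small random init; lora_B starts at zero, so the
        # low-rank update contributes nothing until training moves it.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = 1.0 / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the rank-r correction x A^T B^T, scaled.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```

Note that because lora_B starts at zero, the gradient flowing into lora_A is also zero on the very first backward pass; only lora_B receives a nonzero gradient until B moves away from zero.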
and I did not forget to call:
loss.backward()
optimizer.step()
net.apply_clipper()
I would greatly appreciate any ideas or solutions.