microsoft/LoRA

Pre-trained conv weight is not the same as self.conv.weight

aleemsidra opened this issue · 3 comments

Hi! I am trying to fine-tune a Conv2d layer with LoRA. I first loaded the pre-trained model weights. Below is the original Conv2d in the model and the first slice of its weight:

In [1]: model
Out[1]: 
UNet2D(
  (init_path): Sequential(
    (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  )
)

In [2]: model.init_path[0].weight[0][0]
Out[2]: 
tensor([[ 0.2988,  0.2760, -0.0493],
        [ 0.3431, -0.0962,  0.0716],
        [-0.1536,  0.1956,  0.2885]], grad_fn=<SelectBackward0>)

Now, when I replace Conv2d with ConvLoRA (I tried both replacing the layer manually and replacing it dynamically), the model architecture is updated as follows. Strangely, however, model.init_path[0].conv.weight[0][0] is not the same as the original pre-trained weight:

In [1]: model
Out[1]: 
UNet2D(
  (init_path): Sequential(
    (0): Conv2d(
      (conv): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    )
  )
)

In [2]: model.init_path[0].conv.weight[0][0]
Out[2]: 
tensor([[-0.0576,  0.0696,  0.1721],
        [ 0.2691,  0.3037, -0.2643],
        [ 0.0839, -0.1434, -0.0365]])

Why are the conv weights different from those in the original model?

Thanks for sharing this. It might have something to do with how you load the checkpoint. Can you provide a minimal example where this happens?

@edwardjhu, I first load the model as follows:

model.load_state_dict(torch.load("/home/sidra/Documents/Domain_Apatation/UDAS/src/checkpoints/base_model_mms_2023-07-06_12-45-28_PM/dc_model.pth"), strict=False)
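Because the checkpoint is loaded with strict=False, load_state_dict does not raise on mismatched keys; it returns them as missing_keys and unexpected_keys. A minimal sketch of inspecting that return value, assuming a hypothetical stand-in class WrappedConv whose layout mirrors the wrapped layers later in this thread (the real convolution stored under .conv):

```python
import torch.nn as nn

class WrappedConv(nn.Module):
    # Hypothetical stand-in for a LoRA-wrapped layer: like the wrapped layers
    # later in this thread, it keeps the real convolution under ".conv".
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, kernel_size=3, padding=1, bias=False)

plain = nn.Sequential(nn.Conv2d(1, 16, kernel_size=3, padding=1, bias=False))
wrapped = nn.Sequential(WrappedConv())

# With strict=False nothing is raised; mismatches are reported in the return value
result = wrapped.load_state_dict(plain.state_dict(), strict=False)
print(result.missing_keys)     # ['0.conv.weight'] -> keeps its random initialization
print(result.unexpected_keys)  # ['0.weight']      -> silently ignored
```

Checking that both lists are empty (or contain only keys you expect to be new) is a quick way to confirm every pre-trained tensor actually landed in the model.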

Below is part of the loaded model's structure:

UNet2D(
  (init_path): Sequential(
    (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): ReLU()
    (2): ResBlock(
      (conv_path): Sequential(
        (0): PreActivationND(
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
          (layer): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        )
        (1): PreActivationND(
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
          (layer): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        )
      )
    )
  )
)

After loading the model, I replace the Conv2d instances in nn.Sequential and in ResBlock as follows:

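import torch.nn as nn
import loralib as lora

# Replace Conv2d layers with LoRA layers
# (ResBlock and PreActivationND are classes from the user's own model code)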

for name, sub_module in model.named_children():
    for name, layer in list(sub_module.named_children()): 
        #Conv2d
        if isinstance(layer, nn.Conv2d):
            setattr(sub_module, name, lora.Conv2d(
            layer.in_channels,
            layer.out_channels,
            kernel_size=layer.kernel_size[0],
            r=2,
            lora_alpha=2))


        # ResBlock
        elif isinstance(sub_module, nn.Sequential):
            for name, layer in list(sub_module.named_children()):
                if isinstance(layer, ResBlock):
                    for i, preactivation_module in enumerate(layer.conv_path):
                        if isinstance(preactivation_module, PreActivationND) and isinstance(preactivation_module.layer, nn.Conv2d):
                            setattr(preactivation_module, 'layer', lora.Conv2d(
                                preactivation_module.layer.in_channels,
                                preactivation_module.layer.out_channels,
                                kernel_size=preactivation_module.layer.kernel_size[0],
                                r=2,
                                lora_alpha=2))

The updated model structure looks like this:

UNet2D(
  (init_path): Sequential(
    (0): Conv2d(
      (conv): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1))
    )
    (1): ReLU()
    (2): ResBlock(
      (conv_path): Sequential(
        (0): PreActivationND(
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
          (layer): Conv2d(
            (conv): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1))
          )
        )
        (1): PreActivationND(
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
          (layer): Conv2d(
            (conv): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1))
          )
        )
      )
    )
  )
)

Then I checked whether the LoRA matrices had been injected correctly by printing the parameter names:

for name, param in model.named_parameters():
    print(name)
init_path.0.lora_A
init_path.0.lora_B
init_path.0.conv.weight
init_path.0.conv.bias
init_path.2.conv_path.0.bn.weight
init_path.2.conv_path.0.bn.bias
init_path.2.conv_path.0.layer.lora_A
init_path.2.conv_path.0.layer.lora_B
init_path.2.conv_path.0.layer.conv.weight
init_path.2.conv_path.0.layer.conv.bias
init_path.2.conv_path.1.bn.weight
init_path.2.conv_path.1.bn.bias
init_path.2.conv_path.1.layer.lora_A
init_path.2.conv_path.1.layer.lora_B
init_path.2.conv_path.1.layer.conv.weight
init_path.2.conv_path.1.layer.conv.bias

This shows that the LoRA layers were added correctly. However, when I compare the conv weights of the pre-trained model with those after injecting the LoRA layers, they are not the same:

# Pre-trained model
model.init_path[0].weight[0][0]

tensor([[ 0.2988,  0.2760, -0.0493],
        [ 0.3431, -0.0962,  0.0716],
        [-0.1536,  0.1956,  0.2885]], grad_fn=<SelectBackward0>)

# With LoRA
model.init_path[0].conv.weight[0][0]
tensor([[ 0.1168,  0.0223, -0.1227],
        [-0.2735, -0.2281, -0.2859],
        [ 0.2369, -0.1391, -0.0499]])
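Instead of comparing printed slices by eye, a snapshot taken before the swap can be checked with torch.equal. A minimal sketch, with a hypothetical snapshot helper; the swap is simulated with a plain nn.Conv2d so the state-dict key stays 0.weight, whereas after a LoRA wrap the matching key would be 0.conv.weight:

```python
import torch
import torch.nn as nn

def snapshot(model: nn.Module) -> dict:
    # Hypothetical helper: detached copies, so later layer swaps cannot alter them
    return {k: v.detach().clone() for k, v in model.state_dict().items()}

model = nn.Sequential(nn.Conv2d(1, 16, kernel_size=3, padding=1, bias=False))
before = snapshot(model)

# Simulate the swap: a newly constructed layer gets a fresh random initialization
model[0] = nn.Conv2d(1, 16, kernel_size=3, padding=1, bias=False)

same = torch.equal(before["0.weight"], model.state_dict()["0.weight"])
print(same)  # False: the loaded weight was not carried over
```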

Moreover, bias is set to False in my original Conv2d, but model.init_path[0].conv.bias returns:

Parameter containing:
tensor([-0.1540,  0.0532, -0.0386, -0.0889, -0.1558,  0.0867, -0.2746,  0.3279,
        -0.0516,  0.0622,  0.1098, -0.1297,  0.2631, -0.0025,  0.0273, -0.3173],
       requires_grad=True)

requires_grad is also True, yet the pre-trained conv layer had bias=False. Where are these values coming from?

Can you please give your feedback on this?
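The trainable bias in the question becomes visible with a per-parameter requires_grad report. A minimal sketch, assuming a hypothetical helper trainable_report and a hypothetical stand-in WrappedConv that mimics the listing and outputs above: trainable low-rank factors, a frozen conv weight, and a conv bias left at nn.Conv2d's default:

```python
import torch
import torch.nn as nn

def trainable_report(model: nn.Module) -> list:
    # Hypothetical helper: (parameter name, requires_grad) for every registered parameter
    return [(name, p.requires_grad) for name, p in model.named_parameters()]

class WrappedConv(nn.Module):
    # Hypothetical stand-in mirroring the parameter listing in this thread:
    # lora_A/lora_B trainable, conv.weight frozen, conv.bias at its default.
    def __init__(self):
        super().__init__()
        self.lora_A = nn.Parameter(torch.zeros(6, 3))
        self.lora_B = nn.Parameter(torch.zeros(48, 6))
        self.conv = nn.Conv2d(1, 16, kernel_size=3)  # bias=True by default
        self.conv.weight.requires_grad = False

for name, trainable in trainable_report(nn.Sequential(WrappedConv())):
    print(f"{name:14s} trainable={trainable}")
```

If only the low-rank factors should be trained, loralib's README pairs its layers with lora.mark_only_lora_as_trainable(model), whose bias argument ('none', 'all', 'lora_only') controls whether biases stay trainable.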

> After loading the model, I replace the Conv2d instances in nn.Sequential and in ResBlock as follows:

This seems to be the problem. If you replace the layers manually after loading the checkpoint, the new layers are freshly initialized, so the loaded weights are discarded. They also have biases because you didn't pass bias=False to the constructor.
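If you prefer to keep the replace-after-load order, one option is to build the new layer with the original stride, padding, and bias setting, then copy the loaded weight into it. A minimal sketch; to_lora_conv and the stand-in WrappedConv are hypothetical, and the only assumption about the real wrapper is that it stores its convolution under .conv, as lora.Conv2d does in this thread:

```python
import torch
import torch.nn as nn

def to_lora_conv(old: nn.Conv2d, make_wrapper) -> nn.Module:
    """Hypothetical helper: swap a loaded nn.Conv2d for a LoRA-style wrapper
    without losing its weights. Assumes make_wrapper(in_ch, out_ch, kernel_size,
    **conv_kwargs) returns a module that keeps its convolution under `.conv`."""
    new = make_wrapper(
        old.in_channels,
        old.out_channels,
        old.kernel_size[0],
        stride=old.stride,
        padding=old.padding,
        bias=old.bias is not None,  # bias=False in the checkpoint stays bias=False
    )
    with torch.no_grad():
        new.conv.weight.copy_(old.weight)  # carry over the pre-trained weight
        if old.bias is not None:
            new.conv.bias.copy_(old.bias)
    return new

class WrappedConv(nn.Module):
    # Hypothetical stand-in with the same `.conv` layout; use lora.Conv2d for real runs
    def __init__(self, in_ch, out_ch, kernel_size, **conv_kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, **conv_kwargs)

old = nn.Conv2d(1, 16, kernel_size=3, padding=1, bias=False)
new = to_lora_conv(old, WrappedConv)
print(torch.equal(new.conv.weight, old.weight), new.conv.bias, new.conv.padding)
# True None (1, 1)
```

With loralib, the factory could be lambda i, o, k, **kw: lora.Conv2d(i, o, kernel_size=k, r=2, lora_alpha=2, **kw), assuming (as in loralib's ConvLoRA) that extra keyword arguments are forwarded to nn.Conv2d. Only .conv is copied; since loralib initializes lora_B to zero, the wrapped layer should initially reproduce the pre-trained output.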

Can you try modifying the architecture first and then loading the checkpoint?
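One caveat when modifying first: the wrapped layers store their weights under .conv, so checkpoint keys such as init_path.0.weight no longer match init_path.0.conv.weight, and with strict=False that mismatch is skipped silently. A minimal sketch of renaming the keys before loading; remap_conv_keys is a hypothetical helper, and the prefixes are taken from the parameter listing in this thread:

```python
def remap_conv_keys(state_dict: dict, wrapped_prefixes: set) -> dict:
    """Hypothetical helper: rename '<prefix>.weight' / '<prefix>.bias' to
    '<prefix>.conv.weight' / '<prefix>.conv.bias' for every wrapped module prefix."""
    remapped = {}
    for key, value in state_dict.items():
        prefix, _, leaf = key.rpartition(".")
        if prefix in wrapped_prefixes and leaf in ("weight", "bias"):
            key = f"{prefix}.conv.{leaf}"
        remapped[key] = value
    return remapped

# Module prefixes of the wrapped layers, from the parameter listing in this thread
WRAPPED = {"init_path.0", "init_path.2.conv_path.0.layer", "init_path.2.conv_path.1.layer"}

ckpt = {"init_path.0.weight": "W0", "init_path.2.conv_path.0.bn.weight": "G0"}
print(remap_conv_keys(ckpt, WRAPPED))
# {'init_path.0.conv.weight': 'W0', 'init_path.2.conv_path.0.bn.weight': 'G0'}
```

On the real model this would be result = model.load_state_dict(remap_conv_keys(torch.load(path), WRAPPED), strict=False), after which result.missing_keys should list only the lora_A/lora_B entries (plus the conv biases, if the layers were built without bias=False).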