huggingface/peft

How to set lora_dropout=0 when loading a trained PEFT model for inference?

flyliu2017 opened this issue · 2 comments

System Info

peft==0.10.0
transformers==4.39.3

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

# Excerpt from PEFT's LoRA Linear layer
class Linear(nn.Module, LoraLayer):

    def forward(self, x: torch.Tensor, *args: Any, **kwargs: Any) -> torch.Tensor:
        self._check_forward_args(x, *args, **kwargs)
        adapter_names = kwargs.pop("adapter_names", None)

        if self.disable_adapters:
            if self.merged:
                self.unmerge()
            result = self.base_layer(x, *args, **kwargs)
        elif adapter_names is not None:
            result = self._mixed_batch_forward(x, *args, adapter_names=adapter_names, **kwargs)
        elif self.merged:
            result = self.base_layer(x, *args, **kwargs)
        else:
            result = self.base_layer(x, *args, **kwargs)
            torch_result_dtype = result.dtype
            for active_adapter in self.active_adapters:
                if active_adapter not in self.lora_A.keys():
                    continue
                lora_A = self.lora_A[active_adapter]
                lora_B = self.lora_B[active_adapter]
                dropout = self.lora_dropout[active_adapter]  # nn.Dropout, or nn.Identity when lora_dropout == 0
                scaling = self.scaling[active_adapter]
                x = x.to(lora_A.weight.dtype)

                if not self.use_dora[active_adapter]:
                    result = result + lora_B(lora_A(dropout(x))) * scaling
                else:
                    x = dropout(x)
                    result = result + self._apply_dora(x, lora_A, lora_B, scaling, active_adapter)

            result = result.to(torch_result_dtype)

        return result

Expected behavior

We can see that lora_dropout in the forward function is applied in the same way whether the model is in training or inference mode.

Did you try it out? The nn.Dropout layer does not apply dropout unless it is in training mode. Moreover, when dropout is set to 0 at initialization, the lora_dropout module is set to nn.Identity. Please check whether dropout is really applied in your case or whether it's a misunderstanding of the code.
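
A quick way to check this outside of PEFT is a minimal sketch with a bare nn.Dropout (nothing here is PEFT-specific):

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.1)
x = torch.ones(8)

drop.train()   # training mode: elements are randomly zeroed, the rest rescaled by 1 / (1 - p)
print(drop(x))

drop.eval()    # eval/inference mode: dropout is a no-op
print(torch.equal(drop(x), x))  # True

# In PEFT, when lora_dropout == 0 the stored dropout module is nn.Identity(),
# so it is a no-op in both modes.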

Thank you! The key point is the training mode of the model! I trained without an evaluation step, so after training the model was still in training mode, which led to inconsistent outputs between it and the one loaded from a checkpoint.
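
For anyone hitting the same symptom, a minimal inference sketch (the checkpoint path is a placeholder, and the tokenizer is assumed to have been saved alongside the adapter):

import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# "path/to/lora-checkpoint" is a placeholder for the saved adapter directory.
model = AutoPeftModelForCausalLM.from_pretrained("path/to/lora-checkpoint")
tokenizer = AutoTokenizer.from_pretrained("path/to/lora-checkpoint")

model.eval()  # disables every nn.Dropout, including the LoRA ones, for inference

inputs = tokenizer("Hello, world", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Calling model.eval() on the in-memory model right after training should likewise make its outputs consistent with the checkpoint-loaded one.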