Use a naive MultiheadAttention implementation to replace nn.MultiheadAttention in PyTorch
If you find this project helpful, please give us a star ⭐️; your support is our greatest motivation.
This code replaces PyTorch's nn.MultiheadAttention with an equivalent implementation built from plain Linear layers, so that Transformers based on torch.nn.MultiheadAttention (such as OpenCLIP) can also be fine-tuned with Hugging Face's PEFT (e.g., LoRA).
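Below is a minimal sketch of the idea (the class name `NaiveMultiheadAttention` and its argument handling are illustrative assumptions, not necessarily this repo's exact implementation): multi-head attention expressed through separate `nn.Linear` projections, so that PEFT can attach LoRA adapters to the q/k/v/out projections as ordinary Linear modules.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class NaiveMultiheadAttention(nn.Module):
    """Sketch of a drop-in style replacement for nn.MultiheadAttention (batch_first=False)."""

    def __init__(self, embed_dim: int, num_heads: int, dropout: float = 0.0):
        super().__init__()
        assert embed_dim % num_heads == 0, "embed_dim must be divisible by num_heads"
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.dropout = dropout
        # Separate Linear layers instead of the fused in_proj_weight used by
        # nn.MultiheadAttention; these are what LoRA adapters can target.
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, query, key, value, attn_mask=None, key_padding_mask=None, need_weights=False):
        # Inputs are (seq_len, batch, embed_dim), matching nn.MultiheadAttention's default layout.
        tgt_len, bsz, _ = query.shape
        src_len = key.shape[0]

        def split_heads(x, seq_len):
            # (seq_len, batch, embed_dim) -> (batch, num_heads, seq_len, head_dim)
            return x.view(seq_len, bsz, self.num_heads, self.head_dim).permute(1, 2, 0, 3)

        q = split_heads(self.q_proj(query), tgt_len)
        k = split_heads(self.k_proj(key), src_len)
        v = split_heads(self.v_proj(value), src_len)

        # Scaled dot-product attention over each head.
        attn_scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.head_dim)
        if attn_mask is not None:
            # Assumes an additive float mask of shape (tgt_len, src_len).
            attn_scores = attn_scores + attn_mask
        if key_padding_mask is not None:
            # Boolean mask of shape (batch, src_len); True marks padded positions.
            attn_scores = attn_scores.masked_fill(
                key_padding_mask[:, None, None, :], float("-inf")
            )
        attn_weights = F.softmax(attn_scores, dim=-1)
        attn_weights = F.dropout(attn_weights, p=self.dropout, training=self.training)

        out = torch.matmul(attn_weights, v)                      # (batch, heads, tgt_len, head_dim)
        out = out.permute(2, 0, 1, 3).reshape(tgt_len, bsz, self.embed_dim)
        out = self.out_proj(out)
        if need_weights:
            return out, attn_weights.mean(dim=1)                 # average attention over heads
        return out, None
```

Because the projections are ordinary `nn.Linear` modules, PEFT can then reach them by name, e.g. `peft.LoraConfig(target_modules=["q_proj", "v_proj"])`, which is not possible with the fused projection parameters inside `nn.MultiheadAttention`.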