Can peft support ColumnParallelLinear?
wjn1996 opened this issue · 2 comments
System Info
I have a model whose architecture contains `xxxParallel` layers (vLLM's tensor-parallel modules), which are used for parallel inference:
BaichuanForCausalLM(
  (model): BaiChuanModel(
    (embed_tokens): VocabParallelEmbedding()
    (layers): ModuleList(
      (0-31): 32 x BaiChuanDecoderLayer(
        (self_attn): BaiChuanAttention(
          (W_pack): ColumnParallelLinear()
          (o_proj): RowParallelLinear()
          (attn): PagedAttentionWithALiBi()
        )
        (mlp): BaiChuanMLP(
          (gate_up_proj): ColumnParallelLinear()
          (down_proj): RowParallelLinear()
          (act_fn): SiluAndMul()
        )
        (input_layernorm): RMSNorm()
        (post_attention_layernorm): RMSNorm()
      )
    )
    (norm): RMSNorm()
  )
  (lm_head): ColumnParallelLinear()
  (sampler): Sampler()
)
I want to load a LoRA adapter onto this model directly with PEFT, but it throws an error:
ValueError: Target module ColumnParallelLinear() is not supported. Currently, only the following modules are supported: `torch.nn.Linear`, `torch.nn.Embedding`, `torch.nn.Conv2d`, `transformers.pytorch_utils.Conv1D`.
So, how can I make this work without any change to the model architecture?
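One common workaround (a sketch, not an official fix) is to merge the LoRA weights into the base model offline, using PEFT on the plain transformers model, whose layers are ordinary torch.nn.Linear, and then serve the merged checkpoint with vLLM. All paths below are placeholders:

# Sketch: merge the LoRA adapter into the base weights offline,
# then serve the merged checkpoint with vLLM (paths are placeholders).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "xxx/pre-trained-lm/baichuan2-7b-base"
lora_path = "xxx/v2/baichuan2-7b-base/checkpoint-8000"
merged_path = "xxx/baichuan2-7b-base-merged"

# Load the plain transformers model (nn.Linear layers, which PEFT supports)
base = AutoModelForCausalLM.from_pretrained(
    base_path, torch_dtype=torch.float16, trust_remote_code=True
)
model = PeftModel.from_pretrained(base, lora_path)

# Fold the LoRA deltas into the base weights and drop the adapter wrappers
merged = model.merge_and_unload()
merged.save_pretrained(merged_path)
AutoTokenizer.from_pretrained(base_path, trust_remote_code=True).save_pretrained(merged_path)

# vLLM can then load merged_path like any ordinary checkpoint:
#   llm = LLM(model=merged_path, trust_remote_code=True)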
Who can help?
@pacman100 @younesbelkada @BenjaminBossan @sayakpaul
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder
- My own task or dataset (give details below)
Reproduction
# LLM and SamplingParams
# pip install vllm==0.2.1 (cuda=11.8)
from vllm import LLM, SamplingParams
from peft import PeftModel

# Function to load the PeftModel for performance optimization
def load_peft_model(model, peft_model):
    peft_model = PeftModel.from_pretrained(model, peft_model)
    return peft_model
prompts = [
    "xxx",
]
sampling_params = SamplingParams(temperature=1.0, top_p=0.9)
model_name = "baichuan2-7b-base"
origin_model_path = "xxx/pre-trained-lm/{}".format(model_name)
saved_model_path = "xxx/v2/{}/checkpoint-8000".format(model_name) # lora path
save_answer_path = "xxx/{}".format(model_name)
llm = LLM(model=origin_model_path, trust_remote_code=True)
model = llm.llm_engine.workers[0].model
model = load_peft_model(model, saved_model_path)
llm.llm_engine.workers[0].model = model
outputs = llm.generate(
    prompts,
    sampling_params,
    # lora_request=LoRARequest("headline-lora", 1, saved_model_path)
)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
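On the commented-out lora_request line above: newer vLLM releases (0.3.0 and later; the pinned 0.2.1 predates this) can serve LoRA adapters natively, with no PEFT involved. A minimal sketch under that assumption, reusing the placeholder paths from the reproduction:

# Sketch: vLLM's built-in LoRA support (requires vllm >= 0.3.0).
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model=origin_model_path, trust_remote_code=True, enable_lora=True)
outputs = llm.generate(
    prompts,
    SamplingParams(temperature=1.0, top_p=0.9),
    # the name and integer id are arbitrary; the path points at the LoRA checkpoint
    lora_request=LoRARequest("headline-lora", 1, saved_model_path),
)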
Expected behavior
PEFT should be able to attach the LoRA adapter to these parallel layers (or document a supported workaround), without requiring changes to the model architecture.
So I assume you're using Megatron. Did you try this:
https://huggingface.co/docs/peft/v0.10.0/en/package_reference/lora#peft.LoraConfig.megatron_config
Here is an example: https://github.com/huggingface/peft/blob/main/tests/test_lora_megatron.py
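A minimal sketch of that option, adapted loosely from the linked test; it assumes the parallel layers come from Megatron-LM's megatron.core, and the rank, alpha, dropout, and target module names below are illustrative placeholders:

# Sketch: LoRA over Megatron-style parallel linear layers via LoraConfig's
# megatron_config / megatron_core fields (PEFT >= 0.7). Values are placeholders.
from peft import LoraConfig, get_peft_model

megatron_config = {
    # kwargs for megatron.core's TransformerConfig
    "num_layers": 32,
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "use_cpu_initialization": True,
}

lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    target_modules=["W_pack", "o_proj"],  # names from the module dump above
    megatron_config=megatron_config,      # used to build LoRA's parallel linear layers
    megatron_core="megatron.core",        # where ColumnParallelLinear lives
)

# model = get_peft_model(megatron_model, lora_config)

Note, though, that this path dispatches on Megatron-LM's ColumnParallelLinear/RowParallelLinear; the layers in the dump above come from vLLM, so merging the adapter offline (as sketched earlier) may be the more practical route here.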
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.