huggingface/peft

Can peft support ColumnParallelLinear?

wjn1996 opened this issue · 2 comments

System Info

I have a model whose architecture contains `xxxParallel` modules (vLLM's tensor-parallel layers), which are used for parallel inference:

BaichuanForCausalLM(
  (model): BaiChuanModel(
    (embed_tokens): VocabParallelEmbedding()
    (layers): ModuleList(
      (0-31): 32 x BaiChuanDecoderLayer(
        (self_attn): BaiChuanAttention(
          (W_pack): ColumnParallelLinear()
          (o_proj): RowParallelLinear()
          (attn): PagedAttentionWithALiBi()
        )
        (mlp): BaiChuanMLP(
          (gate_up_proj): ColumnParallelLinear()
          (down_proj): RowParallelLinear()
          (act_fn): SiluAndMul()
        )
        (input_layernorm): RMSNorm()
        (post_attention_layernorm): RMSNorm()
      )
    )
    (norm): RMSNorm()
  )
  (lm_head): ColumnParallelLinear()
  (sampler): Sampler()
)

I want to load a LoRA adapter onto this model directly with peft, but it throws an error:

ValueError: Target module ColumnParallelLinear() is not supported. Currently, only the following modules are supported: `torch.nn.Linear`, `torch.nn.Embedding`, `torch.nn.Conv2d`, `transformers.pytorch_utils.Conv1D`.

So, how can I apply the LoRA adapter to this model without any change to the model architecture?
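
One workaround I can think of (only a sketch: the merged output path is a placeholder I made up, and it assumes the adapter is a standard LoRA checkpoint trained against the plain transformers model) is to merge the adapter into the base weights offline, so that vLLM only ever loads ordinary `torch.nn.Linear` weights:

# Offline-merge sketch: load the HF checkpoint with transformers (plain nn.Linear
# layers, which peft supports), merge the LoRA weights, then point vLLM at the result.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("xxx/pre-trained-lm/baichuan2-7b-base", trust_remote_code=True)
merged = PeftModel.from_pretrained(base, "xxx/v2/baichuan2-7b-base/checkpoint-8000").merge_and_unload()
merged.save_pretrained("xxx/merged/baichuan2-7b-base")  # assumed output directory
AutoTokenizer.from_pretrained("xxx/pre-trained-lm/baichuan2-7b-base", trust_remote_code=True).save_pretrained("xxx/merged/baichuan2-7b-base")

But this loses the ability to swap adapters at runtime, so I would still like to know whether peft can wrap the parallel layers directly.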

Who can help?

@pacman100 @younesbelkada @BenjaminBossan @sayakpaul

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

# LLM and SamplingParams
# pip install vllm==0.2.1 (cuda=11.8)
from vllm import LLM, SamplingParams
from peft import PeftModel

# Load the LoRA adapter on top of the base model
def load_peft_model(model, peft_model_path):
    peft_model = PeftModel.from_pretrained(model, peft_model_path)
    return peft_model

prompts = [
    "xxx",
]

sampling_params = SamplingParams(temperature=1.0, top_p=0.9)

model_name = "baichuan2-7b-base"
origin_model_path = "xxx/pre-trained-lm/{}".format(model_name)
saved_model_path = "xxx/v2/{}/checkpoint-8000".format(model_name) # lora path
save_answer_path = "xxx/{}".format(model_name)

llm = LLM(model=origin_model_path, trust_remote_code=True)

# Grab the model instance inside the vLLM engine, wrap it with the LoRA adapter,
# and put it back (this is where the ValueError above is raised)
model = llm.llm_engine.workers[0].model
model = load_peft_model(model, saved_model_path)
llm.llm_engine.workers[0].model = model


outputs = llm.generate(
    prompts,
    sampling_params,
    # lora_request=LoRARequest("headline-lora", 1, saved_model_path)  # needs a newer vLLM with native LoRA support; see the sketch below
    )


for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Expected behavior

peft should be able to wrap vLLM's ColumnParallelLinear / RowParallelLinear layers with LoRA (or document a supported way to apply a LoRA adapter to a model loaded through vLLM), so that the reproduction script above runs without the ValueError.

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.