NVIDIA/FasterTransformer

How to transfer glm2 model to fastertransformer

AndreWanga opened this issue · 4 comments

Hello, I'd like to ask, currently there is a new model called chatglm2-6b on Hugging Face. I'm not sure how to convert this model from PyTorch format to FasterTransformer format.
Here is the website of chatglm2-6b model: https://huggingface.co/THUDM/chatglm2-6b/tree/main

ljayx commented

For conversion, you can refer to the ChatGLM-6B fork of FasterTransformer, which requires modifying the convert script. More importantly, though, FasterTransformer doesn't support MQA (multi-query attention) yet.
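For illustration, FasterTransformer convert scripts generally dump each checkpoint tensor as a raw binary file, one file per tensor-parallel rank. A minimal sketch of that step, using a synthetic weight; the tensor name, split axis, and fp16 dtype here are assumptions for illustration, not the actual chatglm2 layout:

```python
import os
import tempfile

import numpy as np

def save_split(weight: np.ndarray, out_dir: str, name: str, tp: int, axis: int) -> None:
    """Split a weight along `axis` into `tp` tensor-parallel shards and
    write each shard as a raw fp16 binary, in the style of FT convert scripts."""
    for rank, shard in enumerate(np.split(weight, tp, axis=axis)):
        shard.astype(np.float16).tofile(os.path.join(out_dir, f"{name}.{rank}.bin"))

# Synthetic usage; a real script would first load the HF checkpoint with
# torch and rename each tensor to the name FT expects.
out = tempfile.mkdtemp()
w = np.random.randn(4096, 4096).astype(np.float32)
save_split(w, out, "model.layers.0.attention.dense.weight", tp=2, axis=0)
```

The renaming from Hugging Face tensor names to FT names is the part that has to be adapted per model, which is why the ChatGLM fork's script needs modification for chatglm2.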

> For conversion, you can refer to the ChatGLM-6B fork of FasterTransformer, which requires modifying the convert script. More importantly, though, FasterTransformer doesn't support MQA yet.

Could you please tell me where to get the script? Will NVIDIA officially release a FasterTransformer version that supports glm2? By the way, I'm using Triton to invoke the FT model.

ljayx commented

There is no existing convert script, but you can modify one based on this.

I'm not from NVIDIA, so I'm not sure when this will be supported. I'm more concerned about their support for MQA, because glm2 can only be supported once MQA is.

I filed an issue here: #727 (comment)
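To make the MQA concern concrete, here is a small numpy sketch of multi-query attention: all query heads share a single K/V head, so the K/V weight shapes differ from standard multi-head attention and an MHA-only kernel layout can't load them unchanged. This is an illustrative sketch, not FT or chatglm2 code (chatglm2 actually uses a grouped variant with a few K/V groups):

```python
import numpy as np

def mqa(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Multi-query attention: q has shape (n_head, seq, d), while k and v
    are a single shared head of shape (seq, d), broadcast to every q head."""
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (n_head, seq, seq)
    w = np.exp(scores - scores.max(-1, keepdims=True)) # stable softmax
    w /= w.sum(-1, keepdims=True)
    return w @ v                                       # shared V, broadcast

n_head, seq, d = 8, 4, 16
q = np.random.randn(n_head, seq, d)
k = np.random.randn(seq, d)   # one KV head instead of n_head of them
v = np.random.randn(seq, d)
out = mqa(q, k, v)            # (n_head, seq, d)
```

The point for FT is the weight shapes: MHA stores `n_head` K/V projections, MQA stores one, so supporting glm2 requires kernel and layout changes, not just a new convert script.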

> There is no existing convert script, but you can modify one based on this.
>
> I'm not from NVIDIA, so I'm not sure when this will be supported. I'm more concerned about their support for MQA, because glm2 can only be supported once MQA is.
>
> I filed an issue here: #727 (comment)

Thank you for dealing with my issue anyway. I hope NVIDIA supports glm2 soon.