NVIDIA/FasterTransformer

How to transfer glm2 model to fastertransformer

AndreWanga opened this issue · 4 comments

Hello, I'd like to ask, currently there is a new model called chatglm2-6b on Hugging Face. I'm not sure how to convert this model from PyTorch format to FasterTransformer format.
Here is the website of chatglm2-6b model: https://huggingface.co/THUDM/chatglm2-6b/tree/main

ljayx commented

For conversion, you can refer to the ChatGLM-6B fork of FasterTransformer, which requires modifying the convert script. More importantly, though, FasterTransformer doesn't support MQA (multi-query attention) yet.
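For illustration, FasterTransformer convert scripts generally dump each checkpoint tensor as a raw binary file, one file per tensor-parallel rank. A minimal sketch of that step, using a synthetic weight; the tensor name, split axis, and fp16 dtype here are assumptions for illustration, not the actual chatglm2 layout:

```python
import os
import tempfile

import numpy as np

def save_split(weight: np.ndarray, out_dir: str, name: str, tp: int, axis: int) -> None:
    """Split a weight along `axis` into `tp` tensor-parallel shards and
    write each shard as a raw fp16 binary, in the style of FT convert scripts."""
    for rank, shard in enumerate(np.split(weight, tp, axis=axis)):
        shard.astype(np.float16).tofile(os.path.join(out_dir, f"{name}.{rank}.bin"))

# Synthetic usage; a real script would first load the HF checkpoint with
# torch and rename each tensor to the name FT expects.
out = tempfile.mkdtemp()
w = np.random.randn(4096, 4096).astype(np.float32)
save_split(w, out, "model.layers.0.attention.dense.weight", tp=2, axis=0)
```

The renaming from Hugging Face tensor names to FT names is the part that has to be adapted per model, which is why the ChatGLM fork's script needs modification for chatglm2.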

> For conversion, you can refer to the ChatGLM-6B fork of FasterTransformer, which requires modifying the convert script. More importantly, though, FasterTransformer doesn't support MQA yet.

Could you please tell me where to get the script? Will NVIDIA officially release a FasterTransformer version that supports glm2? By the way, I'm using Triton to invoke the FT model.

ljayx commented

There is no existing convert script, but you can modify one based on this.

I'm not from NVIDIA, so I'm not sure when this will be supported. I'm more concerned about their support for MQA, because glm2 can only be supported once MQA is.

I filed an issue here: #727 (comment)
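To make the MQA concern concrete, here is a small numpy sketch of multi-query attention: all query heads share a single K/V head, so the K/V weight shapes differ from standard multi-head attention and an MHA-only kernel layout can't load them unchanged. This is an illustrative sketch, not FT or chatglm2 code (chatglm2 actually uses a grouped variant with a few K/V groups):

```python
import numpy as np

def mqa(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Multi-query attention: q has shape (n_head, seq, d), while k and v
    are a single shared head of shape (seq, d), broadcast to every q head."""
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (n_head, seq, seq)
    w = np.exp(scores - scores.max(-1, keepdims=True)) # stable softmax
    w /= w.sum(-1, keepdims=True)
    return w @ v                                       # shared V, broadcast

n_head, seq, d = 8, 4, 16
q = np.random.randn(n_head, seq, d)
k = np.random.randn(seq, d)   # one KV head instead of n_head of them
v = np.random.randn(seq, d)
out = mqa(q, k, v)            # (n_head, seq, d)
```

The point for FT is the weight shapes: MHA stores `n_head` K/V projections, MQA stores one, so supporting glm2 requires kernel and layout changes, not just a new convert script.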

> There is no existing convert script, but you can modify one based on this.
>
> I'm not from NVIDIA, so I'm not sure when this will be supported. I'm more concerned about their support for MQA, because glm2 can only be supported once MQA is.
>
> I filed an issue here: #727 (comment)

Thank you for dealing with my issue anyway. I hope NVIDIA supports glm2 soon.