feat: Support more models
Closed this issue · 3 comments
gaocegege commented
- LLaMA
- Bloomz
- ChatGLM 6B (non-int4)
- Vicuna
- GPT-NeoX
- StarCoder
- MOSS
kemingy commented
There is no difference between the quantized model and the original model, at least for such a service.
gaocegege commented
So we need to update the env var to THUDM/chatglm-6b, and then it should work, right?
kemingy commented
> So we need to update the env var to THUDM/chatglm-6b, and then it should work, right?
It should work. I used the int4 variant because it's small enough to try quickly.
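Since the quantized and full-precision checkpoints load through the same interface, switching should only require pointing the service at the other repo. A minimal sketch, assuming a hypothetical `MODEL_NAME` env var (the actual variable name depends on the service's configuration):

```python
import os

# Hypothetical env var; the real variable name depends on the service config.
# Swapping the int4 repo for "THUDM/chatglm-6b" selects the full-precision
# weights; the loading call itself is unchanged, e.g. with transformers:
#   AutoModel.from_pretrained(model_name, trust_remote_code=True)
model_name = os.environ.get("MODEL_NAME", "THUDM/chatglm-6b-int4")
print(model_name)
```

Running the service with `MODEL_NAME=THUDM/chatglm-6b` would then load the non-int4 checkpoint without any code changes.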