tensorchord/modelz-llm

feat: Support more models

Closed this issue · 3 comments

  • LLaMA
  • Bloomz
  • ChatGLM 6B (non-int4)
  • Vicuna
  • GPT-NeoX
  • StarCoder
  • MOSS

From the perspective of such a service, there is no difference between the quantized model and the original model: both expose the same interface.

So we just need to update the env vars to point at THUDM/chatglm-6b, and then it should work, right?

It should work. I used the int4 variant only because it is small enough to try out quickly.
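
For reference, here is a minimal sketch of what swapping the checkpoint looks like. The environment variable name `MODELZ_MODEL_NAME` is hypothetical (check the project's README for the actual one); the `transformers` calls themselves are the standard way to load ChatGLM-6B from its model card.

```python
# Sketch: load the checkpoint named by an env var, falling back to the
# full-precision ChatGLM-6B. MODELZ_MODEL_NAME is a hypothetical name.
import os

from transformers import AutoModel, AutoTokenizer

model_name = os.environ.get("MODELZ_MODEL_NAME", "THUDM/chatglm-6b")

# ChatGLM ships custom modeling code, so trust_remote_code is required.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True).half().cuda()
model = model.eval()

# Quick smoke test. The int4 variant (THUDM/chatglm-6b-int4) exposes the
# same chat interface, which is why the service sees no difference.
response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
```

Since both checkpoints share the same interface, only the env var has to change between the int4 and the full-precision deployment.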