Test model inference benchmark (ChatGLM2-6B, LLaMA2-7b-chat, Baichuan2-7B-chat)
input tokens | throughput (tokens/s)* | first-token latency (ms) | per-token latency (ms)
---|---|---|---
32 | | |
64 | | |
128 | | |
256 | | |
512 | | |
1024 | | |
2048 | | |
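A minimal sketch of how the three metrics in the table could be collected, assuming the model exposes a streaming interface that yields one token at a time (the `generate_fn` / `dummy_generate` names here are illustrative placeholders, not part of any specific framework's API):

```python
import time

def benchmark(generate_fn, prompt_tokens, max_new_tokens=32):
    """Measure first-token latency, per-token latency, and throughput.

    generate_fn(prompt_tokens) is assumed to be a generator that yields
    one decoded token at a time (a streaming interface; this is an
    assumption, not tied to a specific inference framework).
    """
    start = time.perf_counter()
    first_token_ms = None
    count = 0
    for _ in generate_fn(prompt_tokens):
        now = time.perf_counter()
        if first_token_ms is None:
            # Time from request start until the first token arrives.
            first_token_ms = (now - start) * 1000
        count += 1
        if count >= max_new_tokens:
            break
    total_s = time.perf_counter() - start
    # Average decode cost per token after the first one.
    one_token_ms = (total_s * 1000 - first_token_ms) / max(count - 1, 1)
    throughput = count / total_s  # generated tokens per second
    return {
        "first_token_ms": first_token_ms,
        "one_token_ms": one_token_ms,
        "throughput_tok_s": throughput,
    }

# Stand-in generator simulating a model that streams tokens
# (replace with a real model's streaming generate call).
def dummy_generate(prompt_tokens):
    for i in range(1024):
        time.sleep(0.001)  # simulated per-step decode latency
        yield i

if __name__ == "__main__":
    for n in (32, 64, 128, 256, 512, 1024, 2048):
        stats = benchmark(dummy_generate, list(range(n)))
        print(n, stats)
```

Each input length in the table would be run this way (ideally averaged over several repetitions after a warm-up pass, since the first run often pays one-time initialization costs).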