
# LLM-benchmark

Inference benchmarks for ChatGLM2-6B, LLaMA2-7b-chat, and Baichuan2-7B-chat.

| input token | throughput | first token latency (ms) | per-token latency (ms) |
|---|---|---|---|
| 32 | | | |
| 64 | | | |
| 128 | | | |
| 256 | | | |
| 512 | | | |
| 1024 | | | |
| 2048 | | | |
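
The table reports three quantities for each input length: generation throughput, time to produce the first token (dominated by the prefill over the prompt), and time per subsequent token (one decode step). Below is a minimal sketch of how such measurements can be taken with Hugging Face `transformers` on a CUDA GPU; the model ID, the number of generated tokens, and the `benchmark` helper are illustrative assumptions, not the repository's actual script.

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model ID; the repo benchmarks ChatGLM2-6B, LLaMA2-7b-chat and
# Baichuan2-7B-chat (ChatGLM2/Baichuan2 also require trust_remote_code=True).
MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"
INPUT_LENGTHS = [32, 64, 128, 256, 512, 1024, 2048]
NEW_TOKENS = 128  # assumed number of tokens generated per run


def benchmark(model, tokenizer, input_len, new_tokens=NEW_TOKENS):
    """Return (throughput tokens/s, first-token ms, per-token ms) for one input length."""
    # Dummy prompt of exactly `input_len` tokens.
    input_ids = torch.randint(
        0, tokenizer.vocab_size, (1, input_len), device=model.device
    )

    # First-token latency: prefill plus a single decode step.
    torch.cuda.synchronize()
    start = time.perf_counter()
    model.generate(input_ids, max_new_tokens=1, do_sample=False)
    torch.cuda.synchronize()
    first_token_ms = (time.perf_counter() - start) * 1000

    # Full generation; force exactly `new_tokens` so an early EOS does not skew timing.
    torch.cuda.synchronize()
    start = time.perf_counter()
    model.generate(
        input_ids, max_new_tokens=new_tokens, min_new_tokens=new_tokens, do_sample=False
    )
    torch.cuda.synchronize()
    total_s = time.perf_counter() - start

    per_token_ms = (total_s * 1000 - first_token_ms) / (new_tokens - 1)
    throughput = new_tokens / total_s
    return throughput, first_token_ms, per_token_ms


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = (
        AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
        .to("cuda")
        .eval()
    )

    print("input_tokens  throughput(tok/s)  first_token(ms)  per_token(ms)")
    for n in INPUT_LENGTHS:
        tp, ft, pt = benchmark(model, tokenizer, n)
        print(f"{n:>12}  {tp:>17.1f}  {ft:>15.1f}  {pt:>13.1f}")
```

Separating the two latencies matters because first-token time grows with the input length while per-token decode time stays roughly constant, which is exactly the trend the table is meant to expose across the 32 to 2048 token prompts.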