LLM Inference Frameworks Benchmark

About

This project aims to benchmark the performance of several popular LLM inference frameworks. I am currently planning to test vLLM, TensorRT-LLM, FasterTransformer, ONNX Runtime, and DeepSpeed. The goal is to determine the best inference framework for a given hardware setup and LLM model. Comparisons will be based on multiple evaluation criteria, such as performance metrics (e.g. throughput, latency, and scalability), while also considering other factors such as hardware utilization efficiency, model support, ease of use, hardware/software flexibility, optimization features, and deployment complexity. A minimal sketch of the kind of measurement harness involved is shown below.
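
The sketch below illustrates one way latency and throughput might be measured; `generate_fn` is a hypothetical placeholder for whichever call a framework under test actually exposes (for example, a wrapper around its generate method or an HTTP request helper), not any framework's real API.

```python
import time
from statistics import mean
from typing import Callable, List


def benchmark(generate_fn: Callable[[str], str], prompts: List[str]) -> dict:
    """Time a generation callable over a list of prompts and report
    throughput (requests/second) plus mean and p95 latency."""
    latencies = []
    start = time.perf_counter()
    for prompt in prompts:
        t0 = time.perf_counter()
        generate_fn(prompt)                      # call into the framework under test
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "requests_per_second": len(prompts) / elapsed,
        "mean_latency_s": mean(latencies),
        "p95_latency_s": sorted(latencies)[int(0.95 * (len(latencies) - 1))],
    }


if __name__ == "__main__":
    # Dummy stand-in for a real model call, just to show the harness running.
    results = benchmark(lambda p: p.upper(), ["hello"] * 100)
    print(results)
```

A real run would replace the dummy callable with the framework's own generation entry point and a fixed prompt set, so that all frameworks are measured under the same workload.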

Test Specs

This project will be run solely on Kaggle (unless problems arise); the hardware and software specifications are noted in my Kaggle Specs Tester Notebook.

More to come...