NVIDIA/TensorRT-LLM

[Feature]: Improve the performance of FP8 models


🚀 The feature, motivation and pitch

nvidia/Llama-3.1-8B-Instruct-FP8 runs roughly 2x slower than the PyTorch (PT) baseline.
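A minimal sketch of how such a throughput comparison can be measured. The `generate` callables below are hypothetical stubs standing in for the two backends (they are not TensorRT-LLM or PyTorch API calls); in a real comparison each would wrap the actual engine's generation call and return the number of tokens produced.

```python
import time

def tokens_per_second(generate, prompt, n_runs=3):
    """Best-of-n decode throughput for a generate() callable.

    `generate` is a hypothetical stand-in for the engine under test;
    it must return the number of tokens it produced for `prompt`.
    """
    best = 0.0
    for _ in range(n_runs):
        start = time.perf_counter()
        n_tokens = generate(prompt)
        elapsed = time.perf_counter() - start
        best = max(best, n_tokens / elapsed)
    return best

# Stub engines simulating the reported 2x gap: same token count,
# one takes twice as long per call.
def pt_engine(prompt):
    time.sleep(0.05)
    return 128

def fp8_engine(prompt):
    time.sleep(0.10)
    return 128

ratio = tokens_per_second(pt_engine, "hi") / tokens_per_second(fp8_engine, "hi")
print(f"PT is {ratio:.1f}x faster than FP8 in this simulation")
```

Reporting per-engine tokens/sec (rather than a single wall-clock number) makes it easier to separate a kernel-level FP8 regression from request-scheduling overhead.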

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.