[Feature]: Improve the performance of FP8 models
Opened this issue · 0 comments
nzmora-nvidia commented
The feature, motivation and pitch
nvidia/Llama-3.1-8B-Instruct-FP8 currently runs 2x slower than PT (PyTorch).
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.