How to use FP8 feature in TGI-gaudi
lvliang-intel opened this issue · 1 comment
lvliang-intel commented
System Info
The FP8 quantization feature has been incorporated into the TGI-Gaudi branch, but there is no guidance on how to use it. The process involves running FP8 quantization in two stages: Measurement Mode and Quantization Mode. How can FP8 be enabled with the TGI 'docker run' command? Could you provide a step-by-step guide to using this feature?
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
Run the FP8 quantization feature using the "docker run" command.
Expected behavior
A clear guide to using the FP8 quantization feature is provided.
kdamaszk commented
@lvliang-intel thanks! Official support for FP8 will be added soon. We will then add more usage information to the README.