juncongmoo/pyllama

Readme Should Include the Quantization Inference Command as Text

chigkim opened this issue · 1 comment

Could you include the actual text of the command to run inference with quantization?
I cannot see the images because I'm blind and use a screen reader.
The readme says "With quantization, you can run LLaMA with a 4GB memory GPU." and then shows two pictures.
Thanks!

python3 quant_infer.py --wbits 4 --load pyllama-7B4b.pt --text "The meaning of life is" --max_length 24 --cuda cuda:0
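
For reference, a reading of the flags in the command above (based on the command itself, not official documentation): --wbits 4 selects 4-bit quantized weights, --load points to the quantized checkpoint (here pyllama-7B4b.pt, which is assumed to have been produced beforehand by the repo's quantization step), --text is the prompt, --max_length caps the number of generated tokens, and --cuda cuda:0 picks the GPU device to run on.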