juncongmoo/pyllama

Readme Should Include the Quantization Inference Command as Text

chigkim opened this issue · 1 comment

Could you include the actual text of the command to run inference with quantization?
I cannot see the images because I'm blind and use a screen reader.
The readme says "With quantization, you can run LLaMA with a 4GB memory GPU." and then shows two pictures.
Thanks!

python3 quant_infer.py --wbits 4 --load pyllama-7B4b.pt --text "The meaning of life is" --max_length 24 --cuda cuda:0
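
For reference, a reading of the flags in the command above (based on the command itself, not official documentation): --wbits 4 selects 4-bit quantized weights, --load points to the quantized checkpoint (here pyllama-7B4b.pt, which is assumed to have been produced beforehand by the repo's quantization step), --text is the prompt, --max_length caps the number of generated tokens, and --cuda cuda:0 picks the GPU device to run on.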