- Single GPU with 16 GB of memory
Download the pretrained model.
- BitTorrent link:
magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA
- The final directory:
```
.
├── inference.py
├── llama
│   ├── generation.py
│   ├── __init__.py
│   ├── model_parallel.py
│   ├── model_single.py
│   └── tokenizer.py
├── LLaMA
│   ├── 7B
│   │   ├── checklist.chk
│   │   ├── consolidated.00.pth
│   │   └── params.json
│   ├── tokenizer_checklist.chk
│   └── tokenizer.model
├── requirements.txt
└── webapp_single.py
```
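After downloading, a quick sanity check can confirm the checkpoint files landed in the expected places. This helper is not part of the repo; it is a minimal sketch that only checks the paths shown in the tree above.

```python
from pathlib import Path

# Files expected after the download, taken from the directory tree above.
REQUIRED = [
    "LLaMA/7B/checklist.chk",
    "LLaMA/7B/consolidated.00.pth",
    "LLaMA/7B/params.json",
    "LLaMA/tokenizer_checklist.chk",
    "LLaMA/tokenizer.model",
]

def missing_files(root: Path) -> list:
    """Return the expected checkpoint files that are absent under root."""
    return [rel for rel in REQUIRED if not (root / rel).exists()]

if __name__ == "__main__":
    missing = missing_files(Path("."))
    print("OK" if not missing else "missing: %s" % missing)
```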
Install the related packages.

```
pip install -r requirements.txt
```
Run
- Inference by scripts:

```
python inference.py
```
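Inference reads the model configuration from the checkpoint's `params.json`. The snippet below is an illustrative sketch, not code from this repo; the keys and values are typical of a LLaMA-7B `params.json` but are assumptions and may not match your checkpoint exactly.

```python
import json

def head_dim(params: dict) -> int:
    """Per-head dimension: model width divided by the number of attention heads."""
    return params["dim"] // params["n_heads"]

# Example values in the style of a LLaMA-7B params.json (assumed, not read
# from an actual checkpoint).
example = json.loads('{"dim": 4096, "n_heads": 32, "n_layers": 32, "norm_eps": 1e-06}')
print(head_dim(example))  # prints 128
```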
- Run by Gradio UI:

```
python webapp_single.py --ckpt_dir LLaMA/7B \
    --tokenizer_path LLaMA/tokenizer.model \
    --server_name 127.0.0.1 \
    --server_port 7860
```
Gradio Result
- Open http://127.0.0.1:7860 to enjoy it.