How can I use the Llama-2-7b-longlora-100k-ft model correctly?
seanxuu commented
I used several A100 GPUs and got this model to load successfully, but it doesn't generate properly (its output is either blank or random characters).
My command:

```bash
export CUDA_VISIBLE_DEVICES=3,4,5
python3 inference.py \
        --base_model /models/Llama-2-7b-longlora-100k-ft \
        --question "Why doesn't Professor Snape seem to like Harry?" \
        --context_size 100000 \
        --max_gen_len 512 \
        --flash_attn True \
        --material "materials/Harry Potter and The Order of the Phoenix.txt"
```
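To rule out the inference script itself, I also tried a short-context sanity check along these lines. This is a minimal sketch, assuming the checkpoint directory loads as a standard Hugging Face causal LM; the path and the `[INST]...[/INST]` prompt wrapping are illustrative, not taken from `inference.py`:

```python
# Minimal sanity check: load the checkpoint and generate on a short prompt.
# If this already produces blank/garbled text, the problem is the weights
# or tokenizer, not the long-context inference path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/models/Llama-2-7b-longlora-100k-ft"  # same path as in the command above

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",  # shard across the visible A100s
)

prompt = "[INST]Why doesn't Professor Snape seem to like Harry?[/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
))
```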
Here is a record of my experiments:
- 10,000-token input + `PROMPT_DICT["prompt_llama2"]`:
- 10,000-token input + `PROMPT_DICT["prompt_no_input"]`:
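For clarity, this is roughly how the two prompts in the list above were built. The template strings are paraphrased; the exact `PROMPT_DICT` entries in LongLoRA's `inference.py` may differ:

```python
# Paraphrased prompt templates; the exact strings in LongLoRA's
# inference.py may differ from these.
PROMPT_DICT = {
    "prompt_no_input": (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Response:"
    ),
    "prompt_llama2": "[INST]{instruction}[/INST]",
}

question = "Why doesn't Professor Snape seem to like Harry?"
with open("materials/Harry Potter and The Order of the Phoenix.txt") as f:
    material = f.read()

# The long material plus the question form the instruction that is
# substituted into whichever template is being tested.
instruction = f"{material}\n{question}"
prompt = PROMPT_DICT["prompt_llama2"].format(instruction=instruction)
```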