How can I use the Llama-2-7b-longlora-100k-ft model correctly?
seanxuu commented
I used several A100 GPUs and got this model to load successfully, but it doesn't generate properly (its output is either blank or random characters).
My command:

```bash
export CUDA_VISIBLE_DEVICES=3,4,5
python3 inference.py \
        --base_model /models/Llama-2-7b-longlora-100k-ft \
        --question "Why doesn't Professor Snape seem to like Harry?" \
        --context_size 100000 \
        --max_gen_len 512 \
        --flash_attn True \
        --material "materials/Harry Potter and The Order of the Phoenix.txt"
```
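To rule out the inference script itself, I also tried a short-context sanity check along these lines. This is a minimal sketch, assuming the checkpoint directory loads as a standard Hugging Face causal LM; the path and the `[INST]...[/INST]` prompt wrapping are illustrative, not taken from `inference.py`:

```python
# Minimal sanity check: load the checkpoint and generate on a short prompt.
# If this already produces blank/garbled text, the problem is the weights
# or tokenizer, not the long-context inference path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/models/Llama-2-7b-longlora-100k-ft"  # same path as in the command above

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",  # shard across the visible A100s
)

prompt = "[INST]Why doesn't Professor Snape seem to like Harry?[/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
))
```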
Here is a record of my experiments:
- 10,000-token input + `PROMPT_DICT["prompt_llama2"]`:
- 10,000-token input + `PROMPT_DICT["prompt_no_input"]`:
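For clarity, this is roughly how the two prompts in the list above were built. The template strings are paraphrased; the exact `PROMPT_DICT` entries in LongLoRA's `inference.py` may differ:

```python
# Paraphrased prompt templates; the exact strings in LongLoRA's
# inference.py may differ from these.
PROMPT_DICT = {
    "prompt_no_input": (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Response:"
    ),
    "prompt_llama2": "[INST]{instruction}[/INST]",
}

question = "Why doesn't Professor Snape seem to like Harry?"
with open("materials/Harry Potter and The Order of the Phoenix.txt") as f:
    material = f.read()

# The long material plus the question form the instruction that is
# substituted into whichever template is being tested.
instruction = f"{material}\n{question}"
prompt = PROMPT_DICT["prompt_llama2"].format(instruction=instruction)
```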