abertsch72/unlimiformer

Script utilizing LLM

jcgeo9 opened this issue · 1 comment

jcgeo9 commented

Could you provide a script, similar to inference-example.py, that uses run_generation.py? That is, instead of running it from the command line like this:

```bash
python src/run_generation.py --model_type llama --model_name_or_path meta-llama/Llama-2-13b-chat-hf \
  --prefix "<s>[INST] <<SYS>>\n You are a helpful assistant. Answer with detailed responses according to the entire instruction or question. \n<</SYS>>\n\n Summarize the following book: " \
  --prompt example_inputs/harry_potter_full.txt \
  --suffix " [/INST]" --test_unlimiformer --fp16 --length 200 --layer_begin 16 \
  --index_devices 1 --datastore_device 1
```

I would like to load the model and run inference from a Python script.
Thanks in advance!

You can do this from a script by importing run_generation and calling it with your arguments:

```python
from run_generation import main

# Append the rest of your command-line arguments to this list
main(['--model_type', 'llama', '--model_name_or_path', 'meta-llama/Llama-2-13b-chat-hf'])
```
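As a sketch of what the full call could look like, the snippet below spells out every argument from the command in the question as a Python list. It assumes, as the reply states, that `run_generation.main` accepts an argument list, and that you run it from the repository root so that `src/` can be added to the import path. `build_args` and `run_unlimiformer` are hypothetical helper names, not part of the repo. The prefix uses raw strings so each `\n` reaches the script as a literal backslash-n, which matches what the shell passes inside double quotes.

```python
import sys

# Literal "\n" sequences (raw strings), exactly as the shell passes them in double quotes.
PREFIX = (r"<s>[INST] <<SYS>>\n You are a helpful assistant. Answer with detailed "
          r"responses according to the entire instruction or question. \n<</SYS>>\n\n "
          r"Summarize the following book: ")


def build_args(prompt_path="example_inputs/harry_potter_full.txt", length=200):
    """Hypothetical helper: the argv list equivalent to the shell command in the question."""
    return [
        "--model_type", "llama",
        "--model_name_or_path", "meta-llama/Llama-2-13b-chat-hf",
        "--prefix", PREFIX,
        "--prompt", prompt_path,
        "--suffix", " [/INST]",
        "--test_unlimiformer",
        "--fp16",
        "--length", str(length),
        "--layer_begin", "16",
        "--index_devices", "1",
        "--datastore_device", "1",
    ]


def run_unlimiformer(argv):
    """Hypothetical helper: import run_generation lazily and pass argv to its main()."""
    sys.path.insert(0, "src")  # assumes the current working directory is the repo root
    from run_generation import main
    return main(argv)
```

Calling `run_unlimiformer(build_args())` then corresponds to the shell invocation, and `build_args(prompt_path=..., length=...)` lets you swap in a different input file or output length from Python.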