Inference code and sample data for the LLMA paper.
First, install the dependencies:
pip install torch tensorflow transformers sentencepiece tqdm
Additionally, you need to obtain the LLaMA model weights and convert them to Hugging Face format.
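For example, the transformers source tree ships a conversion script; the exact path and flags below may vary across releases, so treat this as a sketch:
# convert the original LLaMA checkpoint to Hugging Face format
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir /path/to/llama_model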
One Nvidia V100 32GB GPU or better is recommended.
For retrieval-augmented experiments in the paper, run the following:
# baseline decoding
python decode.py --model_path /path/to/llama_model --input_data_fn ./data/rag.jsonl --type base --forced_decoding --append_docs
# llma decoding
python decode.py --model_path /path/to/llama_model --input_data_fn ./data/rag.jsonl --n 1 --k 20 --type llma --forced_decoding --append_docs
Here we use "forced_decoding", which forces the output to be identical to the pre-generated output from davinci-003. The reason, as discussed in Section 3.2 of the paper, is that the existing LLaMA models cannot generate high-quality output for the RAG setting.
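Conceptually, forced decoding steps the model through a fixed target sequence instead of letting it pick tokens. A minimal sketch of the idea, assuming a Hugging Face causal LM (placeholder paths and strings; this is not the decode.py implementation):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("/path/to/llama_model")
model = AutoModelForCausalLM.from_pretrained(
    "/path/to/llama_model", torch_dtype=torch.float16
).cuda().eval()

prompt = "..."  # the query plus retrieved docs (cf. --append_docs)
target = "..."  # the pre-generated davinci-003 output to force

ids = tok(prompt, return_tensors="pt").input_ids.cuda()
forced = tok(target, add_special_tokens=False, return_tensors="pt").input_ids.cuda()[0]

with torch.no_grad():
    for t in forced:
        # Baseline decoding still pays one forward pass per emitted token
        # (no KV cache here, for brevity); the token itself comes from the
        # target sequence, not from argmax or sampling.
        _ = model(input_ids=ids).logits[:, -1, :]
        ids = torch.cat([ids, t.view(1, 1)], dim=-1)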
For experiments without forced decoding, we suggest running summarization on the CNNDM dataset with the Alpaca 7B model:
# baseline decoding
python decode.py --model_path /path/to/alpaca_model --input_data_fn ./data/cnndm.jsonl --type base
# llma decoding
python decode.py --model_path /path/to/alpaca_model --input_data_fn ./data/cnndm.jsonl --n 1 --k 20 --type llma
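For intuition on the --n and --k flags: n is the length of the trigger n-gram and k is the length of the span copied from the reference text once a trigger matches; the copied tokens are then verified with a single forward pass. A minimal sketch of one such step, assuming a Hugging Face-style causal LM (hypothetical helper, not the actual decode.py implementation):
import torch

@torch.no_grad()
def llma_step(model, ids, ref_ids, n=1, k=20):
    """One LLMA-style step. ids and ref_ids are 1-D LongTensors on the model's device."""
    tail, ref = ids[-n:].tolist(), ref_ids.tolist()
    copy = None
    # Trigger: the last n generated tokens occur somewhere in the reference.
    for i in range(len(ref) - n):
        if ref[i:i + n] == tail:
            copy = ref_ids[i + n:i + n + k]  # candidate span to copy
            break
    if copy is None or len(copy) == 0:
        # No match: fall back to an ordinary greedy step.
        next_tok = model(ids.unsqueeze(0)).logits[0, -1].argmax()
        return torch.cat([ids, next_tok.view(1)])
    # Verify the whole copied span with a single forward pass; keep the longest
    # prefix that greedy decoding agrees with, plus one model-predicted token.
    cand = torch.cat([ids, copy])
    preds = model(cand.unsqueeze(0)).logits[0].argmax(-1)
    base = len(ids) - 1  # preds[base + j] is the model's choice for cand[base + j + 1]
    accepted = 0
    while accepted < len(copy) and preds[base + accepted].item() == copy[accepted].item():
        accepted += 1
    return torch.cat([ids, copy[:accepted], preds[base + accepted].view(1)])
A matched span advances decoding by up to k+1 tokens for a single forward pass, which is where the speedup over token-by-token decoding comes from; larger n makes triggers rarer but the copied spans more likely to be accepted.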