- An implementation of Benchmarking Large Language Models in Retrieval-Augmented Generation
```bash
conda create -n rgb python=3.10.0
conda activate rgb
bash env.sh
```
The data is located in `data/`:
```
data/
├── en.json
├── en_int.json
├── en_fact.json
├── zh.json
├── zh_int.json
└── zh_fact.json
```
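Each file can be read one JSON instance per line. The snippet below is a minimal sketch of loading such a line; the field names (`query`, `answer`, `positive`, `negative`) are assumptions for illustration and may differ from the actual dataset schema.

```python
import json

# Hypothetical sample line illustrating the assumed per-line JSON format;
# real field names and values may differ.
sample = ('{"query": "Who won the 2022 World Cup?", '
          '"answer": "Argentina", '
          '"positive": ["relevant doc"], '
          '"negative": ["noisy doc"]}')

instance = json.loads(sample)
print(instance["query"])   # the question posed to the model
print(len(instance["negative"]))  # number of noisy documents available
```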
To evaluate ChatGPT, run:
```bash
python evalue.py \
    --dataset en \
    --modelname chatgpt \
    --temp 0.2 \
    --noise_rate 0.6 \
    --api_key YourAPIKEY
```
To evaluate other models, run:
```bash
python evalue.py \
    --dataset en \
    --modelname chatglm2-6b \
    --temp 0.2 \
    --noise_rate 0.6 \
    --plm THUDM/chatglm2-6b
```
Change `modelname` and `plm` for different models, where:
- `plm` is the path of the model.
- `temp` is the sampling temperature of the model.
- `noise_rate` is the proportion of noisy documents in the input.
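As a rough illustration of what `noise_rate` controls, the sketch below mixes relevant and noisy documents at a given rate. This is a hypothetical helper, not the benchmark's actual sampling code: the function name `build_context` and its parameters are assumptions.

```python
import random

def build_context(positive_docs, negative_docs, num_docs=5, noise_rate=0.6, seed=0):
    """Assemble a retrieval context with a fixed share of noisy documents.

    Hypothetical sketch: with num_docs=5 and noise_rate=0.6, three of the
    five documents come from the noisy pool and two from the relevant pool.
    """
    rng = random.Random(seed)
    num_noisy = int(num_docs * noise_rate)
    num_positive = num_docs - num_noisy
    docs = (rng.sample(positive_docs, num_positive)
            + rng.sample(negative_docs, num_noisy))
    rng.shuffle(docs)  # avoid a fixed positive/noisy ordering in the prompt
    return docs

ctx = build_context([f"pos{i}" for i in range(5)],
                    [f"neg{i}" for i in range(5)])
print(len(ctx))  # 5 documents in total
```

At `noise_rate 1.0` the context contains only noisy documents, which probes the model's ability to decline to answer rather than hallucinate.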