The implementation of the paper "keqing: knowledge-based question answering is a nature chain-of-thought mentor of LLM".
- Environment configuration
git clone https://github.com/NoviceStone/Keqing.git
cd Keqing
conda create -n keqing python=3.11 -y
conda activate keqing
pip install 'litgpt[all]'
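As a quick sanity check, LitGPT's command-line interface should now be available; it provides the subcommands (such as finetune_lora and merge_lora) used in the steps below.
litgpt --help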
- Data download
- Fine-tuning for question decomposition
Core idea: complex questions can be challenging to handle, while answering simple questions is a breeze.
LLMs are inherently endowed with powerful semantic understanding capabilities, making them a natural tool for parsing complex questions into simpler sub-questions. For KBQA, one would expect each decomposed sub-question to be resolvable with a single-hop inference over the KG, yet this often requires some alignment between the LLM and the KG. We therefore resort to instruction fine-tuning to adapt the LLM to the structured knowledge in the KG and obtain better decompositions.
Specifically, we opt to fine-tune Llama-2-7b with LoRA so that it is computationally affordable even on a single graphics card (such as an A6000 or RTX 8000). To implement the fine-tuning, we recommend the easy-to-use LitGPT project. All we need to do is prepare the corpus for instruction fine-tuning; below is an example illustrating the required data format.
{
  "instruction": "Parse the user input question to several subquestions: [{'question': subquestion, 'id': subquestion_id, 'dep': dependency_subquestion_id, 'seed_entity': seed_entity or <GENERATED>-dep_id}]...",
  "input": "the actor of Kung Fu Panda also starred in which movies?",
  "output": "[{'question': 'who acted in the movie [MASK]?', 'id': 0, 'dep': [-1], 'seed_entity': ['Kung Fu Panda']}, {'question': 'what films did [MASK] act in?', 'id': 1, 'dep': [0], 'seed_entity': ['<GENERATED>-0']}]"
}
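With a corpus in this format, a LoRA fine-tuning run can be launched through LitGPT's JSON data loader. The sketch below is illustrative rather than the exact training command we used: the path data/decomposer_train.json is a hypothetical placeholder for your own corpus file, and all hyperparameters are left at their defaults.
# a minimal sketch, assuming the corpus above is saved at data/decomposer_train.json (hypothetical path)
litgpt finetune_lora meta-llama/Llama-2-7b-hf \
  --data JSON \
  --data.json_path data/decomposer_train.json \
  --data.val_split_fraction 0.1 \
  --out_dir finetuned_weights/decomposer/lora-llama-7b-MetaQA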
Checkpoints: we provide the fine-tuned LoRA weights of Llama-2-7b on MetaQA, which you can download and use directly. First, however, you need to download the base Llama-2-7b model weights; the two sets of weights can then be merged into a complete fine-tuned model with LitGPT.
git clone https://github.com/Lightning-AI/litgpt.git
cd litgpt
# download the base model (Llama-2-7b) weights from Hugging Face
mkdir -p checkpoints/meta-llama
huggingface-cli download --resume-download meta-llama/Llama-2-7b-hf --local-dir ./checkpoints/meta-llama/Llama-2-7b-hf --token *****
# convert the raw Hugging Face weights into LitGPT's checkpoint format (needed before merging)
litgpt convert_to_litgpt ./checkpoints/meta-llama/Llama-2-7b-hf
# merge the LoRA weights with the base weights into a complete fine-tuned model
litgpt merge_lora --checkpoint_dir ./finetuned_weights/decomposer/lora-llama-7b-MetaQA/final
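Once merged, the checkpoint can be sanity-checked with LitGPT's built-in chat command before wiring it into Keqing; this assumes the merged weights were written to the final directory used above.
litgpt chat ./finetuned_weights/decomposer/lora-llama-7b-MetaQA/final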
To start an interactive Q&A experience with Keqing, you can execute the following command:
python chat.py --kb_filepath ./data/MetaQA/kb.txt --llama_checkpoint_dir ./finetuned_weights/decomposer/lora-llama-7b-MetaQA/final
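For a quick end-to-end check, try the multi-hop question from the fine-tuning example above ("the actor of Kung Fu Panda also starred in which movies?"); Keqing should first decompose it into the two single-hop sub-questions shown earlier and then resolve each of them against the KG.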
If you find our work helpful in your research, please cite the following paper.
@article{wang2023keqing,
  title={keqing: knowledge-based question answering is a nature chain-of-thought mentor of {LLM}},
  author={Wang, Chaojie and Xu, Yishi and Peng, Zhong and Zhang, Chenxi and Chen, Bo and Wang, Xinrun and Feng, Lei and An, Bo},
  journal={arXiv preprint arXiv:2401.00426},
  year={2023}
}