This is the official repository for our EMNLP 2023 paper: Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models
We propose a novel framework, Tree of Clarifications (ToC), designed for generating long-form answers to ambiguous questions.
- It guides LLMs to explore diverse interpretations of an ambiguous question in a tree structure, with the ability to prune unhelpful ones.
- We combine retrieval-augmented generation (RAG) with LLMs and achieve state-of-the-art performance on ASQA.
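The tree-with-pruning idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in this repository: `clarify` and `is_helpful` are hypothetical callables standing in for the LLM call that proposes interpretations and for the check that prunes unhelpful ones, and `max_depth` and `max_nodes` are made-up limits.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    question: str  # an interpretation (disambiguated question)
    children: List["Node"] = field(default_factory=list)


def build_tree(
    root_question: str,
    clarify: Callable[[str], List[str]],     # hypothetical: proposes interpretations
    is_helpful: Callable[[str, str], bool],  # hypothetical: pruning check (root, candidate)
    max_depth: int = 2,                      # illustrative limit
    max_nodes: int = 10,                     # illustrative limit
) -> Node:
    """Expand interpretations breadth-first, keeping only those that pass the check."""
    root = Node(root_question)
    frontier = [(root, 0)]
    n_nodes = 1
    while frontier:
        node, depth = frontier.pop(0)
        if depth >= max_depth:
            continue
        for candidate in clarify(node.question):
            if n_nodes >= max_nodes:
                return root
            if not is_helpful(root_question, candidate):
                continue  # prune an unhelpful interpretation
            child = Node(candidate)
            node.children.append(child)
            frontier.append((child, depth + 1))
            n_nodes += 1
    return root


def leaves(node: Node) -> List[str]:
    """Collect the questions at the leaves, i.e. the most specific interpretations."""
    if not node.children:
        return [node.question]
    return [q for child in node.children for q in leaves(child)]
```

In a real run, the leaf interpretations and their answers would then be combined into one long-form answer; that step is omitted here.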
To facilitate a smooth setup, we suggest creating a Conda environment using the provided configuration:
conda env create -f environment.yml
Activate the newly created environment with:
conda activate toc
Access and download the ASQA dataset here or utilize the pre-packaged version in our repository at ./asqa/ASQA.json
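As a quick sanity check after obtaining the dataset, the file can be loaded and summarized as sketched below. The helper names `load_asqa` and `summarize` are illustrative, and the sketch assumes only that the file is a JSON object whose top-level keys are split names mapping to collections of examples; adjust it if your copy is laid out differently.

```python
import json
from pathlib import Path


def load_asqa(path="./asqa/ASQA.json"):
    """Load the ASQA JSON file and return its parsed content."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def summarize(data):
    """Map each top-level key (assumed to be a split name) to its number of entries."""
    return {split: len(entries) for split, entries in data.items()}
```

Calling `summarize(load_asqa())` from the repository root should report one count per split.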
ToC can incorporate search results from external sources, such as the Bing search engine, to enhance answer quality. Follow the script below to fetch search results, or use our pre-compiled dataset at ./bing/results.json. This step is optional, but skipping it may slightly reduce ToC's performance.
Set your Bing API credentials:
export BING_SUBSCRIPTION_KEY= # your Bing API key here
export BING_SEARCH_URL= # your Bing search URL here
Please refer to the tutorial for detailed information about setting up your subscription.
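For orientation, the sketch below shows how these two variables are typically used to form a Bing Web Search request: the key goes in the `Ocp-Apim-Subscription-Key` header and the query in the `q` parameter. It is an assumption about the general API pattern, not a copy of what `bing_search.py` does; `build_bing_request` is a hypothetical helper, and the request is only built, not sent.

```python
import os
import urllib.parse
import urllib.request


def build_bing_request(query, count=10, env=os.environ):
    """Build (but do not send) a Bing Web Search request from the exported variables."""
    key = env.get("BING_SUBSCRIPTION_KEY")
    url = env.get("BING_SEARCH_URL")
    if not key or not url:
        raise RuntimeError("Set BING_SUBSCRIPTION_KEY and BING_SEARCH_URL first")
    params = urllib.parse.urlencode({"q": query, "count": count})
    return urllib.request.Request(
        f"{url}?{params}",
        headers={"Ocp-Apim-Subscription-Key": key},
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the search; a `RuntimeError` here means one of the two variables is still empty.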
Set the directory paths for the ASQA dataset and Bing search results. Then run the following script to search for Wikipedia documents relevant to the ambiguous questions and save the results in $BING_DIR.
export ASQA_DIR= # directory path to the ASQA dataset
export BING_DIR= # directory path to Bing search results
python bing_search.py \
--data_dir $ASQA_DIR \
--output_dir $BING_DIR
python get_wiki.py \
--data_dir $BING_DIR \
--output_dir "top100" \
--top_k 100
Before running ToC, set your OpenAI API key (see the homepage for how to obtain one) and the URL of a ColBERT server. We used the server hosted by DSPy; please note that the hosting server may change. To set up your own server, refer to the instructions here.
export OPENAI_KEY= # your OpenAI API key here
export COLBERT_URL='http://ec2-44-228-128-229.us-west-2.compute.amazonaws.com:8893/api/search'
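Because the hosted server may move or go offline, it can help to probe the endpoint before a long run. The sketch below assumes the request and response shape used by DSPy's ColBERTv2 client (a GET with `query` and `k` parameters, returning JSON with a `topk` list whose items carry a `text` field); the helper names are hypothetical, and other servers may differ.

```python
import json
import urllib.parse
import urllib.request


def colbert_query_url(base_url, query, k=5):
    """Build the GET URL for a search (assumed parameters: query, k)."""
    return f"{base_url}?{urllib.parse.urlencode({'query': query, 'k': k})}"


def parse_topk(payload, k=5):
    """Extract passage texts from a response (assumed shape: {'topk': [{'text': ...}]})."""
    return [item["text"] for item in payload.get("topk", [])[:k]]


def search(base_url, query, k=5, timeout=10):
    """Query the server and return up to k passage texts."""
    url = colbert_query_url(base_url, query, k)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_topk(json.load(resp), k)
```

For example, `search(os.environ["COLBERT_URL"], "Who won the World Cup?")` (after `import os`) should return a short list of passages if the server is reachable.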
To run ToC, use the following script, specifying the necessary paths and options:
export ASQA_DIR= # directory path to the ASQA dataset
export OUT_DIR= # directory path to results
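export BING_PATH= # path to the Bing search results (optional)
# --bing_path is optional; remove that line to run without Bing search results
python run_toc.py \
--data_dir $ASQA_DIR \
--bing_path $BING_PATH \
--openai_key $OPENAI_KEY \
--colbert_url $COLBERT_URL \
--verify \
--output_dir $OUT_DIR \
${ARGS}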
To evaluate the answers generated by ToC, follow the guidelines provided in the official ASQA repository.
@article{kim2023tree,
title={Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models},
author={Gangwoo Kim and Sungdong Kim and Byeongguk Jeon and Joonsuk Park and Jaewoo Kang},
journal={EMNLP},
year={2023}
}