This repository contains the code used in the paper "Visconde: Multi-document QA with GPT-3 and Neural Reranking", submitted to the European Conference on Information Retrieval (ECIR 2023).
Abstract: This paper discusses a question-answering system that can answer questions whose supporting evidence is spread over multiple (potentially long) documents. The system, called Visconde, uses a three-step pipeline to perform the task: decompose, retrieve, and aggregate. The first step decomposes the question into simpler questions using a few-shot large language model (LLM). Then, a state-of-the-art search engine is used to retrieve candidate passages from a large collection for each decomposed question. In the final step, we use the LLM in a few-shot setting to aggregate the contents of the passages into the final answer. The system is evaluated on three datasets: IIRC, Qasper, and StrategyQA. Results suggest that current retrievers are the main bottleneck and that readers are already performing at the human level as long as relevant passages are provided. The system is also shown to be more effective when the model is induced to give explanations before answering a question.
We evaluated our proposal on three datasets: IIRC, QASPER and StrategyQA.
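The decompose, retrieve, and aggregate steps described in the abstract can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in and not the code in this repository: `toy_decompose` replaces the few-shot LLM decomposer, `toy_retrieve` replaces the search engine and reranker with plain word overlap, and `toy_aggregate` replaces the few-shot LLM reader.

```python
import re
from typing import Callable, List


def tokens(text: str) -> set:
    """Lowercased word tokens, punctuation dropped."""
    return set(re.findall(r"\w+", text.lower()))


def toy_decompose(question: str) -> List[str]:
    """Hypothetical stand-in for the LLM decomposer: split on ' and '."""
    parts = [p.strip(" ?") for p in question.split(" and ")]
    return [p + "?" for p in parts if p]


def toy_retrieve(query: str, passages: List[str], k: int) -> List[str]:
    """Hypothetical stand-in for the retriever: rank by word overlap."""
    q = tokens(query)
    ranked = sorted(passages, key=lambda p: len(q & tokens(p)), reverse=True)
    return ranked[:k]


def toy_aggregate(question: str, evidence: List[str]) -> str:
    """Hypothetical stand-in for the LLM reader: just show the evidence."""
    return f"Q: {question} | evidence: " + " ".join(evidence)


def answer(
    question: str,
    passages: List[str],
    decompose: Callable = toy_decompose,
    retrieve: Callable = toy_retrieve,
    aggregate: Callable = toy_aggregate,
    k: int = 1,
) -> str:
    # Step 1: decompose the question into simpler sub-questions.
    sub_questions = decompose(question)
    # Step 2: retrieve the top-k passages for each sub-question, without duplicates.
    evidence: List[str] = []
    for sub_question in sub_questions:
        for passage in retrieve(sub_question, passages, k):
            if passage not in evidence:
                evidence.append(passage)
    # Step 3: aggregate the collected evidence into a final answer.
    return aggregate(question, evidence)


if __name__ == "__main__":
    corpus = [
        "Hamlet was written by Shakespeare",
        "Shakespeare was born in 1564",
        "Paris is in France",
    ]
    print(answer("Who wrote Hamlet and when was he born?", corpus))
```

Passing the three steps as callables keeps the control flow fixed while each stand-in can be replaced by a real component.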
Download datasets
sh setup.sh
- Decompose test questions
- Create indices
- Create lists for reranking
- Rerank items (GPU required)
- Generate explanations for training examples
- Testing
- Rerank paragraphs by question
- Generate explanations for training examples
- Testing
- Compute metrics: to compute metrics, download the Qasper evaluator (`qasper_evaluator.py`) and run:
python qasper_evaluator.py --predictions PREDICTIONS_FILE --gold data/qasper-test-v0.3.json --text_evidence_only
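For intuition about what an answer-level score of this kind measures, here is a minimal sketch of a SQuAD-style token-level F1 between a predicted and a gold answer. This is an assumption for illustration only: the function names are hypothetical, and the normalization (lowercasing and dropping punctuation) is simpler than what an official evaluator may apply, so use the evaluator script for reported numbers.

```python
import re
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase and split into word tokens (simplified normalization)."""
    return re.findall(r"\w+", text.lower())


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred_tokens, gold_tokens = normalize(prediction), normalize(gold)
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; only one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_f1("in 1564", "1564"))
```

A prediction that contains the gold answer plus extra words keeps full recall but loses precision, which is why verbose answers score lower under this kind of metric.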
- Create indices
- Decompose questions
- Create lists for reranking
- Rerank items (GPU required)
- Testing
- Compute metrics: to compute metrics, clone this repository and run the evaluator.