AutoSurvey (NeurIPS 2024)

AutoSurvey: Large Language Models Can Automatically Write Surveys

Yidong Wang^1,2∗, Qi Guo^2,3∗, Wenjin Yao², Hongbo Zhang¹, Xin Zhang⁴, Zhen Wu³, Meishan Zhang⁴, Xinyu Dai³, Min Zhang⁴, Qingsong Wen⁵, Wei Ye^2†, Shikun Zhang^2†, Yue Zhang^1†

¹Westlake University, ²Peking University, ³Nanjing University, ⁴Harbin Institute of Technology, Shenzhen, ⁵Squirrel AI

Introduction

AutoSurvey is a speedy and well-organized framework for automating the creation of comprehensive literature surveys.

Extensive experimental results across different survey lengths (8k, 16k, 32k, and 64k tokens) demon- strate that AutoSurvey consistently achieves high citation and content quality scores

Web Demo

You can also access our web demoto generate surveys.

News:

Generation based on personalized user requirements are supported (optional)!!

Requirements

Python 3.10.x
Required Python packages listed in requirements.txt

Installation

Clone the repository:

git clone https://github.com/AutoSurveys/AutoSurvey.git
cd AutoSurvey

Install the required packages:
```
pip install -r requirements.txt
```
Download the database: (Here we provide a database containing 530,000 arXiv paper abstracts and all papers are under the CS category. You can contact us to obtain the database containing the full content of the papers. ) https://1drv.ms/u/c/8761b6d10f143944/EaqWZ4_YMLJIjGsEB_qtoHsBoExJ8bdppyBc1uxgijfZBw?e=2EIzti
```
unzip database.zip -d ./database/
```

Usage

Generation

Here is an example command to generate survey on the topic "LLMs for education":

python main.py --topic "LLMs for education" 
               --gpu 0
               --saving_path ./output/
               --model gpt-4o-2024-05-13
               --section_num 7
               --subsection_len 700
               --rag_num 60
               --outline_reference_num 1500
               --db_path ./database
               --embedding_model nomic-ai/nomic-embed-text-v1
               --api_url https://api.openai.com/v1/chat/completions
               --api_key sk-xxxxxx

The generated content will be saved in the ./output/ directory.

--gpu: Specify the GPU to use.
--saving_path: Directory to save the output survey.
--model: Model to use.
--topic: Topic to generate content for.
--section_num: Number of sections in the outline.
--subsection_len: Length of each subsection.
--rag_num: Number of references to use for RAG.
--outline_reference_num: Number of references for outline generation.
--db_path: Directory of the database.
--embedding_model: Embedding model for retrieval.
--api_key: API key for the model.
--api_url: url for API request.

Evaluation

Here is an example command to evaluate the generated survey on the topic "LLMs for education":

python evaluation.py --topic "LLMs for education" 
               --gpu 0
               --saving_path ./output/
               --model gpt-4o-2024-05-13
               --db_path ./database
               --embedding_model nomic-ai/nomic-embed-text-v1
               --api_url https://api.openai.com/v1/chat/completions
               --api_key sk-xxxxxx

Make sure the generated survey is in the ./output/ directory

The evaluation result will be saved in the ./output/ directory.

--gpu: Specify the GPU to use (default: '0').
--saving_path: Directory to save the evaluation results (default: './output/').
--model: Model for evaluation.
--topic: Topic of generated survey.
--db_path: Directory of the database.
--embedding_model: Embedding model for retrieval.
--api_key: API key for the model.
--api_url: url for API request.

Citing Autosurvey

Please cite us if you find this project helpful for your project/paper:

@inproceedings{
2024autosurvey,
title={AutoSurvey: Large Language Models Can Automatically Write Surveys},
author = {Wang, Yidong and Guo, Qi and Yao, Wenjin and Zhang, Hongbo and Zhang, Xin and Wu, Zhen and Zhang, Meishan and Dai, Xinyu and Zhang, Min and Wen, Qingsong and Ye, Wei and Zhang, Shikun and Zhang, Yue},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024}
}

Contributing

Contributions are welcome! Please open an issue to discuss what you would like to change.

License

This project is licensed under the MIT License.