AutoSurvey (NeurIPS 2024)

AutoSurvey: Large Language Models Can Automatically Write Surveys


Yidong Wang1,2∗, Qi Guo2,3∗, Wenjin Yao2, Hongbo Zhang1, Xin Zhang4, Zhen Wu3, Meishan Zhang4, Xinyu Dai3, Min Zhang4, Qingsong Wen5, Wei Ye2†, Shikun Zhang2†, Yue Zhang1†


1Westlake University, 2Peking University, 3Nanjing University, 4Harbin Institute of Technology, Shenzhen, 5Squirrel AI

Introduction

AutoSurvey is a speedy and well-organized framework for automating the creation of comprehensive literature surveys.

Overview Extensive experimental results across different survey lengths (8k, 16k, 32k, and 64k tokens) demon- strate that AutoSurvey consistently achieves high citation and content quality scores Overview

Web Demo

You can also access our web demoto generate surveys.

News:

  1. Generation based on personalized user requirements are supported (optional)!!

Requirements

  • Python 3.10.x
  • Required Python packages listed in requirements.txt

Installation

  1. Clone the repository:

    git clone https://github.com/AutoSurveys/AutoSurvey.git
    cd AutoSurvey
  2. Install the required packages:

    pip install -r requirements.txt
  3. Download the database: (Here we provide a database containing 530,000 arXiv paper abstracts and all papers are under the CS category. You can contact us to obtain the database containing the full content of the papers. ) https://1drv.ms/u/c/8761b6d10f143944/EaqWZ4_YMLJIjGsEB_qtoHsBoExJ8bdppyBc1uxgijfZBw?e=2EIzti

    unzip database.zip -d ./database/

Usage

Generation

Here is an example command to generate survey on the topic "LLMs for education":

python main.py --topic "LLMs for education" 
               --gpu 0
               --saving_path ./output/
               --model gpt-4o-2024-05-13
               --section_num 7
               --subsection_len 700
               --rag_num 60
               --outline_reference_num 1500
               --db_path ./database
               --embedding_model nomic-ai/nomic-embed-text-v1
               --api_url https://api.openai.com/v1/chat/completions
               --api_key sk-xxxxxx 

The generated content will be saved in the ./output/ directory.

  • --gpu: Specify the GPU to use.
  • --saving_path: Directory to save the output survey.
  • --model: Model to use.
  • --topic: Topic to generate content for.
  • --section_num: Number of sections in the outline.
  • --subsection_len: Length of each subsection.
  • --rag_num: Number of references to use for RAG.
  • --outline_reference_num: Number of references for outline generation.
  • --db_path: Directory of the database.
  • --embedding_model: Embedding model for retrieval.
  • --api_key: API key for the model.
  • --api_url: url for API request.

Evaluation

Here is an example command to evaluate the generated survey on the topic "LLMs for education":

python evaluation.py --topic "LLMs for education" 
               --gpu 0
               --saving_path ./output/
               --model gpt-4o-2024-05-13
               --db_path ./database
               --embedding_model nomic-ai/nomic-embed-text-v1
               --api_url https://api.openai.com/v1/chat/completions
               --api_key sk-xxxxxx 

Make sure the generated survey is in the ./output/ directory

The evaluation result will be saved in the ./output/ directory.

  • --gpu: Specify the GPU to use (default: '0').
  • --saving_path: Directory to save the evaluation results (default: './output/').
  • --model: Model for evaluation.
  • --topic: Topic of generated survey.
  • --db_path: Directory of the database.
  • --embedding_model: Embedding model for retrieval.
  • --api_key: API key for the model.
  • --api_url: url for API request.

Citing Autosurvey

Please cite us if you find this project helpful for your project/paper:

@inproceedings{
2024autosurvey,
title={AutoSurvey: Large Language Models Can Automatically Write Surveys},
author = {Wang, Yidong and Guo, Qi and Yao, Wenjin and Zhang, Hongbo and Zhang, Xin and Wu, Zhen and Zhang, Meishan and Dai, Xinyu and Zhang, Min and Wen, Qingsong and Ye, Wei and Zhang, Shikun and Zhang, Yue},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024}
}

Contributing

Contributions are welcome! Please open an issue to discuss what you would like to change.

License

This project is licensed under the MIT License.