
[TKDE2025] Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL | A curated list of resources (surveys, papers, benchmarks, and open-source projects) on large language model-based text-to-SQL.

MIT License

Awesome-LLM-based-Text2SQL

This repository provides a comprehensive collection of research papers, benchmarks, and open-source projects on large language model-based text-to-SQL (LLM-based Text-to-SQL). It includes all the content from our survey paper "Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL" and will be continuously updated with the latest advances and notable contributions from the text-to-SQL community. Stay tuned!

🤗 You are very welcome to contribute to this repository by opening an issue or a pull request. If you find any missing resources or come across interesting new research, please don't hesitate to open an issue or submit a PR!

📫 Contact us via email: zijin[dot]hong[at]connect[dot]polyu[dot]hk

📃 Please cite our paper if you find our survey or repository helpful!

🔥 News

  • [2025-09-14] 🔥🔥 Repository launched based on our survey paper to keep track of recent progress in LLM-based text-to-SQL.
  • [2025-09-02] 🎉🎉 Our paper "Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL" has been accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE)!
  • [2025-05-01] 🎉🎉 Our paper "Structure-Guided Large Language Models for Text-to-SQL Generation" has been accepted by the International Conference on Machine Learning (ICML)!

Overview of LLM-based Text-to-SQL Workflow

A user asks a question about football leagues. The LLM takes this question together with the schema of the corresponding database as input and generates an SQL query as output. The generated SQL is then executed on the database, retrieving the result "The 5 leagues with the highest matches", which answers the user's question.
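The workflow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline of any listed method: `generate_sql` is a hypothetical stand-in for an LLM call and simply returns a fixed query, and the `league` and `matches` tables are an invented toy schema built in an in-memory SQLite database.

```python
import sqlite3


def generate_sql(question: str, schema: str) -> str:
    """Hypothetical stand-in for an LLM call.

    A real system would send the question and schema to a model;
    here a fixed query is returned so the sketch runs offline.
    """
    return (
        "SELECT l.name, COUNT(*) AS n_matches "
        "FROM league AS l JOIN matches AS m ON m.league_id = l.id "
        "GROUP BY l.id ORDER BY n_matches DESC, l.name LIMIT 5"
    )


def run_demo():
    # Toy database (assumed schema, for illustration only).
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE league (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE matches (
            id INTEGER PRIMARY KEY,
            league_id INTEGER REFERENCES league(id)
        );
        INSERT INTO league VALUES (1, 'League A'), (2, 'League B');
        INSERT INTO matches (league_id) VALUES (1), (1), (2);
        """
    )
    # Step 1: collect the schema that accompanies the question.
    schema = "\n".join(
        row[0]
        for row in conn.execute("SELECT sql FROM sqlite_master WHERE type='table'")
    )
    # Step 2: the (stubbed) model maps question + schema to SQL.
    sql = generate_sql("Which 5 leagues have the most matches?", schema)
    # Step 3: execute the generated SQL to obtain the answer.
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_demo())
```

Swapping the stub for a real model call leaves the rest of the loop unchanged: the schema and question go in, SQL comes out, and the database executes it.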

📜 Catalog

Awesome-LLM-based-Text2SQL


📈 Trends

A Sketch of Research Trends in the Field of Text-to-SQL with Representative Works

For work before 2023, the sketch shows a selection of representative traditional studies. From 2023 onward, it emphasizes the rapid advances driven by LLMs, which mark a significant acceleration in the field.

📰 Surveys

  • TKDE Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL [Paper] [Code]
  • CSUR2025 A Survey on Employing Large Language Models for Text-to-SQL Tasks [Paper]
  • TKDE A Survey of Text-to-SQL in the Era of LLMs: Where are We, and Where are We Going? [Paper]
  • TKDE Natural Language Interfaces for Tabular Data Querying and Visualization: A Survey [Paper]
  • arXiv2024 Large Language Model Enhanced Text-to-SQL Generation: A Survey [Paper]
  • VLDBJ2023 A Survey on Deep Learning Approaches for Text-to-SQL [Paper]
  • VLDB2023 Natural Language Interfaces for Databases with Deep Learning [Paper]
  • arXiv2022 A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions [Paper]
  • COLING2022 Recent Advances in Text-to-SQL: A Survey of What We Have and What We Expect [Paper]

🏆 Benchmarks

In the era of LLMs, two benchmarks and their variants/extensions are widely recognized for evaluating text-to-SQL capabilities. We will continually update the top five methods on each benchmark to showcase the latest advances in the text-to-SQL community. These benchmarks, along with other text-to-SQL dataset papers, are listed in the datasets section below.

BIRD - A Big Bench for Large-Scale Database Grounded Text-to-SQL

| Method/Model | Dev EX (%) | Test EX (%) | Paper/Code | Date |
| --- | --- | --- | --- | --- |
| Proprietary LongData-SQL | 74.32 | 77.53 | [Proprietary] | 2025-07-14 |
| arXiv2025 AskData + GPT-4o | 75.36 | 77.14 | [Paper] | 2025-03-11 |
| ICLR2025 CHASE-SQL + Gemini | 74.90 | 76.02 | [Paper] | 2025-04-16 |
| Proprietary TCDataAgent-SQL | 74.12 | 75.74 | [Report] | 2025-05-30 |
| Proprietary Contextual-SQL | 73.50 | 75.63 | [Report] [Code] | 2025-02-27 |
| arXiv2024 XiYan-SQL | 73.34 | 75.63 | [Paper] [Code] | 2024-12-17 |
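The EX columns report execution accuracy: the percentage of examples whose predicted SQL returns the same result as the gold SQL when both are executed on the database. The sketch below is a simplified illustration, assuming a SQLite connection and an order-insensitive comparison of result rows. The function names are hypothetical, and the official evaluation script of each benchmark defines the exact comparison, which may differ from this one.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries yield the same rows, ignoring row order."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A prediction that fails to execute counts as a miss.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(predicted_rows) == Counter(gold_rows)


def execution_accuracy(conn, pairs):
    """Percentage of (predicted_sql, gold_sql) pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return 100.0 * hits / len(pairs)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE t (name TEXT); INSERT INTO t VALUES ('a'), ('b');"
    )
    pairs = [
        ("SELECT name FROM t ORDER BY name DESC", "SELECT name FROM t"),
        ("SELEC name FROM t", "SELECT name FROM t"),  # invalid SQL
    ]
    print(execution_accuracy(conn, pairs))
```

Comparing results rather than SQL strings lets two differently written but equivalent queries both count as correct.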

Spider1.0 - Semantic Parsing and Text-to-SQL Challenge

| Method/Model | Dev EX (%) | Test EX (%) | Paper/Code | Date |
| --- | --- | --- | --- | --- |
| Proprietary MiniSeek | - | 91.2 | [Report] | 2023-11-02 |
| VLDB2024 DAIL-SQL + GPT-4 | 82.4 | 86.6 | [Paper] [Code] | 2023-08-20 |
| NeurIPS2023 DIN-SQL + GPT-4 | 74.2 | 85.3 | [Paper] [Code] | 2023-04-21 |
| arXiv2023 C3 + ChatGPT | 81.8 | 82.3 | [Paper] [Code] | 2023-06-01 |
| AAAI2023 RESDSQL-3B + NatSQL | 84.1 | 79.9 | [Paper] [Code] | 2023-02-27 |

Spider2.0 - Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows

| Method/Model | Snow Score | Lite Score | Paper/Code | Date |
| --- | --- | --- | --- | --- |
| arXiv2025 AgenticData + Qwen3 | - | 44.5 | [Paper] | 2025-08-07 |
| ICLR2025 ReFoRCE + o3 | 37.11 | 37.84 | [Paper] [Code] | 2025-05-22 |
| arXiv2024 RSL-SQL + o3 | - | 33.09 | [Paper] [Code] | 2025-07-10 |
| EMNLP2025 LinkAlign + DeepSeek-R1 | - | 33.09 | [Paper] [Code] | 2025-04-27 |
| ICLR2025 Spider-Agent + Claude-3.7-Sonnet | - | 28.52 | [Paper] [Code] | 2025-03-16 |

BIRD-CRITIC - Can LLMs Fix User Issues in Real-World Database Applications?

| Model | SR (%) | Date |
| --- | --- | --- |
| ByteBrain-Agent | 43.33 | 2025-06-10 |
| GPT-5-High | 34.96 | 2025-09-04 |
| grok-4 | 33.68 | 2025-07-18 |
| DeepSeek-R1 | 33.51 | 2025-04-20 |
| o3-Mini | 33.33 | 2025-04-20 |

BIRD-INTERACT - Re-imagining Text-to-SQL Evaluation via Lens of Dynamic Interactions

| Model/Method | Reward | Date |
| --- | --- | --- |
| Gemini-2.5-Pro | 20.92 | 2025-08-22 |
| o3-Mini | 20.27 | 2025-08-22 |
| Claude-Sonnet-4 | 18.35 | 2025-08-22 |
| Qwen-3-Coder-480B | 17.75 | 2025-08-22 |
| DeepSeek-V3 | 15.15 | 2025-08-22 |

🗃️ Datasets

We categorize the datasets into Original Datasets and Post-annotated Datasets based on whether they were released with the original dataset (question–SQL pairs) and databases, or were developed by adapting existing datasets and databases with special settings. The Post-annotated Datasets rely on the databases from Spider 1.0. For each original dataset, we list its characteristics, number of examples, and number of databases under the dataset title.

Original Datasets

  • arXiv2025 BIRD-CRITIC | SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications [Paper] [Code] [Dataset]
    Knowledge-augmented, Long-context; #Example: 600; #DB: 95
  • ICLR2025 Spider2.0 | Spider 2.0: Evaluating Language Models on Real-world Enterprise Text-to-SQL Workflows [Paper] [Code] [Dataset]
    Knowledge-augmented, Long-context; #Example: 632; #DB: 213
  • SIGMOD2025 BULL | FinSQL: Model-Agnostic LLMs-based Text-to-SQL Framework for Financial Analysis [Paper] [Code] [Dataset]
    Knowledge-augmented, Long-context; #Example: 4,966; #DB: 3
  • NeurIPS2023 BIRD | Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs [Paper] [Code] [Dataset]
    Cross-domain, Knowledge-augmented; #Example: 12,751; #DB: 95
  • ACL2021 KaggleDBQA | KaggleDBQA: Realistic Evaluation of Text-to-SQL Parsers [Paper] [Code] [Dataset]
    Cross-domain; #Example: 272; #DB: 8
  • EMNLP2020 DuSQL | DuSQL: A Large-Scale and Pragmatic Chinese Text-to-SQL Dataset [Paper] [Dataset]
    Cross-domain, Cross-lingual; #Example: 23,797; #DB: 200
  • Findings2020 SQUALL | On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries [Paper] [Code]
    Cross-domain, Cross-lingual; #Example: 11,468; #DB: 1,679
  • EMNLP2019 CoSQL | CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases [Paper] [Code] [Dataset]
    Cross-domain, Context-dependent; #Example: 15,598; #DB: 200
  • EMNLP2018 Spider | Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task [Paper] [Code] [Dataset]
    Cross-domain; #Example: 10,181; #DB: 200
  • arXiv2017 WikiSQL | Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning [Paper] [Code] [Dataset]
    Cross-domain; #Example: 80,654; #DB: 26,521

Post-annotated Datasets

  • ICLR2023 Dr. Spider | Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness [Paper] [Code]
    Robustness; Perturbations in DB, query and SQL
  • ACL2022 ADVETA | Towards Robustness of Text-to-SQL Models Against Natural and Realistic Adversarial Table Perturbation [Paper] [Code] [Dataset]
    Robustness; Adversarial table perturbation
  • Findings2022 Spider-SS&CG | Measuring and Improving Compositional Generalization in Text-to-SQL via Component Alignment [Paper] [Code] [Dataset]
    Context-dependent; Splitting examples into sub-examples
  • EMNLP2021 Spider-DK | Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization [Paper] [Code]
    Knowledge-augmented; Adding domain knowledge
  • ACL2021 Spider-SYN | Towards Robustness of Text-to-SQL Models against Synonym Substitution [Paper] [Code]
    Robustness; Synonym substitution in questions
  • Findings2020 Spider-Vietnamese | A Pilot Study of Text-to-SQL Semantic Parsing for Vietnamese [Paper] [Code]
    Cross-lingual; Vietnamese version of Spider
  • NAACL2021 Spider-Realistic | Structure-Grounded Pretraining for Text-to-SQL [Paper] [Dataset]
    Robustness; Removing column names in question
  • EMNLP2019 CSpider | A Pilot Study for Chinese SQL Semantic Parsing [Paper] [Code]
    Cross-lingual; Chinese version of Spider
  • EMNLP2019 SParC | SParC: Cross-Domain Semantic Parsing in Context [Paper] [Code] [Dataset]
    Context-dependent; Annotating conversational contexts

🪴 Taxonomy

Recent LLM-based text-to-SQL methods rely primarily on in-context learning and fine-tuning, enabled by the release of both powerful proprietary LLMs and well-architected open-source LLMs. A detailed categorization of text-to-SQL methods can be found in our paper; newly published research papers will be added here and aligned with this taxonomy.

In-context Learning

  • EMNLP2025 LinkAlign: Scalable Schema Linking for Real-World Large-Scale Multi-Database Text-to-SQL [Paper] [Code]
  • ICLR2025 ReFoRCE: A Text-to-SQL Agent with Self-Refinement, Consensus Enforcement, and Column Exploration [Paper] [Code]
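
As a rough illustration of in-context learning for text-to-SQL, the sketch below assembles a prompt from a database schema, optional question-SQL demonstrations, and the target question. The `build_prompt` helper and the prompt layout are assumptions made for this example; they are not taken from any method listed above, each of which designs its own prompts.

```python
def build_prompt(schema, question, demonstrations=()):
    """Assemble a zero-shot or few-shot text-to-SQL prompt.

    schema: database schema as text, e.g. CREATE TABLE statements.
    question: the natural-language question to translate.
    demonstrations: optional (question, sql) pairs shown as examples.
    """
    parts = [
        "Translate the question into a SQL query for the given database schema.",
        "",
        "Schema:",
        schema.strip(),
        "",
    ]
    # Few-shot demonstrations, if any, precede the target question.
    for demo_question, demo_sql in demonstrations:
        parts += [f"Question: {demo_question}", f"SQL: {demo_sql}", ""]
    # The prompt ends with an open "SQL:" slot for the model to complete.
    parts += [f"Question: {question}", "SQL:"]
    return "\n".join(parts)


if __name__ == "__main__":
    print(
        build_prompt(
            "CREATE TABLE league (id INTEGER PRIMARY KEY, name TEXT);",
            "How many leagues are there?",
            demonstrations=[("List all league names.", "SELECT name FROM league")],
        )
    )
```

With no demonstrations the prompt is zero-shot; passing pairs makes it few-shot. Neither case updates any model weights, which is what separates in-context learning from the fine-tuning methods listed next.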

Fine-tuning

  • ACL2025 SHARE: An SLM-based Hierarchical Action CorREction Assistant for Text-to-SQL [Paper] [Code]
  • ICLR2025 ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL [Paper] [Code]

📃 Citation

@article{hong2025next,
  title={Next-generation database interfaces: A survey of llm-based text-to-sql},
  author={Hong, Zijin and Yuan, Zheng and Zhang, Qinggang and Chen, Hao and Dong, Junnan and Huang, Feiran and Huang, Xiao},
  journal={IEEE Transactions on Knowledge and Data Engineering},
  year={2025}
}