The primary objective of this project is to identify and fine-tune existing approaches to Text-to-SQL conversion, which translate natural language questions entered by a user into the corresponding SQL queries. This involves understanding the nuances of natural language, recognizing intent and context, and expressing them in Structured Query Language. The project aims to make this translation as seamless and accurate as possible across a wide range of queries, from simple to complex. To determine which models work better, we compare the performance of all the approaches we implemented.
The following datasets are used for training, fine-tuning, and inference:
- WikiSQL: Available at https://github.com/salesforce/WikiSQL
- Spider: Available at https://github.com/taoyds/spider
- sql-create-context: Available at https://huggingface.co/datasets/b-mc2/sql-create-context
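As a rough illustration of how a record from sql-create-context might be turned into a fine-tuning example, the sketch below assumes the three fields listed on the dataset card (`question`, `context`, `answer`). The prompt template and the helper names `build_prompt` and `build_training_example` are hypothetical and are not taken from this repo.

```python
# Illustrative record; the field names are assumed from the dataset card.
record = {
    "question": "How many heads of the departments are older than 56?",
    "context": "CREATE TABLE head (age INTEGER)",
    "answer": "SELECT COUNT(*) FROM head WHERE age > 56",
}


def build_prompt(question: str, context: str) -> str:
    """Format the schema and the question into a single model prompt.

    The section markers below are an arbitrary choice for this sketch.
    """
    return (
        "### Schema:\n"
        f"{context}\n"
        "### Question:\n"
        f"{question}\n"
        "### SQL:\n"
    )


def build_training_example(rec: dict) -> str:
    """Concatenate the prompt with the gold SQL for supervised fine-tuning."""
    return build_prompt(rec["question"], rec["context"]) + rec["answer"]


if __name__ == "__main__":
    print(build_training_example(record))
```

At inference time only `build_prompt` would be used, and the model is expected to continue the text after the `### SQL:` marker.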
We experimented with the following models for our text-to-SQL conversion project. Seq2SQL and SQLNet served as our baselines. Each model listed below was trained on one of the datasets mentioned above. The steps to run these models and their current implementations can be found in their respective folders inside /models in this repo.
- Baseline Models
  - Seq2SQL
  - SQLNet
- C3-SQL
- DIN-SQL
- DAIL-SQL
- Transformer-based models from Hugging Face
  - gpt2Medium_text_to_sql
  - t5-small-awesome-text-to-sql
  - Mistral-7B-SQL
  - CodeLlama-7b-Instruct-SQL
  - CodeLlama-13b-Instruct-SQL
  - squeal
  - T5-LM-Large-text2sql-spider
This project's user interface (UI) is a Flask web application that converts natural language queries to SQL using any of the models above. Users enter a query, select a database and a model, and receive the corresponding SQL query.
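The selection logic behind such a UI can be sketched as a small model registry. This is a minimal sketch in plain Python, not the repo's actual code: `MODEL_REGISTRY`, `register`, `convert`, `SCHEMAS`, and the `dummy-count` placeholder model are all hypothetical names. In a Flask app, a route handler would read the question, database, and model from the submitted form and call `convert`; that wiring is omitted here.

```python
from typing import Callable, Dict

# A translator takes (question, schema) and returns a SQL string.
Translator = Callable[[str, str], str]

# Hypothetical registry mapping the model names shown in the UI to callables.
MODEL_REGISTRY: Dict[str, Translator] = {}

# Hypothetical mapping from database names to their CREATE TABLE statements.
SCHEMAS: Dict[str, str] = {
    "demo": "CREATE TABLE head (age INTEGER)",
}


def register(name: str) -> Callable[[Translator], Translator]:
    """Decorator that adds a translator to the registry under `name`."""
    def wrap(fn: Translator) -> Translator:
        MODEL_REGISTRY[name] = fn
        return fn
    return wrap


@register("dummy-count")
def dummy_count(question: str, schema: str) -> str:
    # Placeholder standing in for a real model call: it ignores the
    # question and counts the rows of the table named in the schema.
    table = schema.split("(")[0].split()[-1]
    return f"SELECT COUNT(*) FROM {table}"


def convert(question: str, database: str, model: str,
            schemas: Dict[str, str]) -> str:
    """Look up the selected model and database, then run the conversion."""
    if model not in MODEL_REGISTRY:
        raise ValueError(f"unknown model: {model}")
    if database not in schemas:
        raise ValueError(f"unknown database: {database}")
    return MODEL_REGISTRY[model](question, schemas[database])


if __name__ == "__main__":
    print(convert("How many heads are there?", "demo", "dummy-count", SCHEMAS))
```

Keeping the models behind a common callable signature means a new model can be added to the drop-down by registering one more function, without touching the request-handling code.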
The following table compares the execution accuracy and the exact match accuracy of three models that are currently among the state-of-the-art methods for text-to-SQL conversion.
| Model | Execution Accuracy w/o plugging | Execution Accuracy w/ plugging | Exact Match Accuracy w/o plugging | Exact Match Accuracy w/ plugging |
|---|---|---|---|---|
| C3-SQL | 89.0% | 90.0% | 40.0% | 40.0% |
| DIN-SQL | 93.7% | 95.2% | 63.5% | 63.5% |
| DAIL-SQL | 76.6% | 82.6% | 58.7% | 58.7% |
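To make the difference between the two metrics concrete, here is a simplified sketch using Python's built-in `sqlite3` module. It is not the evaluation script used for the numbers above. In particular, `exact_match` here is a plain normalized string comparison (an assumption for illustration, and it lowercases string literals too), whereas Spider's official exact match compares query components. The table and the two queries are made up.

```python
import sqlite3


def normalize(sql: str) -> str:
    """Lowercase, drop a trailing semicolon, and collapse whitespace."""
    return " ".join(sql.strip().rstrip(";").lower().split())


def exact_match(pred: str, gold: str) -> bool:
    """Simplified exact match: the normalized query strings are identical."""
    return normalize(pred) == normalize(gold)


def execution_match(conn: sqlite3.Connection, pred: str, gold: str) -> bool:
    """Execution match: both queries return the same rows (order ignored)."""
    try:
        pred_rows = conn.execute(pred).fetchall()
    except sqlite3.Error:
        # A predicted query that fails to run counts as incorrect.
        return False
    gold_rows = conn.execute(gold).fetchall()
    return sorted(pred_rows, key=repr) == sorted(gold_rows, key=repr)


# Tiny in-memory database for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE head (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO head VALUES (?, ?)",
    [("a", 60), ("b", 50), ("c", 70)],
)

gold = "SELECT COUNT(*) FROM head WHERE age > 56"
pred = "SELECT COUNT(name) FROM head WHERE age >= 57"

if __name__ == "__main__":
    print("exact match:    ", exact_match(pred, gold))
    print("execution match:", execution_match(conn, pred, gold))
```

The predicted query is written differently from the gold query but returns the same result on this database, so it passes the execution check and fails the exact match check. This is one reason execution accuracy can be well above exact match accuracy for the same model.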
In this project, we conducted a comparative study of different models for text-to-SQL conversion. We started by implementing the baseline models Seq2SQL and SQLNet. We then implemented the top three models for the Spider dataset: C3-SQL, DIN-SQL, and DAIL-SQL. Additionally, we implemented several Hugging Face models for the same conversion task. Although there were multiple challenges along the way, we worked through them and carried out the text-to-SQL conversion task. Lastly, we built a user interface that brings all the models together and generates the SQL query for a given natural language query.