typesql: A Python repository from tcqiuyu

TypeSQL

Source code accompanying our NAACL 2018 paper: TypeSQL: Knowledge-based Type-Aware Neural Text-to-SQL Generation

Environment Setup

The code uses Python 2.7 and PyTorch 0.2.0 (GPU version).
Install the Python dependencies: pip install -r requirements.txt

Download Data and Embeddings

Download the zipped data file from Google Drive and put it in the root directory.
Download the pretrained GloVe embeddings and the paraphrase embeddings (para-nmt-50m/data/paragram_sl999_czeng.txt). Put the unzipped glove and para-nmt-50m folders in the root directory.
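
Before training, it can help to confirm that the embedding folders are where the instructions above place them. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` is hypothetical, and it checks only the two paths named above (the name of the data zip is not given here, so it is not checked).

```python
import os

# Paths named in the download instructions, relative to the repository root.
EXPECTED_PATHS = [
    "glove",
    os.path.join("para-nmt-50m", "data", "paragram_sl999_czeng.txt"),
]


def missing_paths(root="."):
    """Return the expected embedding paths that do not exist under root."""
    return [p for p in EXPECTED_PATHS
            if not os.path.exists(os.path.join(root, p))]
```

Calling `missing_paths(".")` from the root directory returns an empty list when both paths are present.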

Train Models

To use knowledge graph types:

  mkdir saved_model_kg
  python train.py --sd saved_model_kg

To use DB content types:

  mkdir saved_model_con
  python train.py --sd saved_model_con --db_content 1

Test Models

Test the model with knowledge graph types:

python test.py --sd saved_model_kg

Test the model with DB content types:

python test.py --sd saved_model_con --db_content 1
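
The train and test commands differ only in the save directory and the --db_content flag. As a rough sketch for scripting both variants, the mapping below builds the argument list for either script. The names `CONFIGS` and `build_argv` are hypothetical and not part of the repository; the flags are the ones shown in the commands above.

```python
# Save directory and extra flags for each type source, as used in the
# train/test commands in this README.
CONFIGS = {
    "kg": ("saved_model_kg", []),
    "con": ("saved_model_con", ["--db_content", "1"]),
}


def build_argv(script, type_source):
    """Build the command line for train.py or test.py as a list of arguments."""
    save_dir, extra = CONFIGS[type_source]
    return ["python", script, "--sd", save_dir] + extra
```

The resulting list can be passed to `subprocess.check_call`, for example `subprocess.check_call(build_argv("train.py", "kg"))`, once the save directory exists.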

Get Data Types

Get a Google Knowledge Graph Search API key by following the link.
Search the knowledge graph to get entities:

python get_kg_entities.py [Google Knowledge Graph Search API key] [input json file] [output json file]
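
For orientation, a single lookup against the public Google Knowledge Graph Search API might look like the sketch below. This is an illustration only and does not reproduce what get_kg_entities.py does internally. The function names are hypothetical, while the endpoint, the `query`, `key`, and `limit` parameters, and the `itemListElement` response field come from the public API.

```python
import json

try:
    from urllib.parse import urlencode  # Python 3
    from urllib.request import urlopen
except ImportError:
    from urllib import urlencode  # Python 2.7
    from urllib2 import urlopen

KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def build_kg_url(query, api_key, limit=5):
    """Build the search URL for one query string."""
    params = [("query", query), ("key", api_key), ("limit", limit)]
    return KG_ENDPOINT + "?" + urlencode(params)


def extract_entities(response):
    """Pull the name, schema.org types, and score out of a parsed response."""
    entities = []
    for item in response.get("itemListElement", []):
        result = item.get("result", {})
        entities.append({
            "name": result.get("name"),
            "types": result.get("@type", []),
            "score": item.get("resultScore"),
        })
    return entities


def search_entities(query, api_key, limit=5):
    """Query the API over the network and return the extracted entities."""
    body = urlopen(build_kg_url(query, api_key, limit)).read()
    return extract_entities(json.loads(body.decode("utf-8")))
```

Only `search_entities` touches the network, and it needs a valid API key; the other two helpers can be exercised offline.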

Use the detected knowledge graph entities and DB content to group questions and create type attributes in the data files:

python data_process_test.py --tok [output json file from get_kg_entities.py] --table TABLE_FILE --out OUTPUT_FILE [--data_dir DATA_DIRECTORY] [--out_dir OUTPUT_DIRECTORY]

python data_process_train_dev.py --tok [output json file from get_kg_entities.py] --table TABLE_FILE --out OUTPUT_FILE [--data_dir DATA_DIRECTORY] [--out_dir OUTPUT_DIRECTORY]
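
Both processing scripts take the same flags, so a small wrapper can assemble either command. The sketch below assumes only the flags shown above; the helper name `build_process_argv` and any file names passed to it are hypothetical.

```python
def build_process_argv(script, tok, table, out, data_dir=None, out_dir=None):
    """Build the command line for one of the data processing scripts."""
    argv = ["python", script, "--tok", tok, "--table", table, "--out", out]
    # --data_dir and --out_dir are optional, so add them only when given.
    if data_dir is not None:
        argv += ["--data_dir", data_dir]
    if out_dir is not None:
        argv += ["--out_dir", out_dir]
    return argv
```

For example, `build_process_argv("data_process_test.py", "kg_out.json", "tables.jsonl", "test_out.jsonl")` returns an argument list that can be passed to `subprocess.check_call`; the three file names here are placeholders, not files shipped with the repository.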

Acknowledgement

The implementation is based on SQLNet. Please cite it too if you use this code.
