This is the code for this video on YouTube by Siraj Raval. This repo provides an implementation of the SQLNet and Seq2SQL neural networks for predicting SQL queries on the WikiSQL dataset. The paper is available here.
Xiaojun Xu, Chang Liu, Dawn Song. 2017. SQLNet: Generating Structured Queries from Natural Language Without Reinforcement Learning.
@article{xu2017sqlnet,
title={SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning},
author={Xu, Xiaojun and Liu, Chang and Song, Dawn},
journal={arXiv preprint arXiv:1711.04436},
year={2017}
}
The data is in data.tar.bz2. Extract it by running:
tar -xjvf data.tar.bz2
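Where the `tar` command is not available, the same extraction can be sketched with Python's standard `tarfile` module. This is a minimal sketch, not part of the repo: the helper name `extract_archive` and its `dest_dir` parameter are hypothetical, and it assumes the archive comes from a trusted source.

```python
import tarfile


def extract_archive(archive_path, dest_dir="."):
    """Extract a bzip2-compressed tar archive and return its member names.

    Hypothetical helper, roughly equivalent to `tar -xjvf <archive_path>`.
    Only use it on archives you trust.
    """
    with tarfile.open(archive_path, "r:bz2") as archive:
        names = archive.getnames()       # list members before extracting
        archive.extractall(dest_dir)     # unpack everything into dest_dir
    return names
```

Calling `extract_archive("data.tar.bz2")` from the repo root would unpack the data into the current directory.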
The code is written using PyTorch in Python 2.7. See the PyTorch website for installation instructions. You can install the other dependencies by running:
pip install -r requirements.txt
Download the pretrained GloVe embeddings using:
bash download_glove.sh
Run the following command to process the pretrained GloVe embeddings for training the word embeddings:
python extract_vocab.py
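To give a rough idea of what processing GloVe embeddings involves, the sketch below reads a GloVe-format text file (one token per line followed by space-separated floats) and keeps only the vectors for a given vocabulary. It is an illustration under assumptions, not the contents of `extract_vocab.py`: the function name `load_glove_subset` and the `vocab` argument are hypothetical.

```python
import io


def load_glove_subset(path, vocab):
    """Return {word: vector} for the words in `vocab` found in a GloVe text file.

    Hypothetical helper: assumes each line is "<token> <float> <float> ...".
    """
    vectors = {}
    with io.open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if word in vocab:
                # Convert the textual components into floats
                vectors[word] = [float(v) for v in values]
    return vectors
```

Restricting the table to words that actually occur in the dataset keeps the embedding matrix small compared with loading the full GloVe file.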
The training script is train.py. To see the detailed parameters for running it:
python train.py -h
Some typical usages are listed below:
Train a SQLNet model with column attention:
python train.py --ca
Train a SQLNet model with column attention and trainable embeddings (this requires first pretraining without trainable embeddings, i.e., running the command above):
python train.py --ca --train_emb
Pretrain a Seq2SQL model on the re-split dataset:
python train.py --baseline --dataset 1
Train a Seq2SQL model with reinforcement learning after pretraining:
python train.py --baseline --dataset 1 --rl
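Because two of the commands above depend on an earlier pretraining run, it can help to script each sequence so the stages always run in order. The following is a minimal sketch: the flags are copied from the usage examples above, while the `PIPELINES` grouping and the helper names `build_commands` and `run_pipeline` are assumptions made for illustration.

```python
import subprocess

# Flags copied from the usage examples; the grouping into two-stage
# pipelines (pretrain, then fine-tune) is an assumption of this sketch.
PIPELINES = {
    "sqlnet": [["--ca"], ["--ca", "--train_emb"]],
    "seq2sql": [["--baseline", "--dataset", "1"],
                ["--baseline", "--dataset", "1", "--rl"]],
}


def build_commands(pipeline, script="train.py", python="python"):
    """Return the ordered list of command lines for a named pipeline."""
    return [[python, script] + flags for flags in PIPELINES[pipeline]]


def run_pipeline(pipeline, dry_run=True):
    """Print the commands (dry run) or execute them in order.

    check_call raises on a non-zero exit code, so a failed pretraining
    stage stops the sequence before the dependent stage starts.
    """
    commands = build_commands(pipeline)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.check_call(command)
    return commands
```

With the default `dry_run=True`, `run_pipeline("sqlnet")` only prints the two commands; passing `dry_run=False` would actually launch training.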
The script for evaluation on the dev and test splits is test.py. The parameters for evaluation are roughly the same as those used for training. For example, the commands for evaluating the models trained with the commands above are:
Test a trained SQLNet model with column attention:
python test.py --ca
Test a trained SQLNet model with column attention and trainable embeddings:
python test.py --ca --train_emb
Test a trained Seq2SQL model without RL on the re-split dataset:
python test.py --baseline --dataset 1
Test a trained Seq2SQL model with reinforcement learning:
python test.py --baseline --dataset 1 --rl
Credits for this code go to xiaojunxu. I've merely created a wrapper to get people started.