Code for a CS224N project that builds on top of Zero-shot Text-to-SQL Learning with Auxiliary Task.
Please use Python 3.6 and PyTorch 1.3. Other Python dependencies are listed in requirements.txt. Install them with:
pip install -r requirements.txt
The data can be found on Google Drive. Please download it and extract it into the repository root.
Preprocessing is required. See run_tapas.sh
for parsing the data into a different format than the base dual-task code uses. embeddings.py
can be used to generate embeddings (this requires the Transformers library), and parse_embeddings.py
will parse the generated embeddings and save them to disk as a torch tensor, which the model consumes as an embedding.
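As a rough sketch of what this parsing step does (a minimal illustration only: the whitespace-separated format, the function name, and the output path here are assumptions, and the real parse_embeddings.py persists the result with torch.save rather than returning plain lists):

```python
def parse_embedding_lines(lines):
    """Parse lines of the form 'token v1 v2 ... vN' into a vocab list
    and a parallel list of float vectors."""
    vocab, vectors = [], []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        vocab.append(parts[0])
        vectors.append([float(x) for x in parts[1:]])
    return vocab, vectors

# Tiny illustrative input; the real files hold one vector per token.
lines = ["the 0.10 0.20 0.30", "cat 0.40 0.50 0.60"]
vocab, vectors = parse_embedding_lines(lines)
# In the repo, the parsed vectors would then be saved for the model, e.g.
# torch.save(torch.tensor(vectors), "embeddings.pt")  (path is illustrative).
print(vocab)       # ['the', 'cat']
print(vectors[1])  # [0.4, 0.5, 0.6]
```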
cd data_model/wikisql
python make_zs.py
python make_fs.py
cd zero-shot-text-to-SQL
./run.sh
The following code was written from scratch (in the zero-shot-text-to-SQL
directory):
similarity_study.py
Similarity_Analysis_And_Plots.ipynb
preprocess_bert.py
create_similarity_index.py
embeddings.py
parse_embeddings.py
The following code was changed significantly to implement my model architecture:
table/Models.py (implements embeddings, encoders, decoders, layers, etc.)
table/ModelConstructor.py (connects model components)
table/Loss.py
table/IO.py
table/Trainer.py
train.py
evaluate.py
Various other smaller changes were required throughout the codebase.
- This implementation is based on coarse2fine.
- The preprocessing and evaluation code used for WikiSQL is from salesforce/WikiSQL.
- We build off of the model from Zero-shot Text-to-SQL Learning with Auxiliary Task.
- We use the Transformers library by HuggingFace, in particular the BERT and TAPAS models.