This repository contains the code for our TACL paper "Bridging the Gap between Synthetic and Natural Questions via Sentence Decomposition for Semantic Parsing".
- Python 3.7.1
- nltk==3.4
- numpy==1.18.5
- torch==1.12.0
- transformers==4.15.0
- openai==0.18.1
- Processed Data
  - KQA
  - ComplexWebQuestions
- OpenAI api_key
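
The Codex decomposition scripts need an OpenAI API key. How the scripts read the key is not described here. One common approach, sketched below as an assumption, is to keep the key in an environment variable and assign it to `openai.api_key` before sending requests. The variable name `OPENAI_API_KEY` and the helper `load_api_key` are illustrative and not part of this repository.

```python
import os


def load_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Return the API key stored in `var_name`, or raise if it is missing.

    `var_name` is an assumed convention; adapt it to however the scripts
    in this repository expect the key to be provided.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Typical use with the pinned openai==0.18.1 client (not executed here):
#   import openai
#   openai.api_key = load_api_key()
```

Reading the key from the environment keeps it out of the source tree, so it is less likely to be committed by accident.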
You can directly use our decomposed results, or run the following scripts to decompose questions.
Complete prompts for ComplexWebQuestions and KQA are shown in ./prompt/[ComplexWebQuestions/KQA].
# For ComplexWebQuestions, use Codex to decompose 5,000 questions from the training data; these are used to train a T5-decomposer.
# input_file: ./data/ComplexWebQuestions/ComplexWebQuestions_train.json.pkl.T5-paraphrase
# output_file: ./data/ComplexWebQuestions/ComplexWebQuestions_train.json.pkl.T5-paraphrase.codex-decompose
python ./src/decomposing/ComplexWebQuestions/codex-decompose-for-trainingData.py
# For ComplexWebQuestions, use the 5,000 decomposed questions to train a T5-decomposer.
# And then, decompose all the training questions with the T5-decomposer.
# input_file: ./data/ComplexWebQuestions/ComplexWebQuestions_train.json.pkl.T5-paraphrase.codex-decompose
# output_file: ./data/ComplexWebQuestions/ComplexWebQuestions_train.json.pkl.T5-paraphrase.T5-decompose
bash ./scripts/decomposing/run_T5_Question2Decom.sh
# For ComplexWebQuestions, use Codex to decompose the questions from the development/testing sets, which are used as input to the semantic parser.
# input_file: ./data/ComplexWebQuestions/ComplexWebQuestions_[dev/test].json.pkl
# output_file: ./data/ComplexWebQuestions/ComplexWebQuestions_[dev/test].json.pkl.codex-decompose
python ./src/decomposing/ComplexWebQuestions/codex-decompose-for-validationData.py
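
To make the Codex step more concrete, here is a minimal sketch of few-shot decomposition. It is not the code in `./src/decomposing`: the `Question:`/`Decomposition:` layout, the function names, the engine name, and the sampling settings are all assumptions for illustration, and the real prompts are the ones in `./prompt/[ComplexWebQuestions/KQA]`. The completion backend is passed in as a function so the prompt logic can be tried without network access.

```python
def build_prompt(demonstrations, question):
    """Concatenate few-shot demonstrations and the new question.

    The "Question:/Decomposition:" layout is a placeholder, not the
    format used by the prompts shipped with this repository.
    """
    parts = [
        f"Question: {q}\nDecomposition: {d}\n" for q, d in demonstrations
    ]
    parts.append(f"Question: {question}\nDecomposition:")
    return "\n".join(parts)


def decompose(question, demonstrations, complete):
    """Decompose one question with a caller-supplied completion function.

    `complete` takes a prompt string and returns the model's raw text.
    """
    return complete(build_prompt(demonstrations, question)).strip()


def codex_complete(prompt, engine="code-davinci-002"):
    """Example backend using the legacy openai 0.x Completion endpoint.

    The engine name, token limit, and stop sequence are assumptions.
    """
    import openai  # imported lazily so the helpers above work offline

    response = openai.Completion.create(
        engine=engine,
        prompt=prompt,
        max_tokens=128,
        temperature=0,
        stop="\nQuestion:",
    )
    return response["choices"][0]["text"]
```

With an API key configured, `decompose(question, demonstrations, codex_complete)` would send one request per question. Any other function that maps a prompt string to text can be substituted for `codex_complete`.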
The processing pipeline of KQA is similar to that of ComplexWebQuestions.
# For ComplexWebQuestions, train the T5 semantic parser on the decomposed questions and evaluate it on the development/testing sets.
# training set: synthetic questions from ./data/ComplexWebQuestions/ComplexWebQuestions_train.json.pkl.T5-paraphrase.T5-decompose
# development set: synthetic questions from ./data/ComplexWebQuestions/ComplexWebQuestions_dev.json.pkl.codex-decompose
# testing set: natural questions from ./data/ComplexWebQuestions/ComplexWebQuestions_test.json.pkl.codex-decompose
bash ./scripts/semantic-parsing/run_T5_Decom2Logic.sh
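
The intermediate files named above end in `.pkl`, which suggests they are Python pickles. That is an assumption, and nothing is assumed here about their internal layout. A minimal sketch for checking what such a file contains before passing it to the next stage (the helper `summarize_pickle` is hypothetical):

```python
import pickle


def summarize_pickle(path, preview=3):
    """Load a pickle file and return (type name, length or None, first items).

    Only unpickle files you trust: loading a pickle can execute arbitrary code.
    """
    with open(path, "rb") as handle:
        data = pickle.load(handle)
    length = len(data) if hasattr(data, "__len__") else None
    if isinstance(data, dict):
        head = list(data.items())[:preview]
    elif isinstance(data, (list, tuple)):
        head = list(data[:preview])
    else:
        head = []
    return type(data).__name__, length, head
```

For example, `summarize_pickle("./data/ComplexWebQuestions/ComplexWebQuestions_dev.json.pkl.codex-decompose")` would report the container type, its size, and a few leading entries, provided the file is a pickle.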