
Code of "Bridging the Gap between Synthetic and Natural Questions via Sentence Decomposition for Semantic Parsing"


Decomposition-for-Semantic-Parsing

The code for our TACL paper "Bridging the Gap between Synthetic and Natural Questions via Sentence Decomposition for Semantic Parsing".

Dependencies

  • Python 3.7.1
  • nltk==3.4
  • numpy==1.18.5
  • torch==1.12.0
  • transformers==4.15.0
  • openai==0.18.1
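Because the versions above are exact pins, a quick environment check can catch version drift before a long training run. The following is a minimal sketch, assuming the PyPI distribution names match the listed package names. `PINS`, `installed_version` and `mismatches` are hypothetical helpers, not part of the repository. Local build tags such as `+cu113` on torch wheels are ignored when comparing.

```python
import sys
from typing import Callable, Dict, List, Optional, Tuple

# Exact pins copied from the Dependencies list above.
PINS: List[str] = [
    "nltk==3.4",
    "numpy==1.18.5",
    "torch==1.12.0",
    "transformers==4.15.0",
    "openai==0.18.1",
]


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of `dist`, or None if it is not installed."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:  # Python 3.7 (the pinned interpreter) lacks importlib.metadata
        import pkg_resources

        try:
            return pkg_resources.get_distribution(dist).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def mismatches(pins: List[str],
               lookup: Callable[[str], Optional[str]] = installed_version
               ) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs from its pin to (wanted, found)."""
    result = {}
    for pin in pins:
        name, wanted = pin.split("==")
        found = lookup(name)
        # Drop a local build tag such as "+cu113" before comparing.
        base = found.split("+")[0] if found else None
        if base != wanted:
            result[name] = (wanted, found)
    return result


if __name__ == "__main__":
    if sys.version_info[:3] != (3, 7, 1):
        print(f"note: running Python {sys.version.split()[0]}, README pins 3.7.1")
    for name, (wanted, found) in mismatches(PINS).items():
        print(f"{name}: wanted {wanted}, found {found}")
```

The `lookup` parameter makes the comparison testable without touching the real environment.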

Resources

Question Decomposition

You can use our decomposed results directly, or run the following scripts to decompose the questions yourself.

The complete prompts for ComplexWebQuestions and KQA are in ./prompt/ComplexWebQuestions and ./prompt/KQA, respectively.
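The Codex decomposition scripts below prompt a model with few-shot demonstrations. The following is a minimal sketch of that idea, assuming each demonstration is a (question, decomposition) pair laid out as `Question:` / `Decomposition:` blocks; the actual prompt files in ./prompt/ may use a different layout, so check them before reusing this. `build_prompt`, `openai_complete` and `decompose` are hypothetical names. The default backend calls the legacy `openai.Completion.create` endpoint available in openai==0.18.1; the engine name `code-davinci-002` is an assumption (the README does not name one), and OpenAI has since deprecated its Codex models.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical demonstration format: (question, decomposition text).
Demo = Tuple[str, str]


def build_prompt(demos: Sequence[Demo], question: str) -> str:
    """Concatenate few-shot demonstrations, then leave the target question open."""
    blocks = [f"Question: {q}\nDecomposition: {d}" for q, d in demos]
    blocks.append(f"Question: {question}\nDecomposition:")
    return "\n\n".join(blocks)


def openai_complete(prompt: str) -> str:
    """Default backend using the legacy Completion endpoint (openai==0.18.1).

    The engine name is an assumption; Codex engines are no longer served.
    """
    import openai  # imported lazily so the sketch loads without the package

    response = openai.Completion.create(
        engine="code-davinci-002",
        prompt=prompt,
        max_tokens=128,
        temperature=0,
        stop=["\n\n"],  # stop at the end of the current demonstration block
    )
    return response["choices"][0]["text"]


def decompose(question: str, demos: Sequence[Demo],
              complete: Callable[[str], str] = openai_complete) -> str:
    """Return the model's decomposition of `question`, stripped of whitespace."""
    return complete(build_prompt(demos, question)).strip()
```

Passing a stub such as `lambda prompt: "..."` as `complete` lets you check prompt assembly offline before spending API calls.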

# For ComplexWebQuestions, use Codex to decompose 5,000 questions from the training data; these decompositions are used to train a T5-decomposer.
# input_file: ./data/ComplexWebQuestions/ComplexWebQuestions_train.json.pkl.T5-paraphrase
# output_file: ./data/ComplexWebQuestions/ComplexWebQuestions_train.json.pkl.T5-paraphrase.codex-decompose
python ./src/decomposing/ComplexWebQuestions/codex-decompose-for-trainingData.py
# For ComplexWebQuestions, train a T5-decomposer on the 5,000 Codex-decomposed questions,
# then use it to decompose all the training questions.
# input_file: ./data/ComplexWebQuestions/ComplexWebQuestions_train.json.pkl.T5-paraphrase.codex-decompose
# output_file: ./data/ComplexWebQuestions/ComplexWebQuestions_train.json.pkl.T5-paraphrase.T5-decompose
bash ./scripts/decomposing/run_T5_Question2Decom.sh
# For ComplexWebQuestions, use Codex to decompose the questions in the development/testing sets; these decompositions are the input to the semantic parser.
# input_file: ./data/ComplexWebQuestions/ComplexWebQuestions_[dev/test].json.pkl
# output_file: ./data/ComplexWebQuestions/ComplexWebQuestions_[dev/test].json.pkl.codex-decompose
python ./src/decomposing/ComplexWebQuestions/codex-decompose-for-validationData.py
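The first two commands depend on each other (the T5-decomposer is trained on the Codex output), while the dev/test step is independent. The following is a minimal driver sketch, assuming it is launched from the repository root and that each script reads and writes the fixed paths listed in the comments without extra arguments. It skips any step whose outputs already exist. `Step`, `decomposition_plan` and `run_pending` are hypothetical names, not part of the repository.

```python
import subprocess
from pathlib import Path
from typing import List, NamedTuple

CWQ = Path("./data/ComplexWebQuestions")


class Step(NamedTuple):
    command: List[str]
    inputs: List[Path]
    outputs: List[Path]


def suffixed(path: Path, stage: str) -> Path:
    """Append a stage suffix, e.g. 'x.json.pkl' -> 'x.json.pkl.codex-decompose'."""
    return path.with_name(f"{path.name}.{stage}")


def decomposition_plan(data_dir: Path = CWQ) -> List[Step]:
    """The three ComplexWebQuestions decomposition steps, in execution order."""
    train = data_dir / "ComplexWebQuestions_train.json.pkl.T5-paraphrase"
    evals = [data_dir / f"ComplexWebQuestions_{s}.json.pkl" for s in ("dev", "test")]
    return [
        Step(["python", "./src/decomposing/ComplexWebQuestions/codex-decompose-for-trainingData.py"],
             [train], [suffixed(train, "codex-decompose")]),
        Step(["bash", "./scripts/decomposing/run_T5_Question2Decom.sh"],
             [suffixed(train, "codex-decompose")], [suffixed(train, "T5-decompose")]),
        Step(["python", "./src/decomposing/ComplexWebQuestions/codex-decompose-for-validationData.py"],
             evals, [suffixed(p, "codex-decompose") for p in evals]),
    ]


def run_pending(plan: List[Step], dry_run: bool = True) -> List[Step]:
    """Print (and, unless dry_run, execute) every step whose outputs are not all present."""
    pending = [s for s in plan if not all(p.exists() for p in s.outputs)]
    for step in pending:
        print(" ".join(step.command))
        if dry_run:
            continue
        # Inputs are checked at run time, after any earlier step has produced them.
        missing = [str(p) for p in step.inputs if not p.exists()]
        if missing:
            raise FileNotFoundError(f"missing inputs: {missing}")
        subprocess.run(step.command, check=True)
    return pending
```

With the default `dry_run=True`, `run_pending(decomposition_plan())` only prints what would run; pass `dry_run=False` to execute.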

The processing pipeline of KQA is similar to that of ComplexWebQuestions.

Semantic Parsing

# For ComplexWebQuestions, train a T5 semantic parser that maps decomposed questions to logical forms, then evaluate it on the development/testing sets.
# training set: synthetic questions from ./data/ComplexWebQuestions/ComplexWebQuestions_train.json.pkl.T5-paraphrase.T5-decompose
# development set: synthetic questions from ./data/ComplexWebQuestions/ComplexWebQuestions_dev.json.pkl.codex-decompose
# testing set: natural questions from ./data/ComplexWebQuestions/ComplexWebQuestions_test.json.pkl.codex-decompose
bash ./scripts/semantic-parsing/run_T5_Decom2Logic.sh
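run_T5_Decom2Logic.sh needs all three decomposed files listed above, so a missing file only surfaces once training has started. The following is a small pre-flight sketch, assuming those paths and a launch from the repository root. `SPLITS`, `missing_splits` and `launch` are hypothetical names, not part of the repository.

```python
import subprocess
from pathlib import Path
from typing import Dict, List

CWQ = Path("./data/ComplexWebQuestions")

# Split -> decomposed file, as listed in the comments above.
SPLITS: Dict[str, str] = {
    "train": "ComplexWebQuestions_train.json.pkl.T5-paraphrase.T5-decompose",  # synthetic
    "dev": "ComplexWebQuestions_dev.json.pkl.codex-decompose",  # synthetic
    "test": "ComplexWebQuestions_test.json.pkl.codex-decompose",  # natural
}


def missing_splits(data_dir: Path = CWQ) -> List[str]:
    """Return the splits whose decomposed file is not present in data_dir."""
    return [split for split, name in SPLITS.items() if not (data_dir / name).is_file()]


def launch(data_dir: Path = CWQ) -> None:
    """Start semantic-parser training only if every split file is present."""
    missing = missing_splits(data_dir)
    if missing:
        raise FileNotFoundError(f"decomposed files missing for: {', '.join(missing)}")
    subprocess.run(["bash", "./scripts/semantic-parsing/run_T5_Decom2Logic.sh"], check=True)
```

Calling `launch()` fails fast with the names of the missing splits instead of partway through a run.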