
Transition-based AMR Parser

Transition-based parser for Abstract Meaning Representation (AMR) in PyTorch. The code includes two fundamental components:

  1. A state machine and oracle that transform the sequence-to-graph task into a sequence-to-sequence problem. This follows the AMR oracles of Ballesteros and Al-Onaizan (2017), with improvements from Naseem et al. (2019) and Fernandez Astudillo et al. (2020).

  2. The stack-Transformer (Fernandez Astudillo et al. 2020), a sequence-to-sequence model that also encodes the stack and buffer state of the parser in its attention heads. It is also used in our work on self-learning (Lee et al. 2020) and multilinguality (Sheth et al. 2021) for AMR.
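The state machine in component 1 can be sketched as a shift-reduce system over a stack and a buffer, where arc actions build graph edges. The action inventory and class below are illustrative only; the parser's actual actions and API differ:

```python
# Illustrative sketch of a transition-based state machine for AMR parsing.
# The action set (SHIFT/REDUCE/LA/RA) is simplified for exposition and is
# NOT the actual action inventory of this parser.

class AMRStateMachine:
    def __init__(self, tokens):
        self.buffer = list(reversed(tokens))  # next token at the end
        self.stack = []
        self.edges = []  # (head, label, dependent) triples

    def apply(self, action):
        if action == 'SHIFT':
            # move the next buffer token onto the stack
            self.stack.append(self.buffer.pop())
        elif action == 'REDUCE':
            # discard the top of the stack
            self.stack.pop()
        elif action.startswith('LA('):
            # left arc: edge from stack top to the element below it
            label = action[3:-1]
            self.edges.append((self.stack[-1], label, self.stack[-2]))
        elif action.startswith('RA('):
            # right arc: edge from the element below to the stack top
            label = action[3:-1]
            self.edges.append((self.stack[-2], label, self.stack[-1]))

    def is_done(self):
        return not self.buffer and len(self.stack) <= 1
```

An oracle derives a gold action sequence from an aligned AMR graph, so a sequence-to-sequence model can be trained to predict the actions; the stack-Transformer additionally feeds the evolving stack/buffer state into its attention heads.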

The current version is 0.3.3 and yields 80.5 Smatch on the AMR2.0 test set using the default stack-Transformer configuration. Aside from the listed contributors, the initial commit was developed by Miguel Ballesteros and Austin Blodgett while at IBM.

Manual Installation

Clone the repository

git clone git@github.ibm.com:mnlp/transition-amr-parser.git
cd transition-amr-parser

The code has been tested on Python 3.6.9+. We use a script to activate conda/pyenv and virtual environments. If you prefer to handle this yourself, just create an empty file (the training scripts will assume it exists in any case).

touch set_environment.sh
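If you do want the script to do something, it only needs to activate your environment. A minimal example, assuming a conda environment (the environment name here is hypothetical):

```shell
# Example set_environment.sh -- the env name "amr-parser" is illustrative
eval "$(conda shell.bash hook)"
conda activate amr-parser
```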

Then, for a pip-only install, do

. set_environment.sh
pip install -r scripts/stack-transformer/requirements.txt
bash scripts/download_and_patch_fairseq.sh
pip install --no-deps --editable fairseq-stack-transformer
pip install --editable .

Alternatively, for a conda install, do

. set_environment.sh
conda env update -f scripts/stack-transformer/environment.yml
pip install spacy==2.2.3 smatch==1.0.4 ipdb
bash scripts/download_and_patch_fairseq.sh
pip install --no-deps --editable fairseq-stack-transformer
pip install --editable .

If you are installing on PowerPC, you will have to use the conda option. Also, spacy has to be installed with conda instead of pip (version 2.2.3 will not be available, which affects the lemmatizer behaviour).

To check if the install worked, do

. set_environment.sh
python tests/correctly_installed.py

As a further check, you can run a mini test with 25 annotated sentences that we provide under DATA/:

bash tests/minimal_test.sh

This runs a full train/test cycle (excluding alignment) and should take around a minute. Note that the model will not be able to learn from only 25 sentences.

The AMR aligner uses additional tools that can be downloaded and installed with

bash preprocess/install_alignment_tools.sh

Training a model

You first need to preprocess and align the data. For AMR2.0 do

. set_environment.sh
python preprocess/merge_files.py /path/to/LDC2017T10/data/amrs/split/ DATA/AMR/corpora/amr2.0/

Do the same for AMR1.0:

python preprocess/merge_files.py /path/to/LDC2014T12/data/amrs/split/ DATA/AMR/corpora/amr1.0/

You will also need to unzip the precomputed BLINK cache

unzip linkcache.zip

Then just call a config to carry out the desired experiment

bash scripts/stack-transformer/experiment.sh configs/amr2_o5+Word100_roberta.large.top24_stnp6x6.sh

To display the results use

python scripts/stack-transformer/rank_results.py --seed-average

Note that there is a cluster version of this script, currently supporting only LSF but easily adaptable to e.g. Slurm.

Decode with Pre-trained model

To use a trained model from the command line, do

amr-parse \
  --in-checkpoint $in_checkpoint \
  --in-tokenized-sentences $input_file \
  --out-amr file.amr

It will parse each line of $input_file separately (assumed to be tokenized). $in_checkpoint is the PyTorch checkpoint of a trained model. file.amr will contain the AMR in PENMAN notation, with additional alignment information as comments.

To use a trained model from other Python code, do

from transition_amr_parser.stack_transformer_amr_parser import AMRParser
parser = AMRParser.from_checkpoint(in_checkpoint) 
annotations = parser.parse_sentences([['The', 'boy', 'travels'], ['He', 'visits', 'places']])
print(annotations.toJAMRString())