
Code for our ACL-2022 paper "Multilingual Generative Language Models for Zero-Shot Cross-Lingual Event Argument Extraction".


X-Gear: Multilingual Generative Language Models for Zero-Shot Cross-Lingual Event Argument Extraction


Setup

  • Python 3.7.10
  • Create the conda environment from environment.yml:

    $ conda env create -f environment.yml
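To confirm that the active interpreter matches the version listed above (for example, after activating the new environment), a small check such as the following can help. It is a sketch that is not part of the repository; `version_matches` is a hypothetical helper name.

```python
import sys


def version_matches(required: str = "3.7.10", actual=None) -> bool:
    """Return True if the running (or given) Python version equals `required`.

    Hypothetical helper, not part of the X-Gear repository.
    """
    want = tuple(int(part) for part in required.split("."))
    # sys.version_info[:3] is (major, minor, micro)
    have = tuple(sys.version_info[:3]) if actual is None else tuple(actual)
    # Compare only as many components as `required` specifies
    return have[: len(want)] == want


if __name__ == "__main__":
    if not version_matches():
        print(f"Expected Python 3.7.10, found {sys.version.split()[0]}")
```

Passing a shorter string such as "3.7" relaxes the check to major and minor versions only.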

Data and Preprocessing

  • Go into the folder ./preprocessing/ and follow the instructions in its README.md.
  • The processed data will be saved in the folder ./processed_data/.
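After preprocessing, a quick way to confirm that ./processed_data/ was actually populated is to list its files. The helper below is a hypothetical sketch, not part of the repository; it assumes nothing about file names or subfolders, since that layout is defined by the preprocessing README.

```python
from pathlib import Path
from typing import List


def list_data_files(root: str = "./processed_data") -> List[str]:
    """List files under `root` recursively, as sorted POSIX-style relative paths.

    Hypothetical helper: raises if the folder is missing or contains no files.
    """
    base = Path(root)
    if not base.is_dir():
        raise FileNotFoundError(f"{root} does not exist; run the preprocessing first")
    files = sorted(
        p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()
    )
    if not files:
        raise RuntimeError(f"{root} contains no files; preprocessing may have failed")
    return files
```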

Training

  • Run ./scripts/generate_data_ace05.sh and ./scripts/generate_data_ere.sh to generate training examples in different languages for X-Gear. The generated training data will be saved in ./finetuned_data/.

  • Run ./scripts/train_ace05.sh or ./scripts/train_ere.sh to train X-Gear. Alternatively, you can run the following command.

    python ./xgear/train.py -c ./config/config_ace05_mT5copy-base_en.json
    

    This trains X-Gear with mT5-base + copy mechanism on ACE-05 English. The model will be saved in ./output/. You can modify the arguments in the config file or replace it with another config file in ./config/.
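Rather than editing a shipped config in place, one option is to write a modified copy and pass it to train.py with `-c`. The sketch below assumes the configs in ./config/ are JSON objects whose top-level keys you want to override; `derive_config` is a hypothetical helper, and which keys exist depends on the actual file contents.

```python
import json
from typing import Any, Dict


def derive_config(base_path: str, out_path: str, **overrides: Any) -> Dict[str, Any]:
    """Copy a JSON config, replace selected top-level keys, and save the result.

    Hypothetical helper, not part of the X-Gear repository.
    """
    with open(base_path, encoding="utf-8") as f:
        config = json.load(f)
    unknown = set(overrides) - set(config)
    if unknown:
        # Guard against typos: only allow keys already present in the base config
        raise KeyError(f"keys not in {base_path}: {sorted(unknown)}")
    config.update(overrides)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)
    return config
```

For example, call `derive_config("./config/config_ace05_mT5copy-base_en.json", "./config/my_config.json", **{"<key>": value})` and then run `python ./xgear/train.py -c ./config/my_config.json`, where `<key>` stands for a field present in the base config and `my_config.json` is an arbitrary output name. Rejecting keys that are not already in the base file is a deliberate choice to catch typos early. Because overrides replace whole top-level values, a nested object must be passed in full.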

Evaluating

  • Run the following script to evaluate performance on ACE-05 English, Arabic, and Chinese.

    ./scripts/eval_ace05.sh [model_path] [prediction_dir]
    

    To evaluate X-Gear with mT5-large, change the config file specified in ./scripts/eval_ace05.sh accordingly.

  • Run the following script to evaluate performance on ERE English and Spanish.

    ./scripts/eval_ere.sh [model_path] [prediction_dir]
    

    To evaluate X-Gear with mT5-large, change the config file specified in ./scripts/eval_ere.sh accordingly.
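To evaluate several checkpoints from Python, the two scripts can be wrapped as in the sketch below. The names `EVAL_LANGUAGES`, `eval_command`, and `run_eval` are hypothetical. The wrapper assumes the scripts are executable (otherwise prepend `"bash"` to the argument list) and only forwards `[model_path]` and `[prediction_dir]`, so switching to mT5-large still requires editing the config file referenced inside each script.

```python
import subprocess
from typing import List

# Test languages per dataset, as listed in this README
EVAL_LANGUAGES = {"ace05": ["en", "ar", "zh"], "ere": ["en", "es"]}


def eval_command(dataset: str, model_path: str, prediction_dir: str) -> List[str]:
    """Build the argument list for ./scripts/eval_<dataset>.sh (hypothetical wrapper)."""
    if dataset not in EVAL_LANGUAGES:
        raise ValueError(
            f"unknown dataset {dataset!r}; expected one of {sorted(EVAL_LANGUAGES)}"
        )
    return [f"./scripts/eval_{dataset}.sh", model_path, prediction_dir]


def run_eval(dataset: str, model_path: str, prediction_dir: str) -> None:
    """Run the evaluation script; check=True raises CalledProcessError on failure."""
    subprocess.run(eval_command(dataset, model_path, prediction_dir), check=True)
```

For example, `run_eval("ace05", model_path, prediction_dir)` runs the same command as `./scripts/eval_ace05.sh [model_path] [prediction_dir]`.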

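We provide our trained models and report their performance below. The suffix of each model name (en, ar, zh, es) is its training language; each column gives the argument identification (Arg-I) or argument classification (Arg-C) F1 score on one test language.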

ACE-05

Model                           en Arg-I  en Arg-C  ar Arg-I  ar Arg-C  zh Arg-I  zh Arg-C
X-Gear-ace05-mT5-base+copy-en      73.39     69.28     47.64     42.09     57.81     54.46
X-Gear-ace05-mT5-base+copy-ar      33.87     27.17     72.97     66.92     31.14     28.84
X-Gear-ace05-mT5-base+copy-zh      59.85     55.15     38.04     34.88     72.93     68.99
X-Gear-ace05-mT5-large+copy-en     75.16     71.85     54.18     50.00     63.14     58.40
X-Gear-ace05-mT5-large+copy-ar     38.81     34.57     73.49     67.75     39.26     36.13
X-Gear-ace05-mT5-large+copy-zh     61.44     55.40     38.71     36.14     70.45     66.99
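The ACE-05 table mixes in-language results with zero-shot transfer. One way to summarize transfer per model is to average Arg-C over the test languages the model was not trained on. The sketch below does this with the numbers copied from the table, assuming (as in the model names) that the last suffix is the training language; `zero_shot_average` is a hypothetical helper.

```python
from statistics import mean

# Test-language order of the ACE-05 columns
LANGS = ("en", "ar", "zh")

# Arg-C F1 scores copied from the ACE-05 table above
ACE05_ARG_C = {
    "X-Gear-ace05-mT5-base+copy-en": (69.28, 42.09, 54.46),
    "X-Gear-ace05-mT5-base+copy-ar": (27.17, 66.92, 28.84),
    "X-Gear-ace05-mT5-base+copy-zh": (55.15, 34.88, 68.99),
    "X-Gear-ace05-mT5-large+copy-en": (71.85, 50.00, 58.40),
    "X-Gear-ace05-mT5-large+copy-ar": (34.57, 67.75, 36.13),
    "X-Gear-ace05-mT5-large+copy-zh": (55.40, 36.14, 66.99),
}


def zero_shot_average(source: str, scores) -> float:
    """Mean score over the test languages other than the training language."""
    return mean(s for lang, s in zip(LANGS, scores) if lang != source)


for name, scores in ACE05_ARG_C.items():
    # Assumption: the last "-" suffix of the model name is the training language
    source = name.rsplit("-", 1)[1]
    print(f"{name}: zero-shot Arg-C {zero_shot_average(source, scores):.2f}")
```

Averaging only the off-diagonal entries keeps the in-language score from inflating the transfer summary; the same function works on the Arg-I columns.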

ERE

Model                         en Arg-I  en Arg-C  es Arg-I  es Arg-C
X-Gear-ere-mT5-base+copy-en      78.26     71.55     64.31     58.70
X-Gear-ere-mT5-base+copy-es      69.21     59.79     70.67     66.37
X-Gear-ere-mT5-large+copy-en     78.10     73.04     64.82     60.35
X-Gear-ere-mT5-large+copy-es     69.03     63.73     71.47     68.49
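With only two ERE languages, the cross-lingual cost of each model can be read as a single number: Arg-C on the training language minus Arg-C on the other language. The sketch below computes that gap from the table values, again assuming the last suffix of each model name is the training language; `transfer_gap` is a hypothetical helper.

```python
# Test-language order of the ERE columns
ERE_LANGS = ("en", "es")

# Arg-C F1 scores copied from the ERE table above, ordered (en, es)
ERE_ARG_C = {
    "X-Gear-ere-mT5-base+copy-en": (71.55, 58.70),
    "X-Gear-ere-mT5-base+copy-es": (59.79, 66.37),
    "X-Gear-ere-mT5-large+copy-en": (73.04, 60.35),
    "X-Gear-ere-mT5-large+copy-es": (63.73, 68.49),
}


def transfer_gap(name: str, scores) -> float:
    """Training-language Arg-C minus the other language's Arg-C (two languages)."""
    source = name.rsplit("-", 1)[1]  # assumption: suffix = training language
    src_idx = ERE_LANGS.index(source)
    return scores[src_idx] - scores[1 - src_idx]


for name, scores in ERE_ARG_C.items():
    print(f"{name}: gap {transfer_gap(name, scores):.2f} F1")
```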

Citation

If you find the code useful in your research, please consider citing our paper.

@inproceedings{acl2022xgear,
    author    = {Kuan-Hao Huang and I-Hung Hsu and Premkumar Natarajan and Kai-Wei Chang and Nanyun Peng},
    title     = {Multilingual Generative Language Models for Zero-Shot Cross-Lingual Event Argument Extraction},
    booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL)},
    year      = {2022},
}