MTLB-STRUCT

The MWE identification system, MTLB-STRUCT, participated in the PARSEME 1.2 Shared Task on semi-supervised identification of verbal multiword expressions.

Overview

This repository contains the code for MTLB-STRUCT, the system that participated in the PARSEME 1.2 shared task on semi-supervised identification of verbal MWEs. The system is based on a pre-trained BERT masked language model and jointly learns VMWE tags and dependency parse trees. It ranked first in the open track of the shared task.
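
The joint architecture can be pictured roughly as below. This is a minimal sketch assuming PyTorch and the Hugging Face transformers library; the class name, head names, and defaults are hypothetical and simplified relative to the repository's actual model.py:

  import torch.nn as nn
  from transformers import BertModel

  class JointTagParseModel(nn.Module):
      # Illustrative only: one shared BERT encoder feeding two heads,
      # one for per-token VMWE tags and one scoring dependency heads.
      def __init__(self, bert_name="bert-base-multilingual-cased", num_tags=10):
          super().__init__()
          self.bert = BertModel.from_pretrained(bert_name)
          hidden = self.bert.config.hidden_size
          self.tag_head = nn.Linear(hidden, num_tags)       # VMWE tagging head
          self.arc_scorer = nn.Bilinear(hidden, hidden, 1)  # dependency-arc head

      def forward(self, input_ids, attention_mask):
          h = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state
          tag_logits = self.tag_head(h)  # (batch, seq_len, num_tags)
          # arc_scores[b, i, j] scores token j as the syntactic head of token i
          seq_len = h.size(1)
          dep = h.unsqueeze(2).expand(-1, -1, seq_len, -1).contiguous()
          head = h.unsqueeze(1).expand(-1, seq_len, -1, -1).contiguous()
          arc_scores = self.arc_scorer(dep, head).squeeze(-1)  # (batch, seq, seq)
          return tag_logits, arc_scores

  # Training would minimise the sum of two cross-entropy losses,
  # one over tag_logits and one over arc_scores.

Sharing a single encoder between the two heads is what lets dependency-tree supervision inform VMWE tagging.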

Project Structure

.
├── README.md
├── code
│   ├── berteval.py
│   ├── config
│   ├── corpus.py
│   ├── corpus_reader.py
│   ├── evaluation.py
│   ├── load_test.py
│   ├── main.py
│   ├── model.py
│   └── preprocessing.py
└── requirements.txt

The required packages are listed in requirements.txt.

How to Run the System

  1. Copy the data files from the PARSEME 1.2 repository into the data directory, following the path data/1.2/{language}.
  2. Choose a configuration file from the code/config/ directory, or create your own config file with the same fields as the existing ones.
  3. Run main.py config/{config.json}

This trains the model according to the config file passed as the argument. The trained model is saved in a directory called saved, from which it can later be loaded for testing.

You can get predictions on the dev/test data by running load_test.py [PATH TO THE DIRECTORY OF THE SAVED MODEL]. This writes the prediction .cupt file to the saved directory.
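
For example, assuming the working directory is code/ and a config file named config/EN.json exists (both names, and the saved-model path, are placeholders):

  python main.py config/EN.json      # train; the model is written under saved/
  python load_test.py saved/{model}  # predict; writes a .cupt file into saved/

The .cupt output is the PARSEME extension of CoNLL-U: the usual ten columns plus a PARSEME:MWE column, declared in the file's first line:

  # global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC PARSEME:MWE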

Note that the evaluation results printed after running load_test.py on the development sets are based on seqeval's NER metrics, not the PARSEME evaluation measures. We evaluate the performance of our predictions with the PARSEME evaluation script evaluate.py.
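
For illustration, seqeval computes span-level precision, recall, and F1 from BIO-style tag sequences; the snippet below uses made-up tags and assumes the seqeval package is installed:

  from seqeval.metrics import classification_report, f1_score

  # Hypothetical per-token VMWE tags for two dev sentences (BIO scheme)
  y_true = [["O", "B-VID", "I-VID", "O"], ["B-LVC.full", "I-LVC.full", "O"]]
  y_pred = [["O", "B-VID", "I-VID", "O"], ["B-LVC.full", "O", "O"]]

  print(f1_score(y_true, y_pred))              # span-level F1
  print(classification_report(y_true, y_pred))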

Reference

@article{Taslimipoor2020,
  author        = {Shiva Taslimipoor and
                   Sara Bahaadini and
                   Ekaterina Kochmar},
  title         = {MTLB-STRUCT @PARSEME 2020: Capturing Unseen Multiword Expressions
                   Using Multi-task Learning and Pre-trained Masked Language Models},
  year          = {2020},
  eprint        = {2011.02541},
  archivePrefix = {arXiv},
  url           = {https://arxiv.org/abs/2011.02541}
}