How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation

Code and data for our CoNLL 2020 publication: "How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation"

Citation

@inproceedings{eger-etal-2020-probe,
    title = "How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation",
    author = "Eger, Steffen  and
      Daxenberger, Johannes  and
      Gurevych, Iryna",
    booktitle = "Proceedings of the 24th Conference on Computational Natural Language Learning",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.conll-1.8",
    pages = "108--118",
}

Structure

Our implementation is based on SentEval, which trains and evaluates classifiers on top of a given sentence embedding. We added the following features:

  • Change the size of a dataset while maintaining its class balance
  • Change the balance between the classes in a given dataset (see the sketch after this list)
  • Use the Random Forest and Naive Bayes classifiers from scikit-learn
  • Automatically tune hyperparameters for MLP and Random Forest
  • Train various sentence embeddings
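
The first two items amount to stratified subsampling of the task data. A minimal sketch of the idea (the function name and the list-of-(label, sentence) format are illustrative assumptions, not the repo's actual API):

```python
import random
from collections import defaultdict

def subsample(examples, n_total=None, ratios=None, seed=0):
    """Stratified subsampling sketch: `examples` is a list of (label, sentence) pairs.

    `n_total` shrinks the dataset while keeping the original class proportions;
    `ratios` (e.g. {0: 1, 1: 5}) imposes a new relation between the class sizes.
    """
    random.seed(seed)
    by_label = defaultdict(list)
    for label, sent in examples:
        by_label[label].append((label, sent))
    if ratios is None:
        # Keep the original class proportions.
        ratios = {label: len(items) for label, items in by_label.items()}
    if n_total is None:
        n_total = len(examples)
    unit = n_total / sum(ratios.values())
    sampled = []
    for label, items in by_label.items():
        k = min(len(items), int(unit * ratios[label]))
        sampled.extend(random.sample(items, k))
    random.shuffle(sampled)
    return sampled
```

Roughly speaking, `subsample(data, n_total=10000)` corresponds to `--ntrain 10000`, and `subsample(data, ratios={0: 1, 1: 5})` to a 1:5 balance.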

We also added English datasets, as well as datasets in the following languages:

  • Turkish (tr)
  • Russian (ru)
  • Georgian (ka)

The following probing and downstream tasks were added to SentEval:

Task                   | Type       | Description                                      | Example                                               | Command Line Argument
Voice                  | Probing    | Whether sent. contains a passive construct       | He likes cats ⟶ False                                 | Voice
Subject Verb Agreement | Probing    | Whether subject and verb agree                   | They works together ⟶ Disagree                        | SubjVerbAgreement
Subject Verb Distance  | Probing    | Distance between subject and verb                | The delivery was very late ⟶ 1                        | SubjVerbDistance
Argumentation Mining   | Downstream | Whether sent. supports or opposes a given topic  | (abortion, Abortion is basically murder!) ⟶ opposing  | MArgMin
Sentiment Analysis     | Downstream | Positive, negative or neutral sentiment          | Never fails to disappoint ⟶ NEG                       | MSenti
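
To make the probing labels more concrete, here is a rough sketch of how a subject–verb distance label could be derived from a dependency parse (spaCy is used purely for illustration; this is not necessarily how the datasets in this repository were built):

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def subj_verb_distance(sentence):
    """Token distance between the (first) nominal subject and its head verb."""
    doc = nlp(sentence)
    for token in doc:
        if token.dep_ in ("nsubj", "nsubjpass"):
            return abs(token.head.i - token.i)
    return None

print(subj_verb_distance("The delivery was very late"))  # table example above -> 1
```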

Installation

(Optional): Set up a virtual environment

python3 -m venv venv
source venv/bin/activate

Install pip requirements for senteval and sentence embeddings

pip install -r requirements.txt

Set up sentence embeddings and download their checkpoints

Our pretrained multilingual Infersent, Quickthought and RandomLSTM embeddings (checkpoints) can be downloaded from here and placed in sentence_embeddings/embedder_data.

To download further embeddings:

cd sentence-embeddings
./download_requirements.sh

Download downstream tasks from senteval

cd senteval/data/downstream
./get_transfer_data.bash

How to run an experiment

The purpose of __main__.py is to generate sentence embeddings and automatically execute our modified version of SentEval on them. To run an experiment, specify a list of sentence embeddings and tasks, an output file and a classifier. You can specify additional parameters to modify the dataset before running experiments.

Example: python . -s avg -t WordContent -f results.json --mlp

The parameter -s specifies the sentence embedding, -t a list of tasks and -f the result file; --mlp sets the classifier to a Multilayer Perceptron. For an exhaustive list of command line parameters, run python . --help.
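
__main__.py builds on SentEval's engine. If you want to call the engine directly with your own embedder, a minimal sketch looks roughly like this (the task_path, classifier settings and the random-vector batcher are assumptions for illustration; replace the batcher with a real sentence embedder):

```python
import numpy as np
import senteval

def prepare(params, samples):
    # Build any resources the embedder needs (e.g. a vocabulary); nothing for this toy example.
    return

def batcher(params, batch):
    # Toy "embedder": one fixed-size random vector per sentence.
    # Replace with a call to your actual sentence embedding model.
    return np.random.rand(len(batch), 128)

params = {"task_path": "senteval/data", "usepytorch": False, "kfold": 5}
params["classifier"] = {"nhid": 0, "optim": "adam", "batch_size": 64,
                        "tenacity": 5, "epoch_size": 4}

se = senteval.engine.SE(params, batcher, prepare)
print(se.eval(["Length", "WordContent"]))
```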

If the result file already exists and contains valid JSON, the new results are merged into the existing ones. Results are written in the following JSON format:

{
  "Experiment 1 parameters": {
    "Sentence embedding 1": {
      "Task 1": {
        ...
      },
      "Task 2": {
        ...
      }
    }
  }
}
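
A minimal sketch of how such a merge can be implemented (the helper below is illustrative, not necessarily the code used in this repository):

```python
import json
import os

def merge_results(path, new_results):
    """Deep-merge new_results into the JSON result file at path (illustrative helper)."""
    existing = {}
    if os.path.exists(path):
        with open(path) as f:
            existing = json.load(f)

    def deep_merge(dst, src):
        for key, value in src.items():
            if isinstance(value, dict) and isinstance(dst.get(key), dict):
                deep_merge(dst[key], value)
            else:
                dst[key] = value

    deep_merge(existing, new_results)
    with open(path, "w") as f:
        json.dump(existing, f, indent=2)
```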

The file log.txt contains all parameters, results and the logs from SentEval.

How to reproduce our results

The following table lists the values we used for each command line parameter in our experiments. The individual shell commands are given afterwards.

Parameter           | Values                                                                           | Meaning
sentence_embeddings | avg; pmean; randomLSTM; infersent; quickthought; LASER; averageMultilingualBERT | Average Pooling; Power Means; Random LSTM; Infersent; QuickThoughts; LASER; mBERT with average pooling
ntrain              | 0.1; 0.5; 1                                                                      | 10%; 50%; 100% of the training data
ntrain              | 100000; 30000; 20000; 10000; 5000; 2000                                          | 100000; 30000; 20000; 10000; 5000; 2000 samples
lang                | en; ru; tr; ka                                                                   | English; Russian; Turkish; Georgian
balance             | 1 5; 1 10                                                                        | 1:5; 1:10 relation between the class sizes

All probing commands below use the full list of sentence embeddings:

export embeddings="avg pmean randomLSTM infersent quickthought LASER averageMultilingualBERT"

English Probing

For each $classifier in mlp, log_reg, random_forest, naive_bayes and for each $ntrain in 100000, 30000, 20000, 10000, 5000, 2000:

python . --lang en -s $embeddings --$classifier --ntrain $ntrain -t Length WordContent Depth TopConstituents BigramShift SubjNumber SubjVerbAgreement SubjVerbDistance Voice -f results.json
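
The sweep over classifiers and training set sizes can be scripted as a small shell loop, for example (a sketch assuming the flags above and the $embeddings variable exported earlier):

```bash
for classifier in mlp log_reg random_forest naive_bayes; do
  for ntrain in 100000 30000 20000 10000 5000 2000; do
    python . --lang en -s $embeddings --$classifier --ntrain $ntrain \
      -t Length WordContent Depth TopConstituents BigramShift SubjNumber \
      SubjVerbAgreement SubjVerbDistance Voice -f results.json
  done
done
```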

Multilingual Probing

For each $lang in tr, ru, ka (please note that not all probing tasks are available for all languages):

python . --lang $lang --log_reg --ntrain 10000 -t Length WordContent Depth TopConstituents BigramShift SubjNumber SubjVerbAgreement SubjVerbDistance Voice -f results.json

Multilingual Downstream

For each $lang in tr, ru, ka:

python . --log_reg --lang $lang -t MArgMin MSenti MTREC
python . --log_reg --lang en -t MArgMin MSenti TREC