xlm_to_xlsr

Official implementation of the paper "Distilling a Pretrained Language Model to a Multilingual ASR Model" (Interspeech 2022)

Distilling a Pretrained Language Model to a Multilingual ASR Model

Oral presentation @ Interspeech

How to run experiments (Table 1)

Environments
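
The dependency list is not reproduced on this page. Given the Hydra-style +key=value overrides and the wav2vec 2.0 XLSR model used below, a minimal setup sketch might look like the following; the package set is an assumption, and the repository's own requirements file is authoritative.

# Assumed dependencies (PyTorch, HuggingFace, Hydra); pin versions
# to match the repository's requirements.
pip install torch torchaudio transformers datasets hydra-core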

Supported datasets

  • Check the configs directory for the full list of supported datasets.
  • For example, to train on Common Voice Czech, set $dataset to common_voice_czech, as in the snippet below.
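
For instance, the shell variable consumed by the training commands below can be set like this (the value must match a dataset config name in the repository):

# Select the dataset config referenced by +dataset=$dataset below.
dataset=common_voice_czech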

From scratch

# If you change the number of GPUs, adjust per_device_train_batch_size in the training config accordingly.
CUDA_VISIBLE_DEVICES=0,1 python3 train.py \
    +distill=random_init \
    +dataset=$dataset \
    +train=v1 \
    +xlsr=w2v2_xlsr
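
If the training config follows the Hydra group layout implied by these overrides, the batch size can also be overridden on the command line instead of editing the config file. The key path below mirrors the distill.feat_loss override used further down, but it is an assumption, not confirmed by the repository:

# Hypothetical override; train.per_device_train_batch_size is an
# assumed key path, inferred from the comment above.
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train.py \
    +distill=random_init \
    +dataset=$dataset \
    +train=v1 \
    +xlsr=w2v2_xlsr \
    train.per_device_train_batch_size=4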

Fine-tuning

# Fine-tune the pretrained XLSR model (+distill=vanilla) rather than
# training it from random initialization as above.
CUDA_VISIBLE_DEVICES=0,1 python3 train.py \
    +distill=vanilla \
    +dataset=$dataset \
    +train=v1 \
    +xlsr=w2v2_xlsr

Fine-tuning + Distill-L2S

# Set $lambda to the trade-off hyperparameter: 0.25, 0.5, or 1.0 (the values used in the paper).
CUDA_VISIBLE_DEVICES=0,1 python3 train.py \
    +distill=shrink \
    +dataset=$dataset \
    +train=v1 \
    +xlsr=w2v2_xlsr \
    distill.feat_loss=$lambda
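
To cover all three trade-off values in one go, the same command can be wrapped in a shell loop:

# Sweep the trade-off hyperparameter over the values listed above.
for lambda in 0.25 0.5 1.0; do
    CUDA_VISIBLE_DEVICES=0,1 python3 train.py \
        +distill=shrink \
        +dataset=$dataset \
        +train=v1 \
        +xlsr=w2v2_xlsr \
        distill.feat_loss=$lambda
done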