
Multi-Task Deep Neural Networks for Natural Language Understanding

This PyTorch package implements the Multi-Task Deep Neural Networks (MT-DNN) for Natural Language Understanding, as described in:

Xiaodong Liu*, Pengcheng He*, Weizhu Chen and Jianfeng Gao
Multi-Task Deep Neural Networks for Natural Language Understanding
arXiv preprint arXiv:1901.11504
*: Equal contribution


Setup Environment

Install via pip:

  1. Python 3.6

  2. Install the requirements
    > pip install -r requirements.txt

Use Docker:

  1. Pull the Docker image
    > docker pull allenlao/pytorch-mt-dnn:v0.1

  2. Run the Docker container
    > docker run -it --rm --runtime nvidia allenlao/pytorch-mt-dnn:v0.1 bash
    If you are new to Docker, please refer to: https://docs.docker.com/

Train a toy MT-DNN model

  1. Download data
    > sh download.sh
    For more information about the GLUE dataset, please refer to: https://gluebenchmark.com/

  2. Preprocess data
    > python prepro.py

  3. Training
    > python train.py

Note that we ran experiments on 4 V100 GPUs for base MT-DNN models. You may need to reduce the batch size on GPUs with less memory.

Reproduce GLUE results

  1. MTL refinement: refine MT-DNN (shared layers), initialized with the pre-trained BERT model, via MTL using all GLUE tasks excluding WNLI to learn a new shared representation.
    Note that we ran this experiment on 8 V100 GPUs (32GB) with a batch size of 32.

    • Preprocess GLUE data via the aforementioned script
    • Training:
  2. Fine-tuning: fine-tune MT-DNN on each GLUE task to obtain task-specific models.
    Here, we provide two examples, STS-B and RTE. You can use similar scripts to fine-tune on the other GLUE tasks.

    • Fine-tune on the STS-B task
      > scripts/run_stsb.sh
      You should get about 90.5/90.4 Pearson/Spearman correlation on the STS-B dev set.
    • Fine-tune on the RTE task
      > scripts/run_rte.sh
      You should get about 83.8 accuracy on the RTE dev set.
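The MTL refinement step trains one set of shared layers on several GLUE tasks at once. As a rough illustration of how such training can interleave tasks, here is a minimal sketch of the general recipe as we understand it from the MT-DNN paper: pack each task's data into mini-batches, pool the batches of all tasks, shuffle, and update the model one single-task mini-batch at a time. This is not the code in train.py; `make_batches`, `multi_task_schedule`, the toy data, and the batch size are hypothetical, and the actual model update is left as a comment.

```python
import random
from typing import Dict, Iterator, List, Tuple

# GLUE tasks used for MTL refinement (WNLI excluded, as described above).
GLUE_TASKS = ["cola", "sst2", "mrpc", "stsb", "qqp", "mnli", "qnli", "rte"]


def make_batches(examples: List[int], batch_size: int) -> List[List[int]]:
    """Split one task's examples into consecutive mini-batches."""
    return [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]


def multi_task_schedule(
    task_data: Dict[str, List[int]], batch_size: int, seed: int = 0
) -> Iterator[Tuple[str, List[int]]]:
    """Yield (task, mini-batch) pairs covering every example once per epoch.

    Each mini-batch holds examples from a single task; the batches of all
    tasks are pooled and shuffled, so consecutive updates hit different tasks.
    """
    pool = [
        (task, batch)
        for task, examples in task_data.items()
        for batch in make_batches(examples, batch_size)
    ]
    random.Random(seed).shuffle(pool)
    yield from pool


if __name__ == "__main__":
    # Hypothetical toy data: integers stand in for tokenized examples.
    toy = {task: list(range(10 * (i + 1))) for i, task in enumerate(GLUE_TASKS)}
    for task, batch in multi_task_schedule(toy, batch_size=32):
        # A real training step would run the shared layers, apply the output
        # layer for `task`, compute that task's loss, and update the parameters.
        print(task, len(batch))
```

Keeping every mini-batch within a single task means each step needs only that task's output layer and loss, while shuffling across tasks exposes the shared layers to all tasks throughout the epoch.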

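The STS-B numbers above are Pearson and Spearman correlations between predicted and gold similarity scores, and the RTE number is accuracy, all apparently reported on a 0-100 scale. If you want to check dev-set predictions yourself, a minimal standard-library sketch of these metrics might look like the following; the function names are our own and not part of this package. Spearman correlation is computed as the Pearson correlation of the ranks, with tied values sharing their average rank.

```python
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values all receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over the run of values equal to values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(predictions, labels)) / len(labels)


if __name__ == "__main__":
    gold = [0.0, 1.5, 3.2, 4.8, 5.0]  # made-up STS-B-style similarity scores
    pred = [0.3, 1.2, 3.5, 4.1, 4.9]
    print(100 * pearson(pred, gold), 100 * spearman(pred, gold))
    print(100 * accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 75.0
```

Multiplying by 100, as in the example, puts the results on the same scale as the dev numbers quoted above.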
Reproduce SciTail & SNLI results (domain adaptation)

  1. Domain Adaptation on SciTail

  2. Domain Adaptation on SNLI

Notes and Acknowledgments

The PyTorch BERT implementation is from: https://github.com/huggingface/pytorch-pretrained-BERT
BERT: https://github.com/google-research/bert
We also used some code from: https://github.com/kevinduh/san_mrc

How do I cite MT-DNN?

For now, please cite the arXiv version:

  title={Multi-Task Deep Neural Networks for Natural Language Understanding},
  author={Liu, Xiaodong and He, Pengcheng and Chen, Weizhu and Gao, Jianfeng},
  journal={arXiv preprint arXiv:1901.11504},

A new version of the paper will be shared later.

Typo in the paper: there is no activation function in Equation 2.
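Assuming Equation 2 refers to the text-similarity output layer the paper uses for STS-B (a task-specific vector w_STS applied to the contextual [CLS] embedding x of the sentence pair), the corrected equation is a plain linear score, Sim(X1, X2) = w_STS^T · x, so the prediction can be any real number rather than being squashed into a fixed range. The pure-Python sketch below illustrates that reading only; `similarity_score` and `mse_loss` are hypothetical names, plain lists stand in for tensors, and pairing the score with mean squared error is our assumption about how such a regression head is trained.

```python
from typing import Sequence


def similarity_score(w_sts: Sequence[float], x: Sequence[float]) -> float:
    """Assumed corrected Equation 2: Sim(X1, X2) = w_STS^T x, with no activation.

    `x` stands in for the [CLS] embedding of the sentence pair and `w_sts` for
    the task-specific weight vector; with no squashing function the score is
    an unbounded real number.
    """
    if len(w_sts) != len(x):
        raise ValueError("w_sts and x must have the same dimension")
    return sum(w * v for w, v in zip(w_sts, x))


def mse_loss(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between predicted and gold similarity scores."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


if __name__ == "__main__":
    score = similarity_score([0.5, -1.0, 2.0], [1.0, 2.0, 3.0])
    print(score)                     # 4.5
    print(mse_loss([score], [5.0]))  # 0.25
```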

Contact Information

For help or issues using MT-DNN, please submit a GitHub issue.

For personal communication related to MT-DNN, please contact Xiaodong Liu (xiaodl@microsoft.com), Pengcheng He (penhe@microsoft.com), Weizhu Chen (wzchen@microsoft.com) or Jianfeng Gao (jfgao@microsoft.com).