To Do

  • additional experimental conditions:
    • discourse markers / discourse + laughter
    • freezing the utterance encoder (sketch below)
    • in-domain pre-training for BERT
    • GloVe aggregation for utterances (sketch below)
      • BiLSTM
      • CNN / average pool?
  • methodological improvements:
    • use the customized BERT vocab / word-piece tokenization for the baseline models as well as for BERT (sketch below)
  • additional corpora:
  • improve reporting and analysis:
    • macro F1 / macro precision? See Guillou et al., 2016 (thanks, Sharid!); sketch below
    • majority class baseline / tag distribution (same sketch)
    • time to train
    • number of parameters / task-trained parameters (sketch below)
  • not super exciting but maybe we should try:
    • DAR model hyperparameter tuning (hidden_size, n_layers, dropout, use_lstm)
    • play with learning rate
    • use the BERT Adam optimiser (implements learning-rate warm-up; sketch below)
  • probably future work:
    • probing tasks on the hidden layer
      • predict dialogue end (or turns to end)
      • predict turn change
    • dialogue model pre-training
      • instead of training the dialogue model to predict DAs directly, predict the encoder representation of the next utterance (unsupervised; see sketch after this list)
      • test/probe by guessing DAs (or other discourse properties) with an additional linear layer
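
Sketches

Illustrative sketches for some of the items above; names, paths, and dimensions are assumptions rather than actual project code.

Freezing the utterance encoder could amount to switching off gradients for that sub-module (`model.utterance_encoder` is an assumed attribute name):

```python
import torch

def freeze_utterance_encoder(model: torch.nn.Module) -> None:
    """Freeze the utterance encoder so that only the dialogue-level (DAR)
    parameters receive gradient updates."""
    for param in model.utterance_encoder.parameters():  # assumed attribute
        param.requires_grad = False
    # eval() disables dropout inside the frozen encoder.
    model.utterance_encoder.eval()

# The optimiser should then only see the parameters that still train:
# optimiser = torch.optim.Adam(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-3)
```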
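
For the GloVe aggregation condition, a minimal utterance encoder supporting both average pooling and a BiLSTM could look like the module below (a 1-D CNN + pooling variant would slot into the same interface):

```python
import torch
import torch.nn as nn

class GloveUtteranceEncoder(nn.Module):
    """Aggregate per-token GloVe vectors into a single utterance vector,
    either by mean pooling or with a BiLSTM."""

    def __init__(self, emb_dim: int = 300, hidden: int = 256, method: str = "avg"):
        super().__init__()
        self.method = method
        if method == "bilstm":
            self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                                bidirectional=True)

    def forward(self, glove: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
        # glove: (batch, seq_len, emb_dim) padded GloVe vectors; lengths: (batch,)
        if self.method == "avg":
            mask = (torch.arange(glove.size(1), device=glove.device)
                    < lengths.unsqueeze(1)).unsqueeze(-1)
            return (glove * mask).sum(dim=1) / lengths.clamp(min=1).unsqueeze(1)
        # BiLSTM: concatenate the final forward and backward hidden states.
        # (Padding is not masked here; pack_padded_sequence would handle it.)
        _, (h_n, _) = self.lstm(glove)
        return torch.cat([h_n[0], h_n[1]], dim=-1)
```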
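
For sharing the BERT word-piece tokenization with the baselines, the same tokenizer can be loaded from the customised vocabulary (the `vocab.txt` path below is a placeholder; the transformers package is assumed) and applied to baseline-model input:

```python
from transformers import BertTokenizer

# Placeholder path to the customised word-piece vocabulary; the stock vocab
# would instead come from BertTokenizer.from_pretrained("bert-base-uncased").
tokenizer = BertTokenizer("vocab.txt", do_lower_case=True)

def tokenize_for_baseline(utterance: str) -> list:
    """Word-piece tokenise baseline input so the baselines and BERT see the
    same token sequences."""
    return tokenizer.tokenize(utterance)

# e.g. tokenize_for_baseline("okay so what do you think")
```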
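
The reporting items (macro F1 / macro precision, majority-class baseline, tag distribution) fit in one small scikit-learn helper; `y_true` / `y_pred` are placeholder flat lists of DA tags:

```python
from collections import Counter
from sklearn.metrics import f1_score, precision_score

def report_metrics(y_true, y_pred):
    """Macro-averaged scores, majority-class baseline, and tag distribution."""
    print("macro F1:        ", f1_score(y_true, y_pred, average="macro"))
    print("macro precision: ", precision_score(y_true, y_pred, average="macro"))

    # Majority-class baseline: always predict the most frequent tag.
    counts = Counter(y_true)
    majority_tag, n = counts.most_common(1)[0]
    print("majority baseline accuracy:", n / len(y_true))
    print("majority baseline macro F1:",
          f1_score(y_true, [majority_tag] * len(y_true), average="macro"))

    # Tag distribution for the report.
    for tag, count in counts.most_common():
        print(f"{tag}\t{count}\t{count / len(y_true):.3f}")

# e.g. report_metrics(gold_tags, predicted_tags)
```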
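
Total vs. task-trained parameter counts are a one-liner each in PyTorch:

```python
import torch.nn as nn

def count_parameters(model: nn.Module):
    """Return (total, trainable) parameter counts for reporting."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable

# e.g. total, task_trained = count_parameters(dar_model)
```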
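
The original BertAdam (pytorch_pretrained_bert) builds linear warm-up into the optimiser itself; with the newer transformers package the same effect comes from AdamW plus a warm-up scheduler. A sketch with placeholder numbers:

```python
import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(768, 10)              # stand-in for the actual DAR model
num_training_steps = 10_000                   # placeholder
num_warmup_steps = num_training_steps // 10   # 10% warm-up, as in BERT

optimiser = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
scheduler = get_linear_schedule_with_warmup(
    optimiser,
    num_warmup_steps=num_warmup_steps,
    num_training_steps=num_training_steps)

# In the training loop:
#   loss.backward(); optimiser.step(); scheduler.step(); optimiser.zero_grad()
```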
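
The unsupervised pre-training idea (predict the encoder representation of the next utterance instead of its DA tag) could look roughly like this; the GRU, dimensions, and MSE loss are illustrative choices, not a committed design:

```python
import torch
import torch.nn as nn

class NextUtterancePredictor(nn.Module):
    """Dialogue-level model pre-trained to predict the (frozen) encoder
    representation of the next utterance."""

    def __init__(self, utt_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.rnn = nn.GRU(utt_dim, hidden, batch_first=True)
        self.predict_next = nn.Linear(hidden, utt_dim)

    def forward(self, utt_reprs: torch.Tensor):
        # utt_reprs: (batch, n_utterances, utt_dim), e.g. BERT [CLS] vectors.
        hidden_states, _ = self.rnn(utt_reprs)
        return hidden_states, self.predict_next(hidden_states)

def pretraining_loss(model: NextUtterancePredictor, utt_reprs: torch.Tensor):
    # Predict utterance t+1 from the hidden state at utterance t.
    _, predicted = model(utt_reprs[:, :-1])
    target = utt_reprs[:, 1:].detach()   # no gradient into the utterance encoder
    return nn.functional.mse_loss(predicted, target)

# Probing: keep the pre-trained dialogue model fixed and train only a linear
# layer on its hidden states to guess DAs, turn change, dialogue end, etc.
# probe = nn.Linear(256, n_dialogue_act_tags)
```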