sustain-seq2seq

This repo is a playground for seq2seq models with PyTorch.

  • Tokenization that covers BPE and GPT2 (from pytorch_transformers) in a single Lookup object. Full tests are required here, as many earlier problems came from mismatched vocabulary maps and out-of-range token ids in the lookup.

    Encoding for BPE:

    • X and y are bordered with <BOS> and <EOS> and padded with <PAD> for the rest of the sequence

    Encoding for GPT2:

    • X and y are both bordered with the <|endoftext|> id (both bos/eos point to this string) and padded with <PAD> for the rest of the sequence (the decoder should stop if <|endoftext|> is generated at any index > 1); see the encoding sketch after this list
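
    Below is a minimal sketch of the bordering/padding scheme described above. The special-token ids, `max_len`, and helper names are illustrative assumptions, not the repo's actual Lookup API (50256 is the standard GPT-2 id for <|endoftext|>).

```python
from typing import List

BOS, EOS, PAD = 1, 2, 0   # assumed special-token ids for the BPE lookup
ENDOFTEXT = 50256         # standard GPT-2 id for <|endoftext|>

def encode_bpe(token_ids: List[int], max_len: int) -> List[int]:
    """BPE-style: border with <BOS>/<EOS>, then pad with <PAD> up to max_len."""
    seq = [BOS] + token_ids[: max_len - 2] + [EOS]
    return seq + [PAD] * (max_len - len(seq))

def encode_gpt2(token_ids: List[int], max_len: int) -> List[int]:
    """GPT2-style: both bos and eos map to <|endoftext|>, then pad with <PAD>."""
    seq = [ENDOFTEXT] + token_ids[: max_len - 2] + [ENDOFTEXT]
    return seq + [PAD] * (max_len - len(seq))
```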

Models that need to work:

  • LSTMEncoder + LSTMDecoder with Attention (a minimal attention sketch follows this list)
  • GPT2Encoder + LSTMDecoder with Attention
  • LSTMEncoder + LSTMDecoder with Attention, Pointer Generator & Coverage
  • GPT2Encoder + LSTMDecoder with Attention, Pointer Generator & Coverage
  • GPT2Encoder + GPT2Decoder with Pointer Generator & Coverage
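
As a reference point for the attention-based models above, here is a minimal additive (Bahdanau-style) attention sketch; the class name, argument names and shapes are illustrative assumptions, not the repo's actual modules. The decoder would concatenate the returned context with its hidden state before projecting to the vocabulary, and the pointer-generator and coverage variants reuse the same attention weights.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style (additive) attention over encoder states."""

    def __init__(self, enc_dim: int, dec_dim: int, attn_dim: int):
        super().__init__()
        self.enc_proj = nn.Linear(enc_dim, attn_dim, bias=False)
        self.dec_proj = nn.Linear(dec_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc_states, dec_state, src_mask=None):
        # enc_states: [batch, src_len, enc_dim]; dec_state: [batch, dec_dim]
        energy = torch.tanh(self.enc_proj(enc_states) + self.dec_proj(dec_state).unsqueeze(1))
        scores = self.v(energy).squeeze(-1)                             # [batch, src_len]
        if src_mask is not None:
            scores = scores.masked_fill(src_mask == 0, float("-inf"))   # ignore <PAD> positions
        weights = torch.softmax(scores, dim=-1)
        context = torch.bmm(weights.unsqueeze(1), enc_states)           # [batch, 1, enc_dim]
        return context.squeeze(1), weights
```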

Other stuff that needs to be done:

  • Look at validation measures again (BLEU, METEOR, ROUGE)
  • Implement all attention types (low priority)
  • Experiment with multihead attention for RNNs
  • Beam search and/or top-k/top-p sampling, as in pytorch_transformers (see the sampling sketch after this list)
  • Check that attention masks are applied everywhere
  • Optimizer: learning rate scheduler with superconvergence, warm restarts and cyclical LR. Implement the scheduler; partially done, needs more testing (a scheduler sketch follows below).
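
A minimal top-k / top-p (nucleus) filtering sketch, in the spirit of the sampling utilities shipped with pytorch_transformers' generation example; the function name, thresholds, and the 1-D logits assumption are illustrative, not this repo's API.

```python
import torch

def top_k_top_p_filtering(logits: torch.Tensor, top_k: int = 0, top_p: float = 0.0) -> torch.Tensor:
    """Set logits outside the top-k / top-p (nucleus) set to -inf. Expects a 1-D [vocab_size] tensor."""
    logits = logits.clone()
    if top_k > 0:
        kth_best = torch.topk(logits, top_k).values[-1]
        logits[logits < kth_best] = float("-inf")
    if top_p > 0.0:
        sorted_logits, sorted_indices = torch.sort(logits, descending=True)
        cumulative_probs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
        # Drop tokens once cumulative probability exceeds top_p, but always keep the best token.
        to_remove = cumulative_probs > top_p
        to_remove[1:] = to_remove[:-1].clone()
        to_remove[0] = False
        logits[sorted_indices[to_remove]] = float("-inf")
    return logits

# Usage: sample the next token id from the filtered distribution.
# next_id = torch.multinomial(torch.softmax(top_k_top_p_filtering(logits, top_k=40, top_p=0.9), dim=-1), 1)
```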
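
For the scheduler item, PyTorch already ships warm-restart and cyclical schedulers under torch.optim.lr_scheduler; below is a minimal wiring sketch in which the model, optimizer, and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 10)                                   # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Cosine annealing with warm restarts: the LR resets every T_0 epochs (period doubles with T_mult=2).
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2)

# A cyclical (triangular) alternative:
# scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=1e-4, max_lr=1e-2)

for epoch in range(30):
    # ... forward/backward passes would go here ...
    optimizer.step()
    scheduler.step()  # advance the schedule once per epoch (or per batch for CyclicLR)
```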