Official Repository for "Investigating Recurrent Transformers with Dynamic Halt" - Jishnu Ray Chowdhury, Cornelia Caragea

Credits:

Requirements

  • pytorch 1.10.0
  • pytorch-lightning 1.9.3
  • tqdm 4.62.3
  • tensorflow-datasets 4.5.2
  • typing_extensions 4.5.0
  • pykeops 2.1.1
  • jsonlines 2.0.0
  • einops 0.6.0
  • torchtext 0.8.1
  • flash-attn 2.1.1
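A minimal install sketch using pip, with the pins taken verbatim from the list above (this assumes a Python environment where a CUDA-enabled build of torch 1.10.0 is usable; flash-attn and pykeops compile against your local CUDA toolkit, so adjust as needed):

    pip install torch==1.10.0 torchtext==0.8.1 pytorch-lightning==1.9.3
    pip install tqdm==4.62.3 tensorflow-datasets==4.5.2 typing_extensions==4.5.0
    pip install pykeops==2.1.1 jsonlines==2.0.0 einops==0.6.0 flash-attn==2.1.1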

Data Setup

Processing Data

  • Go to preprocess/ and run each preprocessing script to preprocess the corresponding dataset.
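For example (the script name below is hypothetical; use the actual file names present in preprocess/):

    cd preprocess
    python preprocess_listops.py   # repeat for each preprocessing script in this directory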

How to Train

Train: python train.py --model=[insert model name] --dataset=[insert dataset name] --times=[insert total runs] --device=[insert device name] --model_type=[classifier/sentence_pair/sentence_pair2/flipflop] (an example invocation follows the notes below)

  • Check argparser.py for exact options.
  • sentence_pair (as model type) is used for sequence matching tasks (logical inference, AAN).
  • flipflop (as model type) is used for flipflop language modeling.
  • classifier is used for the rest.
  • We generally set --times (total runs) to 3.
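For example, a plausible single run on ListOps with a classifier-type model (the device name depends on your machine; the other values come from the notes and tables in this README):

    python train.py --model=GUT_end --dataset=listopsc2 --times=3 --device=cuda:0 --model_type=classifier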

Dataset Nomenclature

The nomenclature in the codebase differs a bit from that in the paper. We provide a mapping here of the form [codebase dataset name] = [paper dataset name]:

  • listopsc2 = ListOps
  • proplogic = Logical Inference
  • IMDB_lra = Text (LRA)
  • AAN_lra = Retrieval (LRA)
  • listops_lra = ListOps (LRA)
  • cifar10_lra_sparse = Image (LRA)
  • pathfinder_lra_sparse = Pathfinder (LRA)

Model Nomenclature

The nomenclature in the codebase differs a bit from that in the paper. We provide a mapping here of the form [codebase model name] = [paper model name] (a small Python transcription of both tables follows this list):

  • Transformer = Transformer
  • UT = UT
  • GUT_end = GUT
  • GUT_token_end = GUT - Global Halt
  • GUT_nogate_end = GUT - Gate
  • GUT_notrans_end = GUT - Transition
  • TLB = TLB
  • GUTLB = GUTLB
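If you need these mappings programmatically (e.g., to label results with the paper's names), here is a minimal Python transcription of the two tables above; the dictionary names are ours for illustration, not identifiers from the codebase:

    # Keys are codebase names, values are the corresponding paper names.
    DATASET_NAMES = {
        "listopsc2": "ListOps",
        "proplogic": "Logical Inference",
        "IMDB_lra": "Text (LRA)",
        "AAN_lra": "Retrieval (LRA)",
        "listops_lra": "ListOps (LRA)",
        "cifar10_lra_sparse": "Image (LRA)",
        "pathfinder_lra_sparse": "Pathfinder (LRA)",
    }
    MODEL_NAMES = {
        "Transformer": "Transformer",
        "UT": "UT",
        "GUT_end": "GUT",
        "GUT_token_end": "GUT - Global Halt",
        "GUT_nogate_end": "GUT - Gate",
        "GUT_notrans_end": "GUT - Transition",
        "TLB": "TLB",
        "GUTLB": "GUTLB",
    }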