ExperimentTransformer

This repository contains the implementation and tests of linear attention (self-attention whose cost scales with N rather than N^2 in sequence length, but which cannot be used for zero-shot inference) and 8-bit optimization (32-bit to 8-bit states, roughly a 4x memory saving, while keeping the ability to run zero-shot inference with pretrained weights). All methods give the same results.
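
For reference, a minimal sketch of the linear-attention idea (bidirectional, no causal mask), using the elu(x) + 1 feature map from Katharopoulos et al.; the feature map and masking actually used in this repo may differ:

```python
import torch
import torch.nn.functional as F

def full_attention(q, k, v):
    # Vanilla softmax attention: the N x N score matrix makes it O(N^2) in sequence length.
    scores = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return scores @ v

def linear_attention(q, k, v, eps=1e-6):
    # Kernel trick: replace softmax with a positive feature map phi, so that
    # (phi(Q) phi(K)^T) V regroups as phi(Q) (phi(K)^T V) and the cost becomes O(N).
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = k.transpose(-2, -1) @ v                                      # d x d summary, independent of N
    norm = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1) + eps    # per-query normalizer
    return (q @ kv) / norm
```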

Available config changes (run with python -m train --config configs/<config>):

  • attention: "full" (vanilla) or "linear"
  • precision: 32 or 8; the 8-bit mode uses the Adam8bit optimizer (see the sketch below the list)
  • task: the GLUE task "cola" (classification of whether a sentence is grammatical English, evaluated with the MCC metric) or "sst2" (sentiment classification, evaluated with accuracy)

Some experiments are shown in lab.ipynb.
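
The optimizer swap behind the precision option looks roughly like the sketch below, based on the bitsandbytes Adam8bit optimizer (the model here is a stand-in, not the repo's actual transformer):

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(768, 768)  # stand-in module; not the repo's actual model

# precision 32: standard Adam with 32-bit optimizer states.
optimizer_fp32 = torch.optim.Adam(model.parameters(), lr=1e-4)

# precision 8: bitsandbytes Adam8bit keeps optimizer states in 8 bit,
# cutting optimizer memory roughly 4x while training proceeds as usual.
optimizer_8bit = bnb.optim.Adam8bit(model.parameters(), lr=1e-4)
```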

Speed test on a transformer with default (randomly initialized) weights; without training it is useless as a score benchmark, so only timings are reported.

| Method        | Time (s) |
|---------------|----------|
| vanilla       | 0.29     |
| linear        | 0.26     |
| 8bit          | 0.19     |
| linear + 8bit | 0.15     |
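
A minimal sketch of how such per-forward-pass timings can be reproduced; the batch size, sequence length, and vocabulary size here are assumptions, not the exact benchmark settings:

```python
import time
import torch

def time_forward(model, batch=8, seq_len=128, vocab=30522, n_runs=20):
    # Average forward-pass time on random token ids; add torch.cuda.synchronize()
    # before reading the clock when benchmarking on a GPU.
    x = torch.randint(0, vocab, (batch, seq_len))
    model.eval()
    with torch.no_grad():
        model(x)  # warm-up
        start = time.perf_counter()
        for _ in range(n_runs):
            model(x)
    return (time.perf_counter() - start) / n_runs
```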

You can also fine-tune, or run zero-shot inference with, a pretrained DeBERTa in 8 bit by adding the argument --deberta-path <path to folder>, where the folder contains a checkpoint in the transformers format.
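
For instance, a pretrained checkpoint can be exported to such a folder with the transformers library (the model name and target folder below are just examples) and then passed via --deberta-path ./deberta:

```python
from transformers import AutoModel, AutoTokenizer

# Example checkpoint; any DeBERTa saved in the transformers format should work.
name = "microsoft/deberta-v3-base"
AutoModel.from_pretrained(name).save_pretrained("./deberta")
AutoTokenizer.from_pretrained(name).save_pretrained("./deberta")
```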