Normalized Information Payload

Code for the paper "What Dense Graph Do You Need for Self-attention?" (ICML 2022).

Requirements

The full environment is listed in whole_environment.txt. The main packages we use are deepspeed==0.5.4, torch==1.9.1+cu111, transformers==4.10.0, and datasets==1.14.0.
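For a quick setup, the pinned main packages can be installed with pip. This is a minimal sketch rather than the full environment (use whole_environment.txt for that), and the torch CUDA build should be adjusted to match your system:

pip install deepspeed==0.5.4 transformers==4.10.0 datasets==1.14.0
pip install torch==1.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html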

Usage:

The experiments consist of two parts: Long Range Arena (LRA) and BERT pretraining/finetuning.

LRA

For the LRA tasks, our code builds on Nystromformer (paper, Github). Run a task with

python run_tasks.py --model attn_type --task taskname --seed seed

where attn_type is one of ["hypercube", "bigbird", "longformer", "global", "local", "random", "local+random"] and taskname is one of ["listops", "text", "retrieval", "image", "pathfinder32-curv_contour_length_14"].
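For example, to run the hypercube attention pattern on the listops task (the seed value here is only an example):

python run_tasks.py --model hypercube --task listops --seed 0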

Datasets should be placed in LRA/datasets/ and can be downloaded from here. You can also build the datasets yourself by following the Nystromformer LRA instructions, as we did.
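As a sketch, assuming the downloaded archive is named lra_datasets.tar.gz (a hypothetical filename), unpack it into the expected directory:

mkdir -p LRA/datasets
tar -xzf lra_datasets.tar.gz -C LRA/datasets/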

Note:

1. The datasets used in all experiments are derived from previously published work; we host them only to make it easier to reproduce our results.

2. To avoid infringement, you can contact me to have the data removed or to obtain it, and please cite the original source when using it.

BERT Pretraining and Finetuning

The code is in Pretraining_finetuning. The pre-trained model CubeBERT can be downloaded here. For BERT pretraining, we adopt the method from academic-budget-bert (paper, Github); full guides can be found there.
For finetuning, run

python run_glue_sparse.py \
    --model_name_or_path modelpath \
    --task_name mrpc \
    --max_seq_length 128 \
    --seed 42 \
    --output_dir outputdir \
    --overwrite_output_dir \
    --do_train \
    --fp16 \
    --fp16_full_eval \
    --do_eval \
    --eval_steps 1000 \
    --do_predict \
    --save_strategy steps \
    --save_steps 1000 \
    --metric_for_best_model accuracy \
    --evaluation_strategy steps \
    --per_device_train_batch_size 32 \
    --gradient_accumulation_steps 1 \
    --per_device_eval_batch_size 32 \
    --learning_rate 5e-5 \
    --weight_decay 0.01 \
    --max_grad_norm 1.0 \
    --num_train_epochs 5 \
    --lr_scheduler_type polynomial \
    --warmup_steps 0

or

python run_mlm_sparse.py \
    --seed 42 \
    --model_name_or_path modelpath \
    --dataset_name wikitext \
    --dataset_config_name wikitext-103-raw-v1 \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 1 \
    --per_device_eval_batch_size 8 \
    --do_train \
    --do_eval \
    --evaluation_strategy steps \
    --eval_steps 2 \
    --fp16 \
    --fp16_full_eval \
    --logging_strategy steps \
    --logging_steps 200 \
    --output_dir output_dir \
    --overwrite_output_dir