Code for the paper: https://arxiv.org/abs/2402.18838
The datasets used in the paper are available at https://gluebenchmark.com/ and https://super.gluebenchmark.com/.
Hyperparameters are defined in:
./corpus-gen/para.py
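For orientation, a parameter file of this kind typically collects the knobs for corpus generation in one place. The sketch below is purely illustrative: every name and value is a hypothetical placeholder, not the settings actually used in the paper.

```python
# Hypothetical hyperparameter file in the style of para.py.
# All names and values here are illustrative placeholders,
# NOT the settings used in the paper.

SEED = 42                  # random seed, so scrambling is reproducible
SCRAMBLE_UNIT = "word"     # granularity of shuffling (assumed word-level)
DATASETS = ["cola", "sst2", "rte"]   # example subset of GLUE/SuperGLUE tasks
OUTPUT_DIR = "./data/scrambled"      # where generated corpora are written
```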
The command below generates scrambled texts for each dataset in GLUE and SuperGLUE.
python ./corpus-gen/main.py
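At its core, scrambling amounts to applying a random permutation to each example's tokens. A minimal sketch, assuming word-level shuffling with a fixed seed (the paper's actual procedure may use a different granularity or additional constraints):

```python
import random

def scramble(text: str, seed: int = 0) -> str:
    """Return the words of `text` in a random order.

    A fixed seed makes the permutation reproducible. This is a
    simplified stand-in for the repo's corpus generation; the
    paper's exact scrambling scheme may differ.
    """
    words = text.split()
    rng = random.Random(seed)   # per-call RNG, independent of global state
    rng.shuffle(words)
    return " ".join(words)

print(scramble("the quick brown fox jumps over the lazy dog", seed=0))
```

Seeding a dedicated `random.Random` instance (rather than the global RNG) keeps the scrambling deterministic regardless of what other code does with `random`.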
Hyperparameters are defined in:
./probe/para.py
The re-ordering model takes scrambled texts as input and outputs the original texts. The model architecture is T5.
python ./probe/main.py
The mutual information between scrambled texts and original texts is estimated with a re-ordering model and a pre-trained language model. The re-ordering model is trained with the code in ./probe, while the language model is available on Hugging Face.
python ./mi/main.py
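One standard way to combine the two models is to estimate the mutual information as a difference of entropies, I(S; O) ≈ H(O) − H(O|S): the pre-trained language model's cross-entropy on the original text estimates H(O), and the re-ordering model's cross-entropy on the original text given the scrambled input estimates H(O|S). This is a sketch of that arithmetic with made-up per-token values; the paper's exact estimator may differ.

```python
def mutual_information_estimate(nll_lm, nll_reorder):
    """Estimate I(scrambled; original) as H(O) - H(O|S).

    nll_lm:      per-token negative log-likelihoods (nats) of the
                 original text under a pre-trained language model,
                 estimating H(O).
    nll_reorder: per-token NLLs of the original text given the
                 scrambled text under the re-ordering model,
                 estimating H(O|S).
    """
    h_o = sum(nll_lm) / len(nll_lm)
    h_o_given_s = sum(nll_reorder) / len(nll_reorder)
    return h_o - h_o_given_s

# Toy example with hypothetical per-token NLLs:
print(mutual_information_estimate([3.0, 2.5, 3.5], [1.0, 0.5, 1.5]))  # → 2.0
```

Intuitively, the better the re-ordering model recovers the original from the scrambled text (low H(O|S)), the more information the scrambled text retains about the original.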
For each scrambled dataset, the command below computes the corresponding accuracy on each task using a T5 model.
python ./acc/main.py
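The accuracy computation itself is simple: compare the model's predicted labels with the gold labels and take the fraction of exact matches. A minimal sketch with hypothetical T5 outputs for a classification-style task:

```python
def accuracy(predictions, references):
    """Fraction of exact matches between predicted and gold labels."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

# Toy example; the label strings are illustrative, not the repo's format.
print(accuracy(["entailment", "neutral"],
               ["entailment", "contradiction"]))  # → 0.5
```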
The GLM is fitted with the lme4 package; the visualization is produced with the ggplot2 package.
Rscript ./R/main.R