Pinned Repositories
GLUE-X
We collect 13 publicly available datasets as out-of-distribution (OOD) test data and evaluate widely used models on 8 classic NLP tasks. Our findings confirm that OOD accuracy in NLP tasks deserves more attention: a significant performance decay relative to in-distribution (ID) accuracy is observed in all settings.
GLUE-X
We leverage 14 datasets as out-of-distribution (OOD) test data and evaluate 21 widely used models on 8 NLU tasks. Our findings confirm that OOD accuracy in NLP tasks deserves more attention: a significant performance decay relative to in-distribution (ID) accuracy is observed in all settings.
shuiba0's Repositories
shuiba0/GLUE-X