Understanding the splits

Question

amangupt01 opened this issue 7 months ago · 1 comment

Thanks for sharing the code base and datasets! We enjoyed reading the paper.

I am a bit confused about the splits used in the code. Could you please explain the difference between "pl_fixed", "fixed", "pl_random", and "random"? Are they related to the low_label and high_label settings mentioned in the paper?

When we set the low_labels_test argument to 1 with the random split, the code still uses a 60-20 split of the dataset, which seems to contradict what is described in the paper. Am I missing something here?

Thanks in advance!

Answer 1 · 2024-03-09T21:41:17.000Z

Hi, "fixed" refers to the low label rate setting mentioned in the paper (20 labels per class; it is called "fixed" because this is usually provided as the official split), while "random" refers to the 60% setting. The "pl" prefix means the pseudo labels produced by TAPE are used. low_labels_test is related to the few-shot setting mentioned in the annotation part.
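
To make the difference in training-set size concrete, here is a rough sketch of the two kinds of split. It is not the repository's code: the function names `fixed_style_split` and `random_style_split` are made up for illustration, the 20% validation share is taken from the "60-20" in the question, and using the remainder as the test set is an assumption. Note also that a real "fixed" split is a predetermined official set of nodes; the sketch samples 20 nodes per class only to show how small that training set is.

```python
import random
from collections import defaultdict


def fixed_style_split(labels, per_class=20, seed=0):
    """Return training indices with `per_class` nodes from every class.

    Illustrative only: an official fixed split ships as a predefined
    index list rather than being sampled like this.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train = []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        train.extend(members[:per_class])
    return sorted(train)


def random_style_split(num_nodes, train_frac=0.6, val_frac=0.2, seed=0):
    """Return (train, val, test) index lists from one random permutation.

    The test set is whatever remains after train and val (an assumption).
    """
    rng = random.Random(seed)
    indices = list(range(num_nodes))
    rng.shuffle(indices)

    n_train = round(train_frac * num_nodes)
    n_val = round(val_frac * num_nodes)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    # Toy dataset: 300 nodes spread evenly over 3 classes.
    toy_labels = [i % 3 for i in range(300)]
    low_rate_train = fixed_style_split(toy_labels)
    high_rate_train, val, test = random_style_split(len(toy_labels))
    print(len(low_rate_train), len(high_rate_train), len(val), len(test))
```

With the same toy dataset, the "fixed"-style training set depends only on the number of classes, while the "random"-style training set grows with the dataset size.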