HanGuo97/lq-lora

full GLUE scores

BaohaoLiao opened this issue · 4 comments

Could you share the full GLUE scores, i.e., the per-task scores, for comparison?

I can only find the average GLUE score in Table 2. However, as a first step I only want to run experiments on some of the tasks and compare my results to yours. It would be great if you could provide the full scores for all methods in Table 2. Thank you in advance.

Sure, is this what you are looking for?

| Method | CoLA | MNLI | MRPC | QNLI | QQP | RTE | SST-2 | STS-B |
|---|---|---|---|---|---|---|---|---|
| Full FT | 70.2 | 89.7 | 92.0 | 94.1 | 89.8 | 84.1 | 95.8 | 92.2 |
| QLoRA (3.127) | 64.3 | 89.8 | 92.0 | 94.3 | 89.9 | 70.8 | 96.4 | 91.6 |
| QLoRA + ILP (2.5) | 36.4 | 81.6 | 81.7 | 90.7 | 85.9 | 58.8 | 88.0 | 80.4 |
| QLoRA + ILP (2.75) | 54.9 | 87.2 | 85.9 | 91.5 | 86.8 | 58.1 | 94.4 | 86.5 |
| QLoRA + ILP (3.0) | 61.3 | 89.4 | 90.6 | 94.0 | 87.8 | 75.5 | 95.4 | 90.0 |
| QLoRA + ILP (3.25) | 61.7 | 89.6 | 89.5 | 94.1 | 88.3 | 78.7 | 96.0 | 90.8 |
| LQ-LoRA (2.5) | 63.6 | 89.5 | 90.0 | 94.0 | 89.1 | 73.3 | 94.8 | 91.0 |
| LQ-LoRA (2.75) | 64.8 | 89.9 | 91.5 | 94.4 | 89.3 | 81.2 | 95.3 | 90.6 |
| LQ-LoRA (3.0) | 64.8 | 90.4 | 92.4 | 94.3 | 89.4 | 79.8 | 95.9 | 91.8 |
| LQ-LoRA (3.25) | 66.5 | 90.4 | 92.2 | 94.1 | 89.5 | 83.8 | 96.4 | 91.9 |
| LQ-LoRA (Fisher) (2.5) | 65.4 | 89.9 | 90.0 | 94.3 | 89.7 | 81.6 | 95.6 | 91.9 |
| LQ-LoRA (Fisher) (2.75) | 65.3 | 90.1 | 90.5 | 94.3 | 89.6 | 73.3 | 96.2 | 91.8 |
| LQ-LoRA (Fisher) (3.0) | 63.5 | 90.3 | 92.2 | 94.6 | 89.7 | 80.5 | 96.2 | 91.8 |
| LQ-LoRA (Fisher) (3.25) | 67.4 | 90.4 | 91.7 | 94.6 | 89.6 | 83.8 | 96.7 | 92.1 |

Closing this assuming the question is addressed. Please feel free to re-open if you have follow-up questions.

Which metric do you report for MRPC, STS-B, and QQP? I see that your Full FT results differ from those in the RoBERTa paper.

Normally, MRPC and QQP are evaluated with accuracy and F1, and STS-B with Pearson and Spearman correlation. Do you report only one metric, or the mean of the two?

We use the `combined_score` from HuggingFace for the three tasks you mentioned.
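For reference, a minimal sketch of what that means in practice, assuming the evaluation follows the same convention as HuggingFace's `run_glue.py` example script (this is an illustration, not the repo's actual eval code): for tasks that report two metrics (accuracy/F1 for MRPC and QQP, Pearson/Spearman for STS-B), the `combined_score` is simply the mean of the two. The predictions and references below are placeholders.

```python
# Sketch of the combined_score convention, assuming the run_glue.py logic:
# if a GLUE task yields more than one metric, average them.
import numpy as np
import evaluate

# MRPC reports both accuracy and F1; inputs here are placeholder labels.
metric = evaluate.load("glue", "mrpc")
predictions = [1, 0, 1, 1]
references = [1, 0, 0, 1]

result = metric.compute(predictions=predictions, references=references)
# result looks like {"accuracy": ..., "f1": ...}

if len(result) > 1:
    result["combined_score"] = float(np.mean(list(result.values())))

print(result)
```

So the MRPC, QQP, and STS-B columns in the table above would each be the average of the task's two metrics rather than a single one, which would explain the difference from the numbers reported in the RoBERTa paper.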