Issues
How to use data parallel in R-Drop.
#32 opened by xinxinxing - 1
Question about the proof
#31 opened by SYSUykLin - 0
Some questions about reproducing GLUE
#30 opened by wpwpwpyo - 4
Cannot reproduce results following the hyperparameters in the paper for fine-tuning ViT on CIFAR-100
#25 opened by NamlessM - 2
How do the `warmup steps` affect the performance?
#28 opened by Doragd - 1
Can I use R-Drop in Semantic Search?
#27 opened by ralgond - 2
Unable to preprocess data for summarization
#14 opened by samiksome92 - 1
pip install --editable . throws an error
#15 opened by Shiwen-Ni - 1
About the implementation in transformers: why does ce_loss use reduction mean (by default) while the KL loss uses reduction sum?
#24 opened by XiaoqingNLP - 5
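As background for this question, here is a minimal plain-Python sketch (toy distributions, not the repository's actual code) of why the choice of reduction matters: with `sum`, the KL term grows with the batch size, while a mean-style reduction, like the default used for the CE loss, does not.

```python
import math

def kl_div(p, q):
    """KL(p || q) for one pair of probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical per-example distributions from two dropout forward passes
batch_p = [[0.7, 0.3], [0.6, 0.4], [0.7, 0.3], [0.6, 0.4]]
batch_q = [[0.6, 0.4], [0.5, 0.5], [0.6, 0.4], [0.5, 0.5]]

per_example = [kl_div(p, q) for p, q in zip(batch_p, batch_q)]

kl_sum = sum(per_example)                       # reduction="sum": grows with batch size
kl_mean = sum(per_example) / len(per_example)   # "batchmean"-style: batch-size invariant
```

With `sum`, the effective weight of the KL term relative to a mean-reduced CE loss therefore scales with the batch size, which is one reason the coefficient on the KL term and the batch size interact.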
Inconsistency for KL loss and CE loss hyper-parameters and baselines results in GLUE
#6 opened by zhangzhenyu13 - 1
JS divergence in the research paper?
#17 opened by sieu-n - 9
R-Drop breaks my model.
#12 opened by MayDomine - 6
Where is R-Drop code in R-Drop/huggingface_transformer_src/bert_rdrop/run_glue.py?
#18 opened by zhenshiqi1996 - 1
Difference between R-Drop and SimCSE + SMART
#21 opened by cuixuage - 2
unable to reproduce results on GLUE
#16 opened by 1024er - 1
Can MSE loss replace KL divergence?
#20 opened by 18335100284 - 2
Will KLD loss decrease very fast?
#5 opened by snsun - 1
Readme file for RoBERTa example.
#11 opened by ShreyPandit - 2
Summarization task fails with 'Trying to backward through the graph a second time'
#8 opened by paul-chelarescu - 3
A simple way to double the impact of R-Drop
#10 opened by guotong1988 - 2
Fairseq tasks install work?
#4 opened by kungfu-eric - 0
Why do you use (p, q_tec) and (q, p_tec) rather than (p, q) and (q, p) to compute the KL loss?
#3 opened by JaheimLee - 1
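A small plain-Python sketch of the point at stake here, assuming `q_tec` / `p_tec` denote detached (stop-gradient) copies of the other pass's distribution (an assumption about the naming, not confirmed by the source): detaching changes only which tensors receive gradients, not the loss value itself.

```python
import math

def kl(p, q):
    """KL(p || q) for one pair of probability distributions."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

p = [0.7, 0.2, 0.1]
q = [0.6, 0.3, 0.1]

# Stand-ins for p.detach() / q.detach(): same values, no gradient path
# (hypothetical; plain lists carry no gradients anyway).
p_tec = list(p)
q_tec = list(q)

with_detach = 0.5 * (kl(p, q_tec) + kl(q, p_tec))
without_detach = 0.5 * (kl(p, q) + kl(q, p))
# Identical values: with detached targets, each KL term back-propagates
# only into its own forward pass's distribution.
```

In an autograd framework, `kl(p, q_tec)` sends gradients only into `p` and `kl(q, p_tec)` only into `q`, so each dropout pass is pulled toward the other without the two gradient paths interfering, while the reported loss is unchanged.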