Issues
How to use data parallel in R-Drop.
#32 opened by xinxinxing - 1
Question about the proof
#31 opened by SYSUykLin - 0
Some questions about reproducing GLUE
#30 opened by wpwpwpyo - 4
Cannot reproduce results following the hyperparameters in the paper for fine-tuning ViT on CIFAR-100
#25 opened by NamlessM - 2
How do the `warmup steps` affect the performance?
#28 opened by Doragd - 1
Can I use R-Drop in Semantic Search?
#27 opened by ralgond - 2
Unable to preprocess data for summarization
#14 opened by samiksome92 - 1
pip install --editable . throws an error
#15 opened by Shiwen-Ni - 1
About the implementation in transformers: why does ce_loss use reduction mean (by default) while the KL loss uses reduction sum?
#24 opened by XiaoqingNLP - 5
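As background for this question, here is a minimal plain-Python sketch (toy distributions, not the repository's actual code) of why the choice of reduction matters: with `sum`, the KL term grows with the batch size, while a mean-style reduction, like the default used for the CE loss, does not.

```python
import math

def kl_div(p, q):
    """KL(p || q) for one pair of probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical per-example distributions from two dropout forward passes
batch_p = [[0.7, 0.3], [0.6, 0.4], [0.7, 0.3], [0.6, 0.4]]
batch_q = [[0.6, 0.4], [0.5, 0.5], [0.6, 0.4], [0.5, 0.5]]

per_example = [kl_div(p, q) for p, q in zip(batch_p, batch_q)]

kl_sum = sum(per_example)                       # reduction="sum": grows with batch size
kl_mean = sum(per_example) / len(per_example)   # "batchmean"-style: batch-size invariant
```

With `sum`, the effective weight of the KL term relative to a mean-reduced CE loss therefore scales with the batch size, which is one reason the coefficient on the KL term and the batch size interact.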
Inconsistency for KL loss and CE loss hyper-parameters and baselines results in GLUE
#6 opened by zhangzhenyu13 - 1
JS divergence in the research paper?
#17 opened by sieu-n - 9
R-Drop breaks my model.
#12 opened by MayDomine - 6
Where is R-Drop code in R-Drop/huggingface_transformer_src/bert_rdrop/run_glue.py?
#18 opened by zhenshiqi1996 - 1
Difference between R-Drop and SimCSE + SMART
#21 opened by cuixuage - 2
unable to reproduce results on GLUE
#16 opened by 1024er - 1
Can MSE loss replace KL divergence?
#20 opened by 18335100284 - 2
Will KLD loss decrease very fast?
#5 opened by snsun - 1
Readme file for RoBERTa example.
#11 opened by ShreyPandit - 2
Summarization task fails with 'Trying to backward through the graph a second time'
#8 opened by paul-chelarescu - 3
A simple way to double the impact of R-Drop
#10 opened by guotong1988 - 2
Fairseq tasks install work?
#4 opened by kungfu-eric - 0
Why do you use (p, q_tec) and (q, p_tec) rather than (p, q) and (q, p) to compute the KL loss?
#3 opened by JaheimLee - 1
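A small plain-Python sketch of the point at stake here, assuming `q_tec` / `p_tec` denote detached (stop-gradient) copies of the other pass's distribution (an assumption about the naming, not confirmed by the source): detaching changes only which tensors receive gradients, not the loss value itself.

```python
import math

def kl(p, q):
    """KL(p || q) for one pair of probability distributions."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

p = [0.7, 0.2, 0.1]
q = [0.6, 0.3, 0.1]

# Stand-ins for p.detach() / q.detach(): same values, no gradient path
# (hypothetical; plain lists carry no gradients anyway).
p_tec = list(p)
q_tec = list(q)

with_detach = 0.5 * (kl(p, q_tec) + kl(q, p_tec))
without_detach = 0.5 * (kl(p, q) + kl(q, p))
# Identical values: with detached targets, each KL term back-propagates
# only into its own forward pass's distribution.
```

In an autograd framework, `kl(p, q_tec)` sends gradients only into `p` and `kl(q, p_tec)` only into `q`, so each dropout pass is pulled toward the other without the two gradient paths interfering, while the reported loss is unchanged.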