HobbitLong/RepDistiller
[ICLR 2020] Contrastive Representation Distillation (CRD), and benchmark of recent knowledge distillation methods
Python, BSD-2-Clause
Issues
- 2
How do you choose the optimal hyper-parameters?
#20 opened by JinYang88 - 1
about using the resnet models for cifar10
#48 opened by EmnaGuermazi97 - 2
Ensemble Task Implementation
#50 opened by sdsawtelle - 1
the result is different in resnet56
#56 opened by SuWideSun - 2
Failed to download the teacher models
#47 opened by Prisoneryc - 0
No dev set split
#59 opened by guzy0324 - 1
ERROR :run ./fetch_pretrained_teachers.sh
#57 opened by DPingWu - 1
How to use myself datasets?
#58 opened by 1997Jessie - 0
Is Ensemble distillation also included?
#54 opened by YanjingLiLi - 0
Hyperparameter Settings for KD on Imagenet
#53 opened by Calmepro777 - 1
Why using log_softmax instead of softmax?
#52 opened by nguyenvulong - 0
- 3
resnet structure seems to be a bit wrong
#46 opened by surprisedong - 0
- 1
Cross modal KD implementation release?
#39 opened by liu115 - 3
- 3
AttributeError: 'CIFAR100InstanceSample' object has no attribute 'train_data'
#38 opened by Jiawen-huang - 0
- 0
Error while running the code
#43 opened by frestuc - 2
- 1
what is the difference between the position of putting "with torch.no_grad()"
#37 opened by ChriswooTalent - 0
how to train my model?
#35 opened by 972461099 - 0
About deep mutual learning setting
#34 opened by swlzq - 0
About the CE loss
#33 opened by XiXiRuPan - 0
ImageNet results
#32 opened by senya-ashukha - 0
How to train teacher model
#31 opened by tiancity-NJU - 0
- 0
Question about pretrained teacher model
#29 opened by MaorunZhang - 1
hyperparameters for other methods
#25 opened by wukailu - 1
the introduction of ContrastMemory
#27 opened by sanshanxiashi - 0
Multiple GPU training
#28 opened by deropty - 3
Reported results based on early stopping?
#16 opened by VladimirLi - 1
KD method in both configurations seems to be doing better than all other methods except the one from your paper
#26 opened by ksachdeva - 3
questions about ContrastMemory
#24 opened by jianxiangm - 0
- 0
How can I use CRD_loss to face landmark detetct for model compression? There is no "opt.nce_k: number of negatives paired with each positive".
#22 opened by gjd2017 - 0
The calculation of correlation matrix
#21 opened by winycg - 1
code for ensemble distillation
#17 opened by tonmoy-saikia - 2
- 2
Form of the h function for infinite dataset
#13 opened by brotherofken - 2
Memory issue about the NST LOSS
#12 opened by leoozy - 5
Questions about ContrastMemory
#11 opened by HelloTobe - 1
Results on ImageNet
#10 opened by xuguodong03 - 1
In the 2 result tables, WRN-40-2, as the teacher, after distilling the students, the students get higher performance(CRD+KD), why?
#9 opened by splinter21 - 1
Regression task
#8 opened by xjcvip007