Issues
ViT-B Training for DeiT
#233 opened by ziqipang - 0
Will you release the IN1k accuracy of the tiny model trained with the official DeiT III framework?
#241 opened by chenziwenhaoshuai - 2
DeiT depth 24 (CaiT - TABLE 1)
#218 opened by GoJunHyeong - 0
Gradient accumulation code
#240 opened by King4819 - 0
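The gradient-accumulation code the issue asks about is not shown here; as a framework-agnostic sketch of the idea, gradients from several micro-batches are averaged before a single parameter update, emulating a batch `accum_steps` times larger (`train_step` and its arguments are illustrative, not from the repo):

```python
def train_step(param, micro_grads, lr, accum_steps):
    """One accumulated update: average `accum_steps` micro-batch gradients,
    then apply a single SGD step, matching the lr semantics of the big batch."""
    grad = 0.0
    for g in micro_grads[:accum_steps]:
        grad += g / accum_steps  # average, so lr does not need rescaling
    return param - lr * grad

print(train_step(1.0, [0.5, 1.5], 0.1, 2))  # prints 0.9
```

In a real PyTorch loop the same pattern is `loss / accum_steps; loss.backward()` on each micro-batch, with `optimizer.step()` and `optimizer.zero_grad()` only every `accum_steps` iterations.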
Question about different seeds per gpu with DDP
#239 opened by HIT-LiuChen - 0
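For context on the per-GPU seed question: the repo's `main.py` offsets the base seed by the process rank (`seed = args.seed + utils.get_rank()`) so each DDP worker draws different random augmentations. A torch-free sketch of that pattern (`seed_for_rank` is an illustrative name):

```python
import random

def seed_for_rank(base_seed, rank):
    # Per-process seed: offset the shared base seed by the DDP rank so
    # workers do not all sample identical augmentations.
    return base_seed + rank

# Two "ranks" with the same base seed yield different random streams:
rng0 = random.Random(seed_for_rank(0, rank=0))
rng1 = random.Random(seed_for_rank(0, rank=1))
print(rng0.random() != rng1.random())  # prints True
```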
Inclusion of Transformers Need Registers
#237 opened by mileseverett - 2
Slow Training
#234 opened by mueller-mp - 0
random.seed(seed) on line 205 is commented out
#236 opened by Phuoc-Hoan-Le - 0
Checkpoints of IN21K pretrained deit III
#232 opened by Byakuya-zi - 0
TracerWarning
#230 opened by maingoc1605 - 2
batch_size flag
#220 opened by tsengalb99 - 2
How to launch training of CaiT models?
#226 opened by elias-ramzi - 0
Code for cosub
#224 opened by ppalantir - 2
The ablation experiment of DeiT
#215 opened by Berry-Wu - 5
ImageNet21K data preparation for pre-training
#219 opened by mxjecho - 1
Meaning of the model name (ResMLP)
#207 opened by YHYeooooong - 1
Can I use timm==0.4.12 instead of timm==0.3.2 ?
#206 opened by irhallac - 2
unexpected keyword argument 'pretrained_cfg'
#212 opened by entron - 1
Are the hyperparameters for DeiT-T and for DeiT-S any different than DeiT-B?
#201 opened by Phuoc-Hoan-Le - 2
ImageNet21k pretrained model without finetuning on 1k
#181 opened by bhheo - 1
How long is it supposed to take to train on ImageNet21k for 90 epochs with 8 V100 GPUs?
#198 opened by Phuoc-Hoan-Le - 1
number of classes
#197 opened by Ye-Na-Kim - 1
How to implement cosub training using DeiT-III
#217 opened by xiaoguang-1 - 0
How to implement cosub training using DeiT-III
#216 opened by xiaoguang-1 - 0
Single machine multi-GPU training
#213 opened by AlexNmSED - 0
Multi-node support
#208 opened by Phuoc-Hoan-Le - 0
Multinode Slurm Training
#204 opened by yazdanimehdi - 0
What batch size number other than 1024 have been tried when training a DeiT model?
#205 opened by Phuoc-Hoan-Le - 3
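Relevant to the batch-size question: the repo linearly scales the base learning rate by the global batch size over a reference batch of 512 (`linear_scaled_lr = args.lr * args.batch_size * utils.get_world_size() / 512.0` in `main.py`), so other batch sizes are usable if the lr is rescaled accordingly. A minimal sketch of that rule (function name is illustrative):

```python
def linear_scaled_lr(base_lr, batch_per_gpu, world_size, base_batch=512):
    # Linear lr scaling: grow the base lr in proportion to the
    # global batch size relative to the reference batch of 512.
    return base_lr * batch_per_gpu * world_size / base_batch

# 8 GPUs x 128 per GPU = global batch 1024, i.e. double the reference:
print(linear_scaled_lr(5e-4, 128, 8))  # prints 0.001
```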
Is EMA used in DeiT-III?
#203 opened by mzr1996 - 1
Question about Throughput
#189 opened by techmonsterwang - 1
cifar100 pretrain model?
#191 opened by Wang-Y-S - 1
What is the difference between class attention in the paper CaiT and traditional multi-headed self-attention?
#196 opened by hutingz - 0
What is the ImageNet-1K top-1 accuracy when training from 0 to 400 epochs (Fig. 5 of the DeiT III paper)?
#199 opened by sanyalsunny111 - 2
Config file of ViT-B/16
#195 opened by shashankvkt - 2
Is it possible to see how the validation accuracy changes over the number of epochs for DeiT?
#193 opened by Phuoc-Hoan-Le - 0
Reproduce PatchConvnet
#185 opened by billpsomas - 1
LAMB and amp
#182 opened by sgunasekar - 2
Question about training DeiT-small distilled
#186 opened by mingqiJ - 2
Are uniform drop-path rates beneficial?
#184 opened by bhheo - 1
DeiT-tiny .pth size is not 5M; it is 22M
#183 opened by witding - 4
Confusion about fine-tuning
#180 opened by WangWenhao0716