juntang-zhuang/Adabelief-Optimizer
Repository for NeurIPS 2020 Spotlight "AdaBelief Optimizer: Adapting stepsizes by the belief in observed gradients"
Jupyter Notebook · BSD-2-Clause
Issues
Loss becomes NaN when beta1=0
#67 opened by yojeep - 4
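A note on the likely mechanism, assuming the update rule as stated in the paper: with beta1 = 0 the gradient EMA m_t collapses to the raw gradient g_t, so the squared belief term is identically zero, s_t decays toward zero, and the step is governed only by epsilon, which can easily overflow to NaN:

```latex
\beta_1 = 0 \;\Rightarrow\; m_t = g_t
\;\Rightarrow\; s_t = \beta_2 s_{t-1} + (1-\beta_2)(g_t - m_t)^2 = \beta_2 s_{t-1} \to 0
\;\Rightarrow\; \Delta\theta_t \approx -\alpha\, g_t / \epsilon
```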
Inconsistent use of epsilon
#61 opened by cossio - 1
weight_decouple in AdaBelief TF
#60 opened by YannPourcenoux - 1
TensorFlow restoration issue
#59 opened by soumen-ghosh - 2
Some questions related to importing adabelief
#58 opened by HelloWorldLTY - 7
Similarity to AdaHessian
#16 opened by davda54 - 5
Inconsistent computation of weight_decay and grad_residual among PyTorch versions
#56 opened by sjscotti - 3
Your method is just equivalent to SGD with a changeable global learning rate.
#57 opened by Yonghongwei - 2
Compatibility with warmup
#55 opened by joihn - 1
Changing the initial learning rate
#53 opened by Kraut-Inferences - 1
FileNotFoundError for ImageNet
#52 opened by kchak31 - 2
Model load fails with ValueError: Unknown optimizer: AdaBeliefOptimizer
#41 opened by damianospark - 1
On the ImageNet accuracy result of 70.08
#50 opened by wyzjack - 8
Support for TensorFlow 1.10+
#37 opened by chenxinhua - 1
Why does g_t subtract m_t instead of m_{t-1}?
#48 opened by zxteloiv - 1
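For reference, the two moment updates from the paper, in its notation. Subtracting the current m_t (rather than m_{t-1}) is what makes s_t track the deviation of the observed gradient from the optimizer's current belief, which is the paper's stated motivation:

```latex
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
s_t = \beta_2 s_{t-1} + (1-\beta_2)\,(g_t - m_t)^2
```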
Upgrade with Adas optimizer
#45 opened by DaniyarM - 1
Please add a license
#43 opened by 1e100 - 2
Fine-tuning with BERT models
#42 opened by JaheimLee - 7
Instability in RNN training
#10 opened by bratao - 7
Issues with AdaBelief-TensorFlow
#27 opened by dusk666 - 6
Imagenette baseline for AdaBelief
#40 opened by tmabraham - 26
Fine-tuning EfficientNet-B4 with the AdaBelief optimizer gives worse accuracy than Adam?
#38 opened by daixiangzi - 14
TensorFlow Implementation
#34 opened by ManoharSai2000 - 10
Different usage of eps between "A quick look at the algorithm" and the code
#32 opened by tatsuhiko-inoue - 1
recommended experiments
#21 opened by dvolgyes - 4
Debug prints in ranger-adabelief
#4 opened by iiSeymour - 1
Epsilon is important to adaptive optimizers
#24 opened by yuanwei2019 - 6
0.1.0 changes for ranger_adabelief
#19 opened by bratao - 3
scripts for the toy examples?
#5 opened by XuezheMax - 4
Is extra epsilon more important than belief?
#23 opened by yasutoshi - 3
denom = (exp_avg_var.add_(group['eps']).sqrt() / math.sqrt(bias_correction2)).add_(group['eps'])
#18 opened by yuanwei2019 - 2
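The title above quotes the denominator line from the PyTorch implementation. A minimal self-contained sketch of the same computation (the function wrapper and parameter names are editorial; only the final line mirrors the implementation):

```python
import math
import torch

def adabelief_denom(exp_avg_var: torch.Tensor, eps: float, beta2: float, step: int) -> torch.Tensor:
    """Denominator of the AdaBelief step.

    exp_avg_var is the EMA of (g_t - m_t)^2; beta2 and eps are the usual
    Adam-style hyperparameters; step is the 1-based iteration count.
    """
    bias_correction2 = 1 - beta2 ** step
    # eps appears twice: once inside the variance estimate before the square
    # root, and once after it; this is the double use that this issue and
    # #23, #32, and #61 above ask about.
    return (exp_avg_var.add_(eps).sqrt() / math.sqrt(bias_correction2)).add_(eps)
```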
raw results
#26 opened by skyshoumeng - 2
RangerAdaBelief setstate
#17 opened by soloice - 8
MATLAB implementation
#22 opened by pcwhy - 1
Performance vs AdamW
#8 opened by iiSeymour - 5
KeyError: exp_avg_var
#7 opened by mcmingchang - 11
Results on ImageNet with tuning weight decay
#11 opened by XuezheMax - 0
torch version requirement
#13 opened by leonzgtee - 2
Unfair comparison on ImageNet?
#6 opened by XuezheMax - 2
Question: How similar or dissimilar is this to Hypergradient Descent?
#3 opened by muellerzr