juntang-zhuang/Adabelief-Optimizer
Repository for NeurIPS 2020 Spotlight "AdaBelief Optimizer: Adapting stepsizes by the belief in observed gradients"
Jupyter Notebook · BSD-2-Clause
Issues
Loss becomes NaN when beta1=0
#67 opened by yojeep - 4
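A note on the likely mechanism, assuming the update rule as stated in the paper: with beta1 = 0 the gradient EMA m_t collapses to the raw gradient g_t, so the squared belief term is identically zero, s_t decays toward zero, and the step is governed only by epsilon, which can easily overflow to NaN:

```latex
\beta_1 = 0 \;\Rightarrow\; m_t = g_t
\;\Rightarrow\; s_t = \beta_2 s_{t-1} + (1-\beta_2)(g_t - m_t)^2 = \beta_2 s_{t-1} \to 0
\;\Rightarrow\; \Delta\theta_t \approx -\alpha\, g_t / \epsilon
```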
Inconsistent use of epsilon
#61 opened by cossio - 1
weight_decouple in AdaBelief TF
#60 opened by YannPourcenoux - 1
TensorFlow restoration issue
#59 opened by soumen-ghosh - 2
Some questions related to importing adabelief
#58 opened by HelloWorldLTY - 7
Similarity to AdaHessian
#16 opened by davda54 - 5
Inconsistent computation of weight_decay and grad_residual among PyTorch versions
#56 opened by sjscotti - 3
Your method is just equivalent to SGD with a changeable global learning rate.
#57 opened by Yonghongwei - 2
Compatibility with warmup
#55 opened by joihn - 1
Changing the initial learning rate
#53 opened by Kraut-Inferences - 1
FileNotFoundError for ImageNet
#52 opened by kchak31 - 2
Model load fails with ValueError: Unknown optimizer: AdaBeliefOptimizer
#41 opened by damianospark - 1
On the ImageNet accuracy result of 70.08
#50 opened by wyzjack - 8
Support for TensorFlow 1.10+
#37 opened by chenxinhua - 1
Why does g_t subtract m_t instead of m_{t-1}?
#48 opened by zxteloiv - 1
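For reference, the two moment updates from the paper, in its notation. Subtracting the current m_t (rather than m_{t-1}) is what makes s_t track the deviation of the observed gradient from the optimizer's current belief, which is the paper's stated motivation:

```latex
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
s_t = \beta_2 s_{t-1} + (1-\beta_2)\,(g_t - m_t)^2
```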
Upgrade with Adas optimizer
#45 opened by DaniyarM - 1
Please add a license
#43 opened by 1e100 - 2
Fine-tuning with BERT models
#42 opened by JaheimLee - 7
Instability in RNN training
#10 opened by bratao - 7
Issues with AdaBelief-TensorFlow
#27 opened by dusk666 - 6
Imagenette baseline for AdaBelief
#40 opened by tmabraham - 26
Fine-tuning EfficientNet-B4 with the AdaBelief optimizer gives worse accuracy than Adam?
#38 opened by daixiangzi - 14
TensorFlow Implementation
#34 opened by ManoharSai2000 - 10
Different usage of eps between "A quick look at the algorithm" and the code
#32 opened by tatsuhiko-inoue - 1
recommended experiments
#21 opened by dvolgyes - 4
Debug prints in ranger-adabelief
#4 opened by iiSeymour - 1
Epsilon is important to adaptive optimizers
#24 opened by yuanwei2019 - 6
0.1.0 changes for ranger_adabelief
#19 opened by bratao - 3
scripts for the toy examples?
#5 opened by XuezheMax - 4
Is extra epsilon more important than belief?
#23 opened by yasutoshi - 3
denom = (exp_avg_var.add_(group['eps']).sqrt() / math.sqrt(bias_correction2)).add_(group['eps'])
#18 opened by yuanwei2019 - 2
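The title above quotes the denominator line from the PyTorch implementation. A minimal self-contained sketch of the same computation (the function wrapper and parameter names are editorial; only the final line mirrors the implementation):

```python
import math
import torch

def adabelief_denom(exp_avg_var: torch.Tensor, eps: float, beta2: float, step: int) -> torch.Tensor:
    """Denominator of the AdaBelief step.

    exp_avg_var is the EMA of (g_t - m_t)^2; beta2 and eps are the usual
    Adam-style hyperparameters; step is the 1-based iteration count.
    """
    bias_correction2 = 1 - beta2 ** step
    # eps appears twice: once inside the variance estimate before the square
    # root, and once after it; this is the double use that this issue and
    # #23, #32, and #61 above ask about.
    return (exp_avg_var.add_(eps).sqrt() / math.sqrt(bias_correction2)).add_(eps)
```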
raw results
#26 opened by skyshoumeng - 2
RangerAdaBelief setstate
#17 opened by soloice - 8
MATLAB implementation
#22 opened by pcwhy - 1
Performance vs AdamW
#8 opened by iiSeymour - 5
KeyError: exp_avg_var
#7 opened by mcmingchang - 11
Results on ImageNet with tuning weight decay
#11 opened by XuezheMax - 0
torch version requirement
#13 opened by leonzgtee - 2
Unfair comparison on ImageNet?
#6 opened by XuezheMax - 2
Question: How similar or dissimilar is this to Hypergradient Descent?
#3 opened by muellerzr