Issues
- 0
why did you delete SophiaH?
#53 opened by Andron00e - 0
I think it is similar to RMSprop
#52 opened by YooSungHyun - 1
- 2
Reason for the discrete distribution
#49 opened by dgm2 - 1
Does Sophia work with activation-checkpointing?
#50 opened by ahmdtaha - 3
What are the values of "bs" for vision?
#48 opened by codonna9 - 7
- 1
NameError: name 'ddp_rank' is not defined
#37 opened by ThuanNaN - 2
Implementing Sophia-H alternative
#42 opened by thegodone - 0
- 3
Training on lit-llama failed to get convergence
#24 opened by hx-Tang - 1
- 1
- 0
Sophia-H Implementation in third party
#38 opened by robotzheng - 4
Sophia on jax
#25 opened by sglucas - 1
Can't get good results on smaller models
#30 opened by tsalex1992 - 1
Unable to reproduce the GPT 2 small results
#43 opened by pmpalang - 11
- 1
Few-shot evaluation code available?
#45 opened by sanyalsunny111 - 2
Which is the original repo?
#36 opened by SagiPolaczek - 2
Bug in the per-coordinate clipping?
#34 opened by vmarkovtsev - 1
Sophia with multitensor apply / FusedSophia
#26 opened by skyshine102 - 2
which is the original code we should use?
#28 opened by brando90 - 0
Please package Sophia as a PyPI Package
#29 opened by guilt - 1
Availability of models?
#21 opened by ArthurConmy - 1
Hessian-vector product vs. Hessian estimator
#23 opened by zhouyuan - 2
RuntimeError: Passing `optimizers` is not allowed if Fairscale, Deepspeed or PyTorch FSDP is enabled
#17 opened by lw3259111 - 3
- 1
Does Sophia support multiple GPU nodes?
#20 opened by skye-glitch - 3
Does this work with 16-mixed precision
#16 opened by tkella47 - 3
Does not reduce CrammingBERT training time
#18 opened by tbaggu - 0
Is Sophia-G a second-order optimizer?
#19 opened by Godforever - 3
- 3
Training LLMs such as BERT
#12 opened by KongMingxi - 6
Having trouble replicating the result
#10 opened by nalzok - 0
Minor shape error
#11 opened by anruigu - 2
- 1
Ambiguous dependency specification
#5 opened by nalzok - 2
Is it applicable for any loss function?
#3 opened by subercui - 1
Incomplete WandB logging
#6 opened by nalzok - 2
Sophia-H Implementation?
#7 opened by nalzok - 1
Evaluation on other domains
#2 opened by francqz31 - 1
Does it support BF16?
#4 opened by acostin1