Issues
[Question] Limit learning rate
#37 opened by Specthram - 0
lora training with fsdp will hang up
#35 opened by gameofdimension - 5
Prodigy 8-bit version
#21 opened by KonokoAz - 5
Question on convergence
#18 opened by ppbrown - 2
Lowering TE or Unet average only
#16 opened by trihardseven - 4
beta3: difference between paper and code
#24 opened by dxqbYD - 2
Growth_rate
#15 opened by DarkAlchy - 8
Possible to marry Prodigy and AdamW?
#11 opened by askerlee - 2
Document incompatibility with gradient clipping
#13 opened by crypdick - 1
Is there a way to monitor the estimated LR over time, if it has any meaning?
#12 opened by ethansmith2000 - 0
T_MAX value (CosineAnnealingLR)
#10 opened by josemerinom - 38
Question regarding t_max and d estimation
#6 opened by DanPli - 3
Is there any rule of thumb for tuning weight_decay of Prodigy when training transformer-based LLMs?
#5 opened by DesperateExplorer - 3
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
#1 opened by manyotherfunctions