Issues
Adding another logbook (kinda)
#52 opened by boweiliu - 0
AMD MI250X MAMF efficiency is wrong
#76 opened by rlrs - 5
How can operations with the INT8 data type be performed using a GPU accelerator card? Is dequantization required?
#75 opened by cpollo55 - 2
PDF link in readme doesn't work
#73 opened by sytelus - 4
GPU utilization monitoring
#72 opened by fortminors - 2
Performance Profiling
#71 opened by jeromeku - 2
[Question] `FSDP` vs `Deepspeed ZeRO3 / ZeRO++`
#66 opened by jeromeku - 0
slurm job array change nodes
#61 opened by ethanhe42 - 10
MAMF - GH200
#58 opened by frankschae - 3
Question about changing precision post training
#41 opened by Thytu - 1
Quarto Site
#28 opened by saforem2 - 3
Improve folder structure
#15 opened by heyimjonas - 3
convert markdown to pdf
#6 opened by pengzhangzhi - 3
GPU requirements and cost estimation.
#9 opened by Anindyadeep - 1
Daisy chain batch jobs
#13 opened by adammoody - 4
Minor Typo in emulate multi node
#8 opened by anindya-saha - 2
Missing `hparams` section
#5 opened by jvmncs - 2
Convert to bfloat16 failing
#2 opened by mhillebrand - 10
Parallel training hangs
#1 opened by mhillebrand