Issues
Checkpoint saves failing for eager mode training
#168 opened by chauhang - 0
add compiled RMSNorm into the norm config
#374 opened by tianyu-l - 0
Add torchdata to requirements after release
#351 opened by gokulavasan - 6
reload existing llama checkpoints
#305 opened by tianyu-l - 9
Make dataloader stateful?
#291 opened by XinDongol - 7
RoPE implementation differences
#335 opened by rlrs - 1
Code change that changes the model semantics
#347 opened by kwen2501 - 5
Loss curve spikes on amalgamated datasets - need full-scale shuffler in dataloader
#128 opened by lessw2020 - 1
Converting to checkpoint.pd is not working
#307 opened by viai957 - 7
Question on Model Init
#312 opened by XinDongol - 6
profiler issue when training with 64 or more GPUs
#266 opened by tianyu-l - 1
Make fused RMSNorm a registered op
#199 opened by lessw2020 - 0
numerical difference for SDPA between non-DTensor and DTensor, when math attention and fp16 are used
#317 opened by tianyu-l - 0
add doc for adding custom dataset
#311 opened by lessw2020 - 1
Verify that we can do eval / inference
#192 opened by gnadathur - 1
[Feature] Add fineweb dataset
#309 opened by viai957 - 2
Custom dataset for llama 3 finetuning
#310 opened by rshah918 - 0
freezing some parts of the model
#306 opened by tianyu-l - 0
metrics - add L1 gradient norm tracking
#119 opened by lessw2020 - 3
Grad scaler not in train state
#146 opened by BadrYoubiIdrissi - 4
Starting off with different models across ranks, FSDP doesn't synchronise them
#166 opened by BadrYoubiIdrissi - 0
Add HSDP + TP/SP support
#176 opened by gnadathur - 0
FSDP2 based HSDP support
#177 opened by gnadathur - 1
Add support for MoE model architecture
#184 opened by gnadathur - 0
numerical issue when running SDPA with DTensor
#267 opened by tianyu-l - 1
[Feature] Plan to add `model_register`
#282 opened by XinDongol - 5
[Feature] Add gradient accumulation
#292 opened by XinDongol - 4
Wrong mesh order
#286 opened by ad8e - 17
Question: is TP able to run a model that cannot fit a single batch on one GPU?
#276 opened by lucasjinreal - 4
Question: parallelising convolutional layers?
#277 opened by jvwilliams23 - 3
update metric title to 'tokens per second' (TPS) rather than 'words per second' (WPS)
#263 opened by lessw2020 - 2
simplify meta_init (rope embeddings)
#110 opened by lessw2020 - 4
E2E training numbers for 13B/70B
#118 opened by wanchaol - 0
TorchTrain: Release blocking Issues master tracker
#186 opened by gnadathur - 2
FSDP2 incurs higher CPU memory usage in 2D compared to FSDP1
#208 opened by awgu - 3
Validate DCP load and save for 1D and 2D w/ FSDP2
#108 opened by gnadathur - 2
Implement fast Layer norm to get decent MFU
#196 opened by gnadathur - 1
Validate FSDP2 + SP parity with FSDP1 + SP
#107 opened by gnadathur - 0
Integration test for torchtrain
#109 opened by gnadathur
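One of the feature requests above, #292 "[Feature] Add gradient accumulation", names a well-known technique: average gradients over several micro-batches before applying a single optimizer step, emulating a larger batch at the same peak memory. A minimal pure-Python sketch of the idea follows; the toy scalar model, `grad`, and `train` are illustrative stand-ins, not torchtitan's trainer or API.

```python
def grad(w, x, y):
    # Gradient of the squared-error loss 0.5 * (w*x - y)^2 w.r.t. w.
    return (w * x - y) * x

def train(data, w=0.0, lr=0.1, accum_steps=4):
    """Toy training loop with gradient accumulation.

    Accumulates gradients over `accum_steps` micro-batches, then takes
    one update, so the effective batch is `accum_steps` times larger.
    """
    acc, updates = 0.0, 0
    for step, (x, y) in enumerate(data):
        # Scale each micro-batch gradient so the sum is the large-batch average.
        acc += grad(w, x, y) / accum_steps
        if (step + 1) % accum_steps == 0:
            w -= lr * acc          # one optimizer step per accum_steps micro-batches
            acc, updates = 0.0, updates + 1
    return w, updates
```

With 8 identical micro-batches and `accum_steps=4`, the loop performs exactly 2 parameter updates, matching what 2 steps on the corresponding large batches would do.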
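Issue #291 "Make dataloader stateful?" asks for a loader whose position can be checkpointed and restored, so training resumes mid-epoch rather than replaying or skipping data. A hypothetical sketch of that contract is below; the class name and `state_dict`/`load_state_dict` methods mirror the common PyTorch checkpointing convention but are illustrative, not the project's actual implementation.

```python
class StatefulLoader:
    """Toy data loader that can save and restore its iteration position."""

    def __init__(self, dataset):
        self.dataset = dataset
        self.index = 0  # next item to yield; this is the checkpointed state

    def __iter__(self):
        # Resume from self.index, advancing it as items are consumed.
        while self.index < len(self.dataset):
            item = self.dataset[self.index]
            self.index += 1
            yield item

    def state_dict(self):
        # Minimal state: just the cursor into the dataset.
        return {"index": self.index}

    def load_state_dict(self, state):
        self.index = state["index"]
```

A fresh loader that loads a saved `state_dict` continues from the exact item where the original stopped, which is the behavior the issue requests for checkpoint/resume.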