Issues
- 0
How to train a model with pippy
#1142 opened by sunkun1997
- 0
pippy.SaveModule not exist?
#1140 opened by laoda513
- 0
Code hangs permanently
#1138 opened by Narasimha1997
- 8
`pipeline` arguments are not matched
#1130 opened by rednoah91
- 0
Support for Autoregressive generation with LLMs
#1136 opened by apresunreve
- 1
[Error] pipeline() got an unexpected keyword argument
#1134 opened by HieronZhang
- 5
[BUG] cannot capture your model as a full graph
#1132 opened by sunkun1997
- 3
[Bug?] Gradient Synchronization for DDP
#1133 opened by jianweif
- 1
- 6
Retrieving the Trained Model
#1094 opened by dheerj188
- 2
CPU offloading?
#1126 opened by Xynonners
- 8
examples/huggingface failed
#1115 opened by yaxan
- 2
ImportError: cannot import name 'pipeline' from 'pippy'
#1123 opened by bob020416
- 1
Can Pippy be combined with PEFT LoRA?
#1122 opened by Songjw133
- 3
Inference freezes when running llama example with pp>2
#1118 opened by JamesLYan
- 0
- 0
[Test] Create a model registry for testing
#1062 opened by kwen2501
- 9
FSDP+PP tracer issue with cast-to-bf16
#1104 opened by wconstab
- 6
FSDP+PP bug where reshard_after_forward must be true
#1105 opened by wconstab
- 0
Exception when splitting model with "--autosplit"
#1087 opened by spupyrev
- 2
PP Tracer doesn't work with fused_rmsnorm
#1108 opened by wconstab
- 0
Torchtitan Pipeline Parallel Issue Tracker
#1103 opened by wconstab
- 2
- 0
Infinite recursion on torch.export for PP tracing
#1107 opened by wconstab
- 1
FSDP+PP requires changing layer iteration code
#1106 opened by wconstab
- 1
(FSDP or DDP) + PP support
#1026 opened by wconstab
- 2
- 0
Use relative import to simplify migration
#1061 opened by kwen2501
- 3
mb_index undefined in Interleaved 1F1B
#1050 opened by kwen2501
- 0
GPipe Schedule hangs when run with 1 microbatch
#1064 opened by wconstab
- 0
Interleaved 1f1b performance
#1033 opened by H-Huang
- 4
Unexpected Memory Usage and Latency with PP
#1056 opened by Lucius-THU
- 0
Interleave 1F1B does not wait comm ops
#1053 opened by kwen2501
- 0
Rename schedules
#1010 opened by gnadathur
- 2
accommodate comments in #998
#1004 opened by gnadathur
- 0
unify impl of forward_one_chunk and backward_one_chunk
#1005 opened by gnadathur
- 0
- 1
autograd.backward() or autograd.grad()
#1031 opened by kwen2501
- 1
create constructor that accepts stage module
#1012 opened by gnadathur
- 0
DCP + PP docs and ux
#1027 opened by wconstab
- 2
Is there a way to export a pipeline stage?
#1016 opened by nrs-status
- 5
- 1
unify impl of forward_one_chunk and backward_one_chunk
#1013 opened by gnadathur
- 0
via #987: Migrate PipelineStage
#1011 opened by gnadathur
- 0
#1000 add back skip connection support
#1008 opened by gnadathur
- 0
#987 unify interface bw stages and schedules
#1007 opened by gnadathur
- 0
#957 add 1f1b and fix interleaved 1f1b
#1006 opened by gnadathur
- 0
- 0
Validate different state_dict composition with PP for DCP
#1002 opened by gnadathur
- 1
Updated example_train.py hangs on CPU training
#989 opened by Heasummn