Issues
- TP does not work on Flan-T5 (#215, opened by shreyansh26)
- Is there any plan to support the LLaMA model? (#214, opened by yuyaxiong)
- Apply FlashAttention (#203, opened by dyanos)
- TODO: CUDA Kernel for Transformer Model (#131, opened by dyanos)
- TODO: Deparallelize Zero optimizer (#186, opened by hyeinhyun)
- TODO: Zero stage 1, 2 backward (#130, opened by hyeinhyun)
- TODO: Documentation for Expert Parallel (#175, opened by scsc0511)
- TODO: Test Code Enhancement (#165, opened by bzantium)
- TODO: Write DP documentation and tutorials (#154, opened by jinwonkim93)
- Fix TP embedding layers (#152, opened by jason9693)
- TODO: Save Deparallelized Expert Parallel Model (#101, opened by scsc0511)
- Refactor all test scripts (#126, opened by hyunwoongko)
- Lazy Parallelization (#115, opened by hyunwoongko)
- How to pretrain T5 (#122, opened by TristanThrush)
- Replace GitLab repository in pre-commit with GitHub (#120, opened by bzantium)
- Fix T5 tensor parallelism bugs (#117, opened by hyunwoongko)
- Implement vocab parallel cross-entropy loss (#79, opened by loopinf; see the sketch after this list)
- Remove DP from main branch for now (#116, opened by hyunwoongko)
- Add a description of how to use fused_scale_softmax (#33, opened by loopinf)
- TODO: Deparallelize Pipeline Parallel (#91, opened by ohwi)
- Add oslo models to transformers __init__ file (#107, opened by loopinf)
- TODO: Documentation and README (#102, opened by hyunwoongko)
- TODO: Deparallelize Expert Parallel (#82, opened by scsc0511)
- PatrickStar for Zero (#97, opened by dongsungkim)
- ColoDDP integration (#93, opened by dongsungkim)
- No _TensorParallelMappingForHuggingFace (#88, opened by dongsungkim)
- Add mapping for oslo model (#86, opened by loopinf)
- Implement vocab parallel cross-entropy loss (#80, opened by bzantium)
- Refactor transformers wrap (#77, opened by minqukanq)
- Wrap transformers layer (#74, opened by minqukanq)
- Pass optimizer parameters (#70, opened by minqukanq)
- fused_scale_mask_softmax on GPT2 model (#62, opened by loopinf)
- Add DistributedSampler to DDP (#59, opened by dongsungkim)
- Make sequence parallel splitting automatic (#55, opened by hyunwoongko)
- SP parameter device type error (#49, opened by dongsungkim)
- Modify PP to be functional (#50, opened by hyunwoongko)
- TODO: Test TP + PP (#48, opened by hyunwoongko)
- Error in test_modeling_bert.py (#32, opened by loopinf)
- fused_bias_gelu is missing when calling BertModel (#43, opened by loopinf)
- Fix some code errors (#34, opened by hyunwoongko)
- Fix sorting error in allocate_param function (#39, opened by bzantium)
- TODO: Clean up data communication functions for PP (#42, opened by ohwi)
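
Issues #79 and #80 above both ask for a vocab-parallel cross-entropy loss. For context, below is a minimal sketch of the Megatron-LM-style technique they refer to, in which each tensor-parallel rank holds only a slice of the vocab dimension of the logits and the loss is computed without ever gathering the full logits. The function name, the `vocab_start` offset argument, and the process-group plumbing are illustrative assumptions, not OSLO's actual API.

```python
# Sketch only, under the assumptions stated above; not OSLO's implementation.
import torch
import torch.distributed as dist

def vocab_parallel_cross_entropy(local_logits, targets, vocab_start, group):
    # local_logits: [batch, local_vocab], this rank's vocab shard
    # targets:      [batch], global vocab ids
    local_vocab = local_logits.size(-1)

    # 1) Numerically stable softmax needs the max over the *full* vocab,
    #    so take the local max and all-reduce with MAX across shards.
    logits_max = local_logits.max(dim=-1).values
    dist.all_reduce(logits_max, op=dist.ReduceOp.MAX, group=group)
    shifted = local_logits - logits_max.unsqueeze(-1)

    # 2) Global softmax denominator: local sum of exp, then SUM all-reduce.
    sum_exp = shifted.exp().sum(dim=-1)
    dist.all_reduce(sum_exp, op=dist.ReduceOp.SUM, group=group)

    # 3) Pick out the target logit. Only the rank whose shard contains the
    #    target id contributes a non-zero value, so a SUM all-reduce
    #    recovers it on every rank.
    local_targets = targets - vocab_start
    in_shard = (local_targets >= 0) & (local_targets < local_vocab)
    safe_targets = local_targets.clamp(0, local_vocab - 1)
    target_logit = shifted.gather(-1, safe_targets.unsqueeze(-1)).squeeze(-1)
    target_logit = torch.where(in_shard, target_logit,
                               torch.zeros_like(target_logit))
    dist.all_reduce(target_logit, op=dist.ReduceOp.SUM, group=group)

    # 4) Per-token loss: -log softmax(target) = log(sum_exp) - target_logit.
    return sum_exp.log() - target_logit
```

The payoff is memory: gathering the full logits would materialize a [batch, vocab] tensor on every device, while this formulation only all-reduces three [batch]-sized tensors (the max, the denominator, and the target logit).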