Issues
- TP does not work on Flan-T5 (#215, opened by shreyansh26)
- Is there any plan to support the LLaMA model? (#214, opened by yuyaxiong)
- Apply FlashAttention (#203, opened by dyanos)
- TODO: CUDA Kernel for Transformer Model (#131, opened by dyanos)
- TODO: Deparallelize Zero optimizer (#186, opened by hyeinhyun)
- TODO: Zero stage 1, 2 backward (#130, opened by hyeinhyun)
- TODO: Documentation for Expert Parallel (#175, opened by scsc0511)
- TODO: Test Code Enhancement (#165, opened by bzantium)
- TODO: Write DP documentation and tutorials (#154, opened by jinwonkim93)
- Fix TP embedding layers (#152, opened by jason9693)
- TODO: Save Deparallelized Expert Parallel Model (#101, opened by scsc0511)
- Refactor all test scripts (#126, opened by hyunwoongko)
- Lazy Parallelization (#115, opened by hyunwoongko)
- How to pretrain T5 (#122, opened by TristanThrush)
- Replace GitLab repository in pre-commit with GitHub (#120, opened by bzantium)
- Fix T5 tensor parallelism bugs (#117, opened by hyunwoongko)
- Implement vocab parallel cross-entropy loss (#79, opened by loopinf; see the sketch after this list)
- Remove DP from main branch for now (#116, opened by hyunwoongko)
- Add a description of how to use fused_scale_softmax (#33, opened by loopinf)
- TODO: Deparallelize Pipeline Parallel (#91, opened by ohwi)
- Add oslo models to transformers __init__ file (#107, opened by loopinf)
- TODO: Documentation and README (#102, opened by hyunwoongko)
- TODO: Deparallelize Expert Parallel (#82, opened by scsc0511)
- PatrickStar for Zero (#97, opened by dongsungkim)
- ColoDDP integration (#93, opened by dongsungkim)
- No _TensorParallelMappingForHuggingFace (#88, opened by dongsungkim)
- Add mapping for oslo model (#86, opened by loopinf)
- Implement vocab parallel cross-entropy loss (#80, opened by bzantium)
- Refactor transformers wrap (#77, opened by minqukanq)
- Wrap transformers layer (#74, opened by minqukanq)
- Pass optimizer parameters (#70, opened by minqukanq)
- fused_scale_mask_softmax on GPT2 model (#62, opened by loopinf)
- Add DistributedSampler to DDP (#59, opened by dongsungkim)
- Make sequence parallel splitting automatic (#55, opened by hyunwoongko)
- SP parameter device type error (#49, opened by dongsungkim)
- Modify PP to be functional (#50, opened by hyunwoongko)
- TODO: Test TP + PP (#48, opened by hyunwoongko)
- Error in test_modeling_bert.py (#32, opened by loopinf)
- fused_bias_gelu is missing when calling BertModel (#43, opened by loopinf)
- Fix some code errors (#34, opened by hyunwoongko)
- Fix sorting error in allocate_param function (#39, opened by bzantium)
- TODO: Clean up data communication functions for PP (#42, opened by ohwi)
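
Issues #79 and #80 above both ask for a vocab-parallel cross-entropy loss. For context, below is a minimal sketch of the Megatron-LM-style technique they refer to, in which each tensor-parallel rank holds only a slice of the vocab dimension of the logits and the loss is computed without ever gathering the full logits. The function name, the `vocab_start` offset argument, and the process-group plumbing are illustrative assumptions, not OSLO's actual API.

```python
# Sketch only, under the assumptions stated above; not OSLO's implementation.
import torch
import torch.distributed as dist

def vocab_parallel_cross_entropy(local_logits, targets, vocab_start, group):
    # local_logits: [batch, local_vocab], this rank's vocab shard
    # targets:      [batch], global vocab ids
    local_vocab = local_logits.size(-1)

    # 1) Numerically stable softmax needs the max over the *full* vocab,
    #    so take the local max and all-reduce with MAX across shards.
    logits_max = local_logits.max(dim=-1).values
    dist.all_reduce(logits_max, op=dist.ReduceOp.MAX, group=group)
    shifted = local_logits - logits_max.unsqueeze(-1)

    # 2) Global softmax denominator: local sum of exp, then SUM all-reduce.
    sum_exp = shifted.exp().sum(dim=-1)
    dist.all_reduce(sum_exp, op=dist.ReduceOp.SUM, group=group)

    # 3) Pick out the target logit. Only the rank whose shard contains the
    #    target id contributes a non-zero value, so a SUM all-reduce
    #    recovers it on every rank.
    local_targets = targets - vocab_start
    in_shard = (local_targets >= 0) & (local_targets < local_vocab)
    safe_targets = local_targets.clamp(0, local_vocab - 1)
    target_logit = shifted.gather(-1, safe_targets.unsqueeze(-1)).squeeze(-1)
    target_logit = torch.where(in_shard, target_logit,
                               torch.zeros_like(target_logit))
    dist.all_reduce(target_logit, op=dist.ReduceOp.SUM, group=group)

    # 4) Per-token loss: -log softmax(target) = log(sum_exp) - target_logit.
    return sum_exp.log() - target_logit
```

The payoff is memory: gathering the full logits would materialize a [batch, vocab] tensor on every device, while this formulation only all-reduces three [batch]-sized tensors (the max, the denominator, and the target logit).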