facebookresearch/fairscale
PyTorch extensions for high performance and large scale training.
Python · License: NOASSERTION
Issues
Hi, Groups division may be incorrect in initialize() in fairscale/nn/model_parallel/initialize.py
#1189 opened by Youngluc - 1
Example of MOE
#1165 opened by Juanhui28 - 4
what are pointwise Optimizers and non-pointwise Optimizers?
#1170 opened by bugm - 0
How can I use torchrun + model parallelism + FSDP
#1155 opened by HackGiter - 2
Issue in `ParallelEmbedding` constructor - scale_grad_by_freq being assigned to norm_type
#1156 opened by gtamer2 - 6
Memory reduction in the EMA mode
#1061 opened by voidrank - 8
Memory usage different from deepspeed
#1109 opened by x54-729 - 0
It is dangerous to using default non_block=True.
#1146 opened by heshenghuan - 2
torch.compile with FSDP
#1145 opened by santha96 - 4
assert self.has_full_params
#1134 opened by pokameng - 2
Hybrid Sharding in Fairscale's FSDP Implementation
#1133 opened by stephanpeitz - 0
Why ShardedDDP and OSS are slower than Vanilla DDP
#1131 opened by powermano - 0
pip install failed
#1130 opened by dogxxxxx - 0
[bug] pip package 0.4.13 fails to build wheel
#1128 opened by project-tuva - 0
Error Freezing Weights
#1126 opened by mostafaelhoushi - 4
Can exclude some layer parameter not to shard?
#1123 opened by robotcator - 5
Unexpected Large Memory Consumption during Tensor Parallelism Training with OPT-1.3B
#1111 opened by dangxingyu - 1
FSDP on model that has requires_grad = false
#1119 opened by andrasiani - 1
Whether modifying the source code (fully_sharded_data_parallel.py) will bring safety hazard?
#1115 opened by dropreg - 1
Combine powersgd with fairscale
#1113 opened by amsword - 0
Lots of Commandline Output from this line.
#1107 opened by jstraub - 10
[FSDP] Training gets slower as iterations increase when flatten_parameters=False?
#1102 opened by woodyx218 - 25
[FSDP] How to use customized backward hooks?
#1101 opened by woodyx218 - 3
Skip rather than fail tests in absence of `fair_dev`
#1096 opened by h-vetinari - 5
Any examples using AdaScale with fairseq?
#1094 opened by kedarkolluri - 3
[Question] FSDP vs ZeRO
#1090 opened by wookjeHan - 3
Why I cannot set move_params_to_cpu=True
#1089 opened by xdd12135 - 5
Question: VRAM Limits for FSDP across Asymmetrical GPUs?
#1073 opened by zaptrem - 10
Can't load optimizer state due to `state_steps`
#1083 opened by rowhanm - 1
is the prefetch_fsdp_params_simple branch deleted or merged?
#1082 opened by hyoo - 3
FSDP Forward order differs from that of first run
#1056 opened by Dahoas - 11
FSDP consumes the same amount of memory as DPP,why?
#1050 opened by Alex-Songs - 2
Inconsistence in checkpoint-wrapper when wrapped BN block
#1059 opened by yuctian - 8
Pytorch nightly build failing
#1057 opened by darren-pei - 1
How can I use only OSS in Pytorch lightning?
#1053 opened by CD21a - 2