facebookresearch/fairscale

PyTorch extensions for high performance and large scale training.

Language: Python · License: NOASSERTION

Issues

Hi, Groups division may be incorrect in initialize() in fairscale/nn/model_parallel/initialize.py
#1189 opened 4 months ago by Youngluc · 0 comments
Raising `assert param.grad is not None` when finetuning LoRA.
#1188 opened 6 months ago by HashimotoPatrickMu · 1 comment
Example of MOE
#1165 opened 10 months ago by Juanhui28 · 1 comment
what are pointwise Optimizers and non-pointwise Optimizers?
#1170 opened 9 months ago by bugm · 4 comments
[question] Different training between DDP & Sharded DDP
#1172 opened 9 months ago by kwohlfahrt · 0 comments
FSDP on the same CNN model requires more memory than DataParallel
#1163 opened 9 months ago by s-reaungamornrat · 0 comments
How can I use torchrun + model parallelism + FSDP
#1155 opened a year ago by HackGiter · 1 comment
Issue in `ParallelEmbedding` constructor - scale_grad_by_freq being assigned to norm_type
#1156 opened a year ago by gtamer2 · 2 comments
Memory reduction in the EMA mode
#1061 opened 2 years ago by voidrank · 6 comments
Memory usage different from deepspeed
#1109 opened 2 years ago by x54-729 · 8 comments
It is dangerous to use default non_block=True.
#1146 opened a year ago by heshenghuan · 0 comments
torch.compile with FSDP
#1145 opened a year ago by santha96 · 2 comments
assert self.has_full_params
#1134 opened a year ago by pokameng · 4 comments
Hybrid Sharding in Fairscale's FSDP Implementation
#1133 opened a year ago by stephanpeitz · 2 comments
Why ShardedDDP and OSS are slower than Vanilla DDP
#1131 opened a year ago by powermano · 0 comments
pip install failed
#1130 opened a year ago by dogxxxxx · 0 comments
Error with nested models "Caffe2 uses a lazy allocation..."
#1129 opened a year ago by Emanuele97x · 0 comments
[bug] pip package 0.4.13 fails to build wheel
#1128 opened a year ago by project-tuva · 0 comments
Error Freezing Weights
#1126 opened 2 years ago by mostafaelhoushi · 0 comments
Compatibility with Pytorch 2.0; failing test `test_gradient_value`
#1124 opened 2 years ago by h-vetinari · 4 comments
Can exclude some layer parameter not to shard?
#1123 opened 2 years ago by robotcator · 5 comments
Unexpected Large Memory Consumption during Tensor Parallelism Training with OPT-1.3B
#1111 opened 2 years ago by dangxingyu · 5 comments
All parameters cannot be shared amongst 2 different FSDP modules
#1117 opened 2 years ago by sarthakgarg · 1 comment
FSDP on model that has requires_grad = false
#1119 opened 2 years ago by andrasiani · 1 comment
[AdaScale] self._hook() failure in __init__() of AdaScale() class
#1114 opened 2 years ago by connieKing511 · 1 comment
Whether modifying the source code (fully_sharded_data_parallel.py) will bring safety hazard?
#1115 opened 2 years ago by dropreg · 2 comments
Combine powersgd with fairscale
#1113 opened 2 years ago by amsword · 1 comment
memory explodes after self._rebuild_full_params() function
#1112 opened 2 years ago by haorannlp · 0 comments
Lots of Commandline Output from this line.
#1107 opened 2 years ago by jstraub · 1 comment
[FSDP] Training gets slower as iterations increase when flatten_parameters=False?
#1102 opened 2 years ago by woodyx218 · 10 comments
[FSDP] How to use customized backward hooks?
#1101 opened 2 years ago by woodyx218 · 25 comments
FSDP cannot consolidate optimizer state dict with flatten params is False
#1100 opened 2 years ago by ShenglongZ · 3 comments
Skip rather than fail tests in absence of `fair_dev`
#1096 opened 2 years ago by h-vetinari · 3 comments
FSDP - Extra GPU memory consumption when maintaining a EMA weights
#1093 opened 2 years ago by syorami · 5 comments
Any examples using AdaScale with fairseq?
#1094 opened 2 years ago by kedarkolluri · 1 comment
clip_grad_norm_ from fairscale downcasts to bf16 before all reduce
#1092 opened 2 years ago by glample · 3 comments
[Question] FSDP vs ZeRO
#1090 opened 2 years ago by wookjeHan · 1 comment
Why I cannot set move_params_to_cpu=True
#1089 opened 2 years ago by xdd12135 · 3 comments
Why only wait for work_handles[-1] in _sync_params_and_buffers ?
#1088 opened 2 years ago by shijungg · 5 comments
AdaScale with gradient accumulation: Sum or average of gradients?
#1062 opened 2 years ago by lballes · 11 comments
Question: VRAM Limits for FSDP across Asymmetrical GPUs?
#1073 opened 2 years ago by zaptrem · 4 comments
Can't load optimizer state due to `state_steps`
#1083 opened 2 years ago by rowhanm · 10 comments
is the prefetch_fsdp_params_simple branch deleted or merged?
#1082 opened 2 years ago by hyoo · 1 comment
FSDP Forward order differs from that of first run
#1056 opened 2 years ago by Dahoas · 3 comments
FSDP consumes the same amount of memory as DPP, why?
#1050 opened 2 years ago by Alex-Songs · 11 comments
Inconsistence in checkpoint-wrapper when wrapped BN block
#1059 opened 2 years ago by yuctian · 2 comments
Pytorch nightly build failing
#1057 opened 2 years ago by darren-pei · 8 comments
How can I use only OSS in Pytorch lightning?
#1053 opened 2 years ago by CD21a · 1 comment
AssertionError: expects all parameters to have same requires_grad
#1047 opened 2 years ago by boundles · 2 comments
[Defect in 0.4.7] pygit2/pgzip not automatically installed.
#1042 opened 2 years ago by alexandervaneck · 4 comments