bigscience-workshop/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Python · NOASSERTION license
Issues
Hello, I've run into a problem
#386 opened by etoilestar - 0
Is this assertion for mask wrong?
#400 opened by yinfangchen - 0
ModuleNotFoundError: No module named 'torch' when run 'pip install -e .', but pytorch exists
#389 opened by SeekPoint - 2
The given group does not exist pytorch
#379 opened by germanjke - 0
Hello, can Megatron-DeepSpeed pre-train llama2?
#398 opened by 13416157913 - 1
RuntimeError: Error building extension 'scaled_upper_triang_masked_softmax_cuda'
#387 opened by zll0000 - 0
Is a training log like this normal? I don't see the loss in the logs, and what does "grad norm: nan" mean?
#396 opened by alphanlp - 0
Question about the implementation of mpu.cross_entropy when using tensor parallel
#394 opened by robin087 - 1
questions about inconsistent evaluation results
#392 opened by coorful - 1
stage3 error: IndexError: list index out of range
#391 opened by PhdShi - 5
Finetuning BLOOM
#337 opened by AnaRhisT94 - 0
Question about ds to universal
#388 opened by saxh - 20
About reshape deepspeed checkpoint
#343 opened by henan991201 - 0
Help: error: command '/opt/rh/devtoolset-7/root/usr/bin/gcc' failed with exit code 1 (subprocess-exited-with-error)
#382 opened by listwebit - 0
Megatron-DeepSpeed only applies to specific models?
#381 opened by Bob-cby - 2
Universal checkpoints and MP states
#380 opened by aitorormazabal - 2
How to continue pre-training Bloom?
#366 opened by ShinoharaHare - 1
upgrade megatron-lm
#378 opened by dz1iang - 0
how to do prompt learning with bloom?
#376 opened by moseshu - 3
deepspeed_to_megatron several issues
#355 opened by MatejUlcar - 0
About convert DS checkpoint to Transformers
#333 opened by misska1 - 1
How to convert model weights(e.g., bigscience/bloomz-560m-optimizer-states) to Hugging Face model.bin file?
#374 opened by qazwsx042 - 0
Can I use python only apex for gpt_pretrain?
#373 opened by Luoyang144 - 0
how to pretrain t5-lm adapted?
#372 opened by nanyyyyyy - 12
grad norm increase strangely
#347 opened by misska1 - 0
How to preprocess data for t5 model?
#371 opened by xiu-ze - 2
Load Bloom Optimizer State (i.e. Bloom 1B1)
#350 opened by philippmtk - 0
Is there any script for pretraining/finetuning BLOOM?
#363 opened by drxmy - 0
Finetuning BLOOM 176B with BitFit
#359 opened by drxmy - 3
User Warnings for accessing grad attribute of non-leaf Tensors thrown with TP=1 and PP>1
#356 opened by chelseajohn - 5
How to inference GPT2 with DeepSpeed?
#346 opened by cdj0311 - 1
Installing Apex on Windows
#342 opened by gordicaleksa - 0
pretrain_gpt_distributed.sh ERROR!
#341 opened by cdj0311 - 4
About convert deepspeed to deepspeed checkpoint
#338 opened by henan991201 - 4
Can we also train the BLOOM model using tensor parallelism and efficient fused CUDA kernels?
#334 opened by CloudedLeopard17 - 4
Changing a single example affects forward pass for other examples in a batch
#335 opened by mayank31398