bigscience-workshop/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Python · NOASSERTION license
Issues
Hello, I've run into a problem
#386 opened by etoilestar - 0
Is this assertion for mask wrong?
#400 opened by yinfangchen - 0
ModuleNotFoundError: No module named 'torch' when run 'pip install -e .', but pytorch exists
#389 opened by SeekPoint - 2
The given group does not exist pytorch
#379 opened by germanjke - 0
Hello, can Megatron-DeepSpeed pre-train llama2?
#398 opened by 13416157913 - 1
RuntimeError: Error building extension 'scaled_upper_triang_masked_softmax_cuda'
#387 opened by zll0000 - 0
Is a training log like this normal? I don't see the loss in the logs, and what does "grad norm: nan" mean?
#396 opened by alphanlp - 0
Question about the implementation of mpu.cross_entropy when using tensor parallel
#394 opened by robin087 - 1
questions about inconsistent evaluation results
#392 opened by coorful - 1
stage3 error: IndexError: list index out of range
#391 opened by PhdShi - 5
Finetuning BLOOM
#337 opened by AnaRhisT94 - 0
Question about ds to universal
#388 opened by saxh - 20
About reshape deepspeed checkpoint
#343 opened by henan991201 - 0
Help: error: command '/opt/rh/devtoolset-7/root/usr/bin/gcc' failed with exit code 1 (subprocess-exited-with-error)
#382 opened by listwebit - 0
Megatron-DeepSpeed only applies to specific models?
#381 opened by Bob-cby - 2
Universal checkpoints and MP states
#380 opened by aitorormazabal - 2
How to continue pre-training Bloom?
#366 opened by ShinoharaHare - 1
upgrade megatron-lm
#378 opened by dz1iang - 0
how to do prompt learning with bloom?
#376 opened by moseshu - 3
deepspeed_to_megatron several issues
#355 opened by MatejUlcar - 0
About convert DS checkpoint to Transformers
#333 opened by misska1 - 1
How to convert model weights(e.g., bigscience/bloomz-560m-optimizer-states) to Hugging Face model.bin file?
#374 opened by qazwsx042 - 0
Can I use python only apex for gpt_pretrain?
#373 opened by Luoyang144 - 0
how to pretrain t5-lm adapted?
#372 opened by nanyyyyyy - 12
grad norm increase strangely
#347 opened by misska1 - 0
How to preprocess data for t5 model?
#371 opened by xiu-ze - 2
Load Bloom Optimizer State (i.e. Bloom 1B1)
#350 opened by philippmtk - 0
Is there any script for pretraining/finetuning BLOOM?
#363 opened by drxmy - 0
Finetuning BLOOM 176B with BitFit
#359 opened by drxmy - 3
User Warnings for accessing grad attribute of non-leaf Tensors thrown with TP=1 and PP>1
#356 opened by chelseajohn - 5
How to inference GPT2 with DeepSpeed?
#346 opened by cdj0311 - 1
Installing Apex on Windows
#342 opened by gordicaleksa - 0
pretrain_gpt_distributed.sh ERROR!
#341 opened by cdj0311 - 4
About convert deepspeed to deepspeed checkpoint
#338 opened by henan991201 - 4
Can we also train the BLOOM model using tensor parallelism and efficient fused CUDA kernels?
#334 opened by CloudedLeopard17 - 4
Changing a single example affects forward pass for other examples in a batch
#335 opened by mayank31398