Issues
Assertion failure in verify_correctness for mistral-7B
#107 opened by abgoswam - 0
llama2-7B AssertionError: padded_vocab_size value from checkpoint (32000) is not equal to the input argument value (32256) #81
#103 opened by yushengsu-thu - 1
Introduce Sailor Models
#105 opened by longxudou - 0
Gemma Support
#104 opened by pedrohenriqueamartins - 0
Error in the documentation (https://epfllm.github.io/Megatron-LLM/guide/instruction_tuning.html#data-preprocessing)
#102 opened by yushengsu-thu - 1
Does it support sequence parallelism?
#101 opened by NamrataRShivagunde - 1
Multi-node training
#98 opened by wodeqiansuihan - 0
Correctness when enabling FlashAttention + Sequence Parallel at the same time?
#99 opened by xingyaoww - 1
Support Qwen?
#96 opened by Vincent131499 - 3
How to load from a saved intermediate checkpoint?
#95 opened by jjzha - 1
LLaMA2-70B Inference Optimization
#92 opened by RaymondHQR - 0
LLaMA and Mistral 7B pretraining support
#91 opened by StephennFernandes - 2
llama2-7B AssertionError: padded_vocab_size value from checkpoint (32000) is not equal to the input argument value (32256)
#81 opened by 13416157913 - 1
Setting args.make_vocab_size_divisible_by fails
#82 opened by 13416157913 - 7
Support for Mistral
#76 opened by philschmid - 0
Nice-to-have training features
#30 opened by andreaskoepf - 4
Error when running llama2-7B finetuning
#77 opened by 13416157913 - 1
Error when running llama2-7B finetuning
#78 opened by 13416157913 - 1
Prepend BOS token
#54 opened by panx27 - 6
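For context on the BOS question above, a minimal illustrative sketch using the Hugging Face tokenizer API; whether and where the repository's preprocessing prepends the token is exactly what the issue asks, so treat this as an assumption rather than the project's behaviour:

```python
from transformers import AutoTokenizer

# Any LLaMA-family tokenizer works for illustration; the gated repo requires access.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Tokenize without special tokens, then prepend the BOS id explicitly.
ids = tok.encode("Hello world", add_special_tokens=False)
ids = [tok.bos_token_id] + ids
```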
[SwiGLU] Question about SwiGLU
#64 opened by mynewstart - 4
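As background for the SwiGLU question above, a minimal sketch of a SwiGLU feed-forward block as used in LLaMA-style models; the class and parameter names (`w_gate`, `w_up`, `w_down`) and dimensions are illustrative, not the repository's actual modules:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUMLP(nn.Module):
    """Gated MLP: down(silu(gate(x)) * up(x))."""

    def __init__(self, hidden_size: int, ffn_hidden_size: int):
        super().__init__()
        self.w_gate = nn.Linear(hidden_size, ffn_hidden_size, bias=False)
        self.w_up = nn.Linear(hidden_size, ffn_hidden_size, bias=False)
        self.w_down = nn.Linear(ffn_hidden_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU-gated linear unit followed by the down projection.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

# llama2-7B-like dimensions, purely for illustration.
mlp = SwiGLUMLP(hidden_size=4096, ffn_hidden_size=11008)
out = mlp(torch.randn(2, 16, 4096))  # -> shape (2, 16, 4096)
```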
Getting started: "shard" model not working
#70 opened by philschmid - 2
RuntimeError: mat1 and mat2 shapes cannot be multiplied (29056x22016 and 11008x4096)
#73 opened by liuxm117 - 2
[Saving a checkpoint takes a long time]
#69 opened by mynewstart - 0
Support Falcon 180B
#71 opened by martinjaggi - 1
Convert a Hugging Face model to Megatron: "Only llama v2 available using huggingface"
#49 opened by uygnef - 1
llama2 & vocabulary padding (making embedding layer sizes divisible by 128)
#50 opened by andreaskoepf - 0
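As a reference for this entry and for the padded_vocab_size assertions in #81/#103 above, a minimal sketch of the usual Megatron-style padding rule (round the vocabulary up to a multiple of make_vocab_size_divisible_by times the tensor-parallel size); the function and the example numbers are an illustration, not the repository's code:

```python
def padded_vocab_size(orig_vocab_size: int,
                      make_vocab_size_divisible_by: int = 128,
                      tensor_model_parallel_size: int = 1) -> int:
    """Round the vocabulary up so every tensor-parallel shard of the
    embedding matrix gets an equally sized, aligned slice."""
    multiple = make_vocab_size_divisible_by * tensor_model_parallel_size
    return ((orig_vocab_size + multiple - 1) // multiple) * multiple

# llama2's 32000-token vocabulary is already a multiple of 128:
print(padded_vocab_size(32000))          # 32000
# Adding special tokens before padding changes the result, e.g. a hypothetical
# 32005-token vocabulary padded under tensor parallel size 2:
print(padded_vocab_size(32005, 128, 2))  # 32256
```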
Add update_to_hub docs
#39 opened by AleHD - 5
Is 8x A100 80GB enough to finetune 70B llama2?
#52 opened by james2v - 3
Error converting LLaMA-30B to Megatron
#43 opened by dumpmemory - 0
Add GQA (MQA) support in megatron2hf conversion
#24 opened by Olivia-fsm - 2
Generate HuggingFace tokenizer configuration as part of megatron2hf.py (weight conversion)
#19 opened by andreaskoepf - 4
Add falcon support in megatron2hf.py
#28 opened by AleHD - 1
NaN detection possibly ineffective
#33 opened by andreaskoepf - 3
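For reference, a standalone gradient check of the kind this issue discusses; it is a minimal sketch to run after the backward pass, not the repository's actual detection logic:

```python
import torch

def grads_are_finite(model: torch.nn.Module) -> bool:
    """Return False (and report the offender) if any gradient contains NaN or Inf."""
    for name, param in model.named_parameters():
        if param.grad is not None and not torch.isfinite(param.grad).all():
            print(f"non-finite gradient in {name}")
            return False
    return True
```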
CUDA misaligned address when pretraining llama2 7B
#18 opened by pwq1989 - 5
Training speed is two times slower than Megatron-LM and Megatron-DeepSpeed
#11 opened by zhao1iang - 1
Error during merge of sharded checkpoint: 'TransformerLanguageModel' object has no attribute 'lm_head'
#14 opened by andreaskoepf