Issues
Assertion failure in verify_correctness for mistral-7B
#107 opened by abgoswam - 0
llama2-7B AssertionError: padded_vocab_size value from checkpoint (32000) is not equal to the input argument value (32256) #81
#103 opened by yushengsu-thu - 1
Introduce Sailor Models
#105 opened by longxudou - 0
Gemma Support
#104 opened by pedrohenriqueamartins - 0
Error in the documentation (https://epfllm.github.io/Megatron-LLM/guide/instruction_tuning.html#data-preprocessing)
#102 opened by yushengsu-thu - 1
Does it support sequence parallelism?
#101 opened by NamrataRShivagunde - 1
Multi-node training
#98 opened by wodeqiansuihan - 0
Correctness when enabling FlashAttention + Sequence Parallel at the same time?
#99 opened by xingyaoww - 1
Support Qwen?
#96 opened by Vincent131499 - 3
How to load from a saved intermediate checkpoint?
#95 opened by jjzha - 1
LLaMA2-70B Inference Optimization
#92 opened by RaymondHQR - 0
LLaMA and Mistral 7B pretraining support
#91 opened by StephennFernandes - 2
llama2-7B AssertionError: padded_vocab_size value from checkpoint (32000) is not equal to the input argument value (32256)
#81 opened by 13416157913 - 1
Setting args.make_vocab_size_divisible_by fails
#82 opened by 13416157913 - 7
Support for Mistral
#76 opened by philschmid - 0
Nice-to-have training features
#30 opened by andreaskoepf - 4
Error when running llama2-7B finetuning
#77 opened by 13416157913 - 1
Error when running llama2-7B finetuning
#78 opened by 13416157913 - 1
Prepend BOS token
#54 opened by panx27 - 6
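For context on the BOS question above, a minimal illustrative sketch using the Hugging Face tokenizer API; whether and where the repository's preprocessing prepends the token is exactly what the issue asks, so treat this as an assumption rather than the project's behaviour:

```python
from transformers import AutoTokenizer

# Any LLaMA-family tokenizer works for illustration; the gated repo requires access.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Tokenize without special tokens, then prepend the BOS id explicitly.
ids = tok.encode("Hello world", add_special_tokens=False)
ids = [tok.bos_token_id] + ids
```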
[SwiGLU] Question about SwiGLU
#64 opened by mynewstart - 4
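As background for the SwiGLU question above, a minimal sketch of a SwiGLU feed-forward block as used in LLaMA-style models; the class and parameter names (`w_gate`, `w_up`, `w_down`) and dimensions are illustrative, not the repository's actual modules:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUMLP(nn.Module):
    """Gated MLP: down(silu(gate(x)) * up(x))."""

    def __init__(self, hidden_size: int, ffn_hidden_size: int):
        super().__init__()
        self.w_gate = nn.Linear(hidden_size, ffn_hidden_size, bias=False)
        self.w_up = nn.Linear(hidden_size, ffn_hidden_size, bias=False)
        self.w_down = nn.Linear(ffn_hidden_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU-gated linear unit followed by the down projection.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

# llama2-7B-like dimensions, purely for illustration.
mlp = SwiGLUMLP(hidden_size=4096, ffn_hidden_size=11008)
out = mlp(torch.randn(2, 16, 4096))  # -> shape (2, 16, 4096)
```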
Getting started: "shard" model not working
#70 opened by philschmid - 2
RuntimeError: mat1 and mat2 shapes cannot be multiplied (29056x22016 and 11008x4096)
#73 opened by liuxm117 - 2
[Saving a checkpoint takes a long time]
#69 opened by mynewstart - 0
Support Falcon 180B
#71 opened by martinjaggi - 1
Convert a Hugging Face model to Megatron: "Only llama v2 available using huggingface"
#49 opened by uygnef - 1
llama2 & vocabulary padding (making embedding layer sizes divisible by 128)
#50 opened by andreaskoepf - 0
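As a reference for this entry and for the padded_vocab_size assertions in #81/#103 above, a minimal sketch of the usual Megatron-style padding rule (round the vocabulary up to a multiple of make_vocab_size_divisible_by times the tensor-parallel size); the function and the example numbers are an illustration, not the repository's code:

```python
def padded_vocab_size(orig_vocab_size: int,
                      make_vocab_size_divisible_by: int = 128,
                      tensor_model_parallel_size: int = 1) -> int:
    """Round the vocabulary up so every tensor-parallel shard of the
    embedding matrix gets an equally sized, aligned slice."""
    multiple = make_vocab_size_divisible_by * tensor_model_parallel_size
    return ((orig_vocab_size + multiple - 1) // multiple) * multiple

# llama2's 32000-token vocabulary is already a multiple of 128:
print(padded_vocab_size(32000))          # 32000
# Adding special tokens before padding changes the result, e.g. a hypothetical
# 32005-token vocabulary padded under tensor parallel size 2:
print(padded_vocab_size(32005, 128, 2))  # 32256
```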
Add update_to_hub docs
#39 opened by AleHD - 5
Is 8x A100 80GB enough to finetune 70B llama2?
#52 opened by james2v - 3
Error converting LLaMA-30B to Megatron
#43 opened by dumpmemory - 0
Add GQA (MQA) support in megatron2hf conversion
#24 opened by Olivia-fsm - 2
Generate HuggingFace tokenizer configuration as part of megatron2hf.py (weight conversion)
#19 opened by andreaskoepf - 4
Add falcon support in megatron2hf.py
#28 opened by AleHD - 1
NaN detection possibly ineffective
#33 opened by andreaskoepf - 3
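For reference, a standalone gradient check of the kind this issue discusses; it is a minimal sketch to run after the backward pass, not the repository's actual detection logic:

```python
import torch

def grads_are_finite(model: torch.nn.Module) -> bool:
    """Return False (and report the offender) if any gradient contains NaN or Inf."""
    for name, param in model.named_parameters():
        if param.grad is not None and not torch.isfinite(param.grad).all():
            print(f"non-finite gradient in {name}")
            return False
    return True
```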
CUDA misaligned address when pretraining llama2 7B
#18 opened by pwq1989 - 5
Training speed is two times slower than Megatron-LM and Megatron-DeepSpeed
#11 opened by zhao1iang - 1
Error during merge of sharded checkpoint: 'TransformerLanguageModel' object has no attribute 'lm_head'
#14 opened by andreaskoepf