alibaba/Pai-Megatron-Patch
The official repository of Pai-Megatron-Patch, developed by Alibaba Cloud for large-scale LLM & VLM training.
Python · Apache-2.0
Issues
llava run error
#330 opened by yangzhipeng1108 - 2
DeepSeek model conversion issue
#327 opened by bao-xiaoyi - 2
TypeError: get_cpu_offload_context() missing 1 required positional argument: 'weight_offloading'
#324 opened by ben-8878 - 2
On fine-tuning Qwen2 with the idxmap data format
#319 opened by Gloid59 - 2
Qwen2 SFT training hangs right at the start
#325 opened by baisechundu - 1
Should this parameter be removed for the Qwen2 0.5B and 1.5B models?
#296 opened by MrWaterZhou - 1
Can the data preprocessing script run on macOS? It fails to compile
#276 opened by shine10076 - 2
Question: OSError: [Errno 28] No space left on device
#302 opened by shyzzz521 - 3
Does Mcore not support pipeline parallelism (pp)?
#312 opened by divisionblur - 3
Which version of Megatron-LM does StarCoder depend on?
#314 opened by bao-xiaoyi - 1
Channel Loss support
#316 opened by echo-valor - 1
Problem resuming training from a checkpoint
#318 opened by divisionblur - 1
mmap data format issue
#320 opened by bao-xiaoyi - 1
pyarrow installation fails
#321 opened by xiaoquanWu - 2
mcore weight conversion does not support pp > 1
#322 opened by xs1997zju - 1
Training Qwen1.5 1.8B with flash-attn shows no obvious speedup
#323 opened by coder-wangzhen - 1
QwenTokenizer vs. Qwen2Tokenizer
#295 opened by sexan - 0
Saved checkpoints are missing distrib_optim.pt
#315 opened by shizikachen - 5
The DingTalk group is full
#304 opened by divisionblur - 3
Initial loss is abnormal when the sequence length is set large
#300 opened by Jayce1kk - 1
Is ShareGPT-format data supported? Or multi-turn dialogue data with a "history" field?
#306 opened by jiejie1993 - 1
Flash-Attn 3 support
#308 opened by echo-valor - 3
Optimizer offloading is impressive
#311 opened by 154912369 - 2
Sorry to bother: an issue about multi-node training
#307 opened by CallmeZhangChenchen - 5
Missing key(s) in state_dict: LLaMA-3 weights mismatch after mcore conversion
#303 opened by wuduher - 2
The bigcode-evaluation-harness repository seems to be gone
#301 opened by CallmeZhangChenchen - 0
[rank31]: OSError: error stat()ing file (dataset map issue)
#305 opened by shyzzz521 - 1
Qwen-MoE MegaBlocks weight conversion issue
#282 opened by yingzhao27 - 1
Package conflicts in the nvcr.io/nvidia/pytorch:23.12-py3 image
#294 opened by wuduher - 6
qwen2-7b problem when tp=2, pp=1
#285 opened by MrWaterZhou - 2
GPU memory usage problem during model conversion
#287 opened by coder-wangzhen - 0
qwen2 MG and HF mismatch
#289 opened by vlad-karpuhin - 3
[BUG] The `layer_number` argument cannot be parsed
#284 opened by cingtiye - 1
Is the data format used in the Qwen1.5 SFT stage LLama-Pretrain-Raw?
#278 opened by cocaer - 0
Does Qwen2 support converting a dense model to MoE via Sparse Upcycling?
#277 opened by zTaoplus - 5
qwen2 72b state_dict mismatch with TE
#271 opened by getao - 1
Support for the Phi model series
#248 opened by JiwenJ - 1
What environment setup is needed to run the Pai-Megatron framework on one's own cluster? Are there reference steps to follow?
#249 opened by qibao77 - 2
[QUESTION] SFT data varies widely in length; how can training be efficient with so much padding?
#251 opened by oymzysmwe224 - 2
A piece of logic in hf2mcore_qwen1.5_dense_mha_to_moe.py is unclear
#261 opened by steins048596 - 1
Two problems when running Qwen2.0 with mcore
#256 opened by 154912369 - 2
Is Qwen2-72B supported?
#250 opened by Crystalxd - 3
The DeepSeek-V2 implementation has a bug and does not support tp > 1
#255 opened by 154912369