Issues
How should the code be modified to support more models, such as InstructBLIP?
#181 opened by Listever - 1
Request to support weights in safetensors format
#157 opened by WillQvQ - 2
Error when saving peft_config in trainer.py
#159 opened by Mr-nnng - 11
About adding support for the Qwen (千问) model
#153 opened by Jieni05 - 2
No log information
#163 opened by BeastyZ - 2
The interpretation of the transposition operation when splitting weights across the tensor parallel group
#154 opened by SparkJiao - 1
Could an example of pre-training from scratch be added?
#151 opened by liujuncn - 2
Does using gradient clipping with the LOMO optimizer double the training time?
#150 opened by Jieni05 - 2
Evaluating is too slow
#121 opened by JinchaoLove - 6
chatGLM2 raises an error when using tensor parallelism
#135 opened by BlueSkyyyyyy - 6
How to resume training after the model run is interrupted and restarted
#142 opened by 459737087 - 0
Hello, how can the saved model be sharded instead of being saved as a single model of tens of GB?
#143 opened by 459737087 - 3
RendezvousConnectionError raised partway through a run
#141 opened by 459737087 - 2
adalomo has no loss_scaler, only loss_scale
#139 opened by HappyLynn - 6
Training loss is NaN
#107 opened by fuqianya - 11
Which version of Megatron-LM is used
#126 opened by liaosnow - 3
Could chatglm3 support be added?
#134 opened by hijeffwu - 2
OOM error when running llama2 LoRA fine-tuning on a single A100
#138 opened by JiafeiSun - 2
No module named 'collie.callbacks.pefts'
#137 opened by JiafeiSun - 2
chatGLM2 does not seem to support ptuning training at present; is there a plan for when it will be supported?
#136 opened by BlueSkyyyyyy - 1
__init__() missing 'init_method' and 'config'
#133 opened by yueg-security - 3
AdaLomo optimizer step method
#132 opened by winglian - 1
[BUG] Evaluation with parallelism may not iterate over the full dataset
#119 opened by KYLN24 - 3
Can tensor parallelism and pipeline parallelism be used together with lora? Raises ValueError: Target module ColumnParallelLinearWithoutBias() is not supported. Currently, only `torch.nn.Linear` and `Conv1D` are supported.
#131 opened by BlueSkyyyyyy - 3
collie and lomo are incompatible
#100 opened by LZY-the-boys - 1
Error when using the data class _ShardContainer
#123 opened by xuguohai - 7
Can this project be used for continued pre-training of a model?
#120 opened by Zheng-Jay - 1
[Feature] Could an internLM example be added to examples?
#124 opened by wuchangping - 3
Could Lomo class support `param_groups`?
#117 opened by JinchaoLove - 2
[Question] About training visualization
#113 opened by RickMeow - 2
[QUESTION]Multi-node multi-gpu training
#110 opened by RickMeow - 2
Error on loading after replacing the tokenizer
#105 opened by 2793145003 - 1
Question about lr_scheduler settings
#106 opened by YuxiangZhang0114 - 3
Error when training Llama2 70B
#104 opened by xiaopqr - 2
Training fails but no error message is shown
#102 opened by 2793145003 - 1
tensor parallel + zero3 error
#99 opened by LZY-the-boys - 2
Running examples/alpaca/train.py on a V100 fails with No module named 'petrel_client'; does anyone know how to fix this?
#94 opened by JiafeiSun