Issues
How do I load the saved optimizer momentum of a model trained with fp16?
#56 opened by xealml - 4
How do I check whether the model loaded successfully?
#53 opened by Tron1994 - 2
Will the pretraining optimizer momentum for cpm-large be open-sourced?
#50 opened by yayaQAQ - 1
Question about the model
#43 opened by Chunhui-Zou - 0
Training on 2 GPUs saves 2 model files; which one should be used at inference time?
#55 opened by xealml - 0
How do I convert a trained model to the huggingface model format?
#54 opened by Tron1994 - 6
Can the CPM-Distill model from huggingface be loaded directly?
#42 opened by zhoucz97 - 2
What are the GPU memory requirements for CPM-large? It fails to run on a single 24 GB 3090
#47 opened by Chunhui-Zou - 1
Question about accuracy (Acc) computation in zero-shot and finetune modes
#35 opened by lulu51230 - 2
[deepspeed] fp16 dynamic loss scale overflow!
#28 opened by 520jefferson - 3
Bug when finetuning on multiple GPUs
#34 opened by xiaofei05 - 1
Does this framework support pipeline parallelism?
#49 opened by yayaQAQ - 3
Running question generation with code adapted from the STC dataset setup
#24 opened by LaVineChan - 1
Was CPM-1 pretrained on sequences of 1024 tokens?
#51 opened by orlando1986 - 2
RuntimeWarning: overflow encountered in exp
#27 opened by 520jefferson - 0
Regarding a model question
#46 opened by Chunhui-Zou - 0
In the embedding average computation, which word vectors are used, and how is the text tokenized? The ground truth in STC_test mixes Chinese and English; how is tokenization handled in that case?
#48 opened by allyouneeds - 10
Accuracy of CPM-large finetuned on the ChID dataset is far below the paper's results
#11 opened by keezen - 1
A question
#44 opened by Chunhui-Zou - 1
Error when finetuning on the STC dataset
#40 opened by David-Li0406 - 9
Extending the vocabulary with new tokens
#37 opened by Hansen06 - 1
Finetuning results
#33 opened by zhenhao-huang - 1
RuntimeError: cuda runtime error (10)
#36 opened by drxmy - 2
Questions about finetuning on very long texts and about generation quality
#31 opened by zhenhao-huang - 4
TypeError: 'NoneType' object is not subscriptable
#26 opened by yiyele - 6
Text generation model does not converge when finetuned in fp32 precision
#20 opened by zmingshi - 4
[question] Where does the cand_ids variable come from?
#29 opened by starkhu - 2
Building the model takes very long with multiple GPUs across multiple machines
#25 opened by demomagic - 16
Reducing GPU memory usage by adjusting finetuning parameters when loading the CPM model (2.6B parameters)
#12 opened by zhenhao-huang - 3
finetune_chid_large_fp32.sh errors out on the CHID dataset
#21 opened by YinWei123 - 2
RuntimeError: CUDA error: initialization error
#23 opened by holalula - 1
Question about the finetune_lm loss function
#22 opened by mali19064 - 24
On the soundness of the text generation templates
#18 opened by zhenhao-huang - 8
What is the meaning of scores = torch.stack(tensor_list, 0).view(-1, 15000) in lines 193–195 of finetune_chid.py?
#19 opened by lulu51230 - 3
Question about converting text to token ids
#15 opened by zhenhao-huang - 8
Model saved after finetuning in fp32 precision is too large
#16 opened by zhenhao-huang - 12
After running change_mp.py to split the model from 2 partitions into 4, loading the 4-way split model on 4 GPUs raises an error
#17 opened by lulu51230 - 1
Can this run on a single GPU?
#14 opened by unbuilt - 5
After splitting the model into 4 parts, process 0 fails to load
#13 opened by lulu51230 - 1
Running scripts/finetune_chid_large.sh on the ChID dataset raises an error
#9 opened by keezen - 2
Why is there no model.zero_grad in the finetuning code? Don't the gradients need to be cleared?
#8 opened by keezen
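The last issue above asks why the finetuning code has no explicit model.zero_grad call. One plausible explanation (an assumption about this repo, not confirmed by the issue list) is that a training-engine wrapper such as DeepSpeed clears gradients inside its own step(). The underlying PyTorch semantics, that backward() accumulates into .grad and stale gradients must be zeroed between steps, can be illustrated with a tiny dependency-free mock:

```python
# Toy mock of PyTorch-style gradient accumulation. Param, backward, and
# zero_grad here are hypothetical stand-ins, not the torch API; they only
# mirror its accumulate-into-.grad behavior for illustration.

class Param:
    def __init__(self, value):
        self.value = value
        self.grad = 0.0  # gradients accumulate here, as in torch tensors

def backward(params, grads):
    # Like torch's backward(): ADDS into .grad instead of overwriting it.
    for p, g in zip(params, grads):
        p.grad += g

def zero_grad(params):
    # Like optimizer.zero_grad(): reset accumulated gradients.
    for p in params:
        p.grad = 0.0

p = Param(1.0)
backward([p], [0.5])
backward([p], [0.5])   # no zeroing in between, so gradients pile up
assert p.grad == 1.0   # 0.5 + 0.5, not the per-step 0.5

zero_grad([p])
backward([p], [0.5])
assert p.grad == 0.5   # correct per-step gradient after zeroing
```

If the repo's loop relies on an engine that zeroes gradients internally after each optimizer step, omitting an explicit model.zero_grad would be harmless; otherwise gradients would accumulate across steps as shown above.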