Training the model fails with RuntimeError: Tensor on device meta is not on the expected device cuda:0!

Question


HeartyHaven opened this issue a year ago · 1 comment

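Running the following code raises the error above:
import torch
from torchkeras import KerasModel

# model, dl_train and dl_val are defined earlier (not shown in the issue)
keras_model = KerasModel(model, loss_fn=None,
                         optimizer=torch.optim.AdamW(model.parameters(), lr=2e-6))
ckpt_path = 'CGEC_chatglm2'
keras_model.fit(train_data=dl_train,
                val_data=dl_val,
                epochs=100, patience=5,
                monitor='val_loss', mode='min',
                ckpt_path=ckpt_path,
                mixed_precision='fp16')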
Then, while debugging, I tried keras_model = keras_model.to('cuda'), which raised NotImplementedError: Cannot copy out of meta tensor; no data! I'm not sure whether the two errors are related.
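Both messages mention the "meta" device, whose tensors carry shape and dtype but no data, so neither training nor .to('cuda') can proceed while any weight is still there. A minimal diagnostic sketch, assuming the model follows PyTorch's nn.Module interface (named_parameters() / named_buffers()); find_meta_tensors is a hypothetical helper, not part of torchkeras or ChatGLM2:

```python
def find_meta_tensors(model):
    """Return names of parameters and buffers still on the meta device.

    Hypothetical helper: duck-typed against nn.Module's
    named_parameters() / named_buffers(), so it needs no extra imports.
    """
    names = []
    # Parameters (weights, biases) left unmaterialized on "meta"
    for name, tensor in model.named_parameters():
        if tensor.device.type == "meta":
            names.append(name)
    # Buffers (e.g. rotary-embedding caches) can also be left on "meta"
    for name, tensor in model.named_buffers():
        if tensor.device.type == "meta":
            names.append(name)
    return names
```

Calling print(find_meta_tensors(model)) before KerasModel(...) and fit(...) shows which weights were never loaded; an empty list rules the meta device out as the cause.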

Answer 1 · 2023-09-01T11:14:35.000Z

THUDM/ChatGLM2-6B#204

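See the author's reply at that link: "You're using keras. For keras or tensorflow, just add os.environ['CUDA_VISIBLE_DEVICES'] = "0". The other errors are probably problems in your code. I suggest you create the model first, then load the weights, compile the model, and only then fit, following the usual Keras model development workflow."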
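The environment variable only takes effect if it is set before the framework initializes CUDA, so it belongs at the very top of the script. A small sketch under that assumption; restrict_to_gpu is a hypothetical helper name, and the warning is a heuristic (importing torch alone does not initialize CUDA, so a warning means the setting *may* be ignored, not that it is):

```python
import os
import sys
import warnings


def restrict_to_gpu(index="0"):
    """Expose only the given GPU(s) to CUDA. Call before importing torch/tensorflow."""
    for framework in ("torch", "tensorflow"):
        if framework in sys.modules:
            # Heuristic: if CUDA was already initialized, the new value is ignored
            warnings.warn(
                f"{framework} was imported before CUDA_VISIBLE_DEVICES was set; "
                "the setting may be ignored if CUDA is already initialized."
            )
    os.environ["CUDA_VISIBLE_DEVICES"] = index


restrict_to_gpu("0")
```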

That's only one step short of saying outright that the code doesn't follow a standard workflow, haha.