A training process and GPU memory usage are visible, but the GPU is idle
Opened this issue · 2 comments
yinkaaiwu commented
Hello, here is my code:
```python
ml_potential = FinetunerCalc(
    checkpoint_path="gemnet_t_direct_h512_all.pt",
    mlp_params={
        "tuner": {
            "unfreeze_blocks": [
                "out_blocks.3.seq_forces",
                "out_blocks.3.scale_rbf_F",
                "out_blocks.3.dense_rbf_F",
                "out_blocks.3.out_forces",
                "out_blocks.2.seq_forces",
                "out_blocks.2.scale_rbf_F",
                "out_blocks.2.dense_rbf_F",
                "out_blocks.2.out_forces",
                "out_blocks.1.seq_forces",
                "out_blocks.1.scale_rbf_F",
                "out_blocks.1.dense_rbf_F",
                "out_blocks.1.out_forces",
            ],
            "num_threads": 32,
        },
        "optim": {
            "batch_size": 1,
            "num_workers": 0,
            "max_epochs": 400,
            "lr_initial": 0.0003,
            "factor": 0.9,
            "eval_every": 1,
            "patience": 3,
            "checkpoint_every": 100000,
            "scheduler_loss": "train",
            "weight_decay": 0,
            "eps": 1e-8,
            "optimizer_params": {
                "weight_decay": 0,
                "eps": 1e-8,
            },
        },
        "task": {
            "primary_metric": "loss",
        },
        "local_rank": 0,
    },
)
ml_potential.train(parent_dataset=train_dataset[:2])
```
My CUDA version is 11.3. `nvidia-smi` shows the training process and GPU memory usage, but Volatile GPU-Util stays at 0 and power consumption does not increase. Is there a problem with my parameter settings?
jiaozihao18 commented
@yinkaai Maybe try adding `"cpu": False` to the `mlp_params` dict. (ref: update oal example for gpu usage #36)
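A minimal sketch of the suggested change: add `"cpu": False` at the top level of `mlp_params`. The surrounding keys are copied from the issue above; whether FinetunerCalc reads a top-level `cpu` flag this way is an assumption based on the referenced example #36, not verified here (the `unfreeze_blocks` and most `optim` keys are elided for brevity).

```python
# Hypothetical sketch: request GPU training by adding "cpu": False to
# mlp_params before constructing FinetunerCalc (per issue #36; key handling
# by FinetunerCalc itself is assumed, not verified here).
mlp_params = {
    "tuner": {"num_threads": 32},
    "optim": {"batch_size": 1, "num_workers": 0, "max_epochs": 400},
    "task": {"primary_metric": "loss"},
    "local_rank": 0,
    "cpu": False,  # ask the trainer to use the GPU instead of the CPU
}
```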
yinkaaiwu commented
> @yinkaai Maybe try adding `"cpu": False` to the `mlp_params` dict. (ref: update oal example for gpu usage #36)
thank you!