DK-Jang/motion_puzzle

errors occur when loading the pretrained model

Hi @DK-Jang! Thank you for sharing this nice work :-)
I ran into trouble loading the pretrained model. PyTorch maps the checkpoint to the device set by the following line:

self.device = torch.cuda.current_device()

In my case it returns 0, which triggers an error:

TypeError: 'int' object is not callable

When I change this line to self.device = torch.device('cuda:0'), the error message changes to:

RuntimeError: Error(s) in loading state_dict for DataParallel:
	Missing key(s) in state_dict: "module.enc_content.edge_importance_j", ...
	Unexpected key(s) in state_dict: "enc_content.edge_importance_j", ...

I think this is because the model was trained and saved with data parallelism; however, I am unable to run on multiple GPUs.
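In a load_state_dict error, "Missing" keys are the ones the model expects and "Unexpected" keys are the ones found in the checkpoint, so the message itself shows which side carries the "module." prefix. A minimal sketch of that check in plain Python, with no PyTorch needed; diagnose_prefix is a hypothetical helper, not part of this repository:

```python
def diagnose_prefix(model_keys, checkpoint_keys, prefix="module."):
    """Guess how the checkpoint keys differ from the keys the model expects.

    Returns "add" if the checkpoint lacks a prefix the model expects,
    "strip" if the checkpoint has a prefix the model does not expect,
    "ok" if the key sets already match, and "unknown" otherwise.
    """
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    if model_keys == checkpoint_keys:
        return "ok"
    if {prefix + k for k in checkpoint_keys} == model_keys:
        return "add"
    if {prefix + k for k in model_keys} == checkpoint_keys:
        return "strip"
    return "unknown"


# Key names taken from the error message above.
expected_by_model = ["module.enc_content.edge_importance_j"]  # reported as missing
found_in_checkpoint = ["enc_content.edge_importance_j"]       # reported as unexpected
print(diagnose_prefix(expected_by_model, found_in_checkpoint))  # add
```

Here the result is "add": the model is wrapped in DataParallel and expects prefixed keys, while the checkpoint keys have no prefix.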

Any help would be appreciated. Thanks in advance!

I faced the same problem.
I think the pretrained network was trained without data parallelism (or on CPU).
I got it working by modifying the code as follows:

In trainer.py, add this import:

from collections import OrderedDict

Then replace

self.device = torch.cuda.current_device()

with:

self.device = torch.device("cuda:{}".format(torch.cuda.current_device()))
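torch.cuda.current_device() returns a plain integer index. The TypeError most likely comes from torch.load treating a map_location that is not a string, dict, or torch.device as a callable, although I have not traced it in the source. A minimal sketch of normalizing the index first; as_device_string is a hypothetical helper, not something in this repository or in PyTorch:

```python
def as_device_string(device):
    """Turn a bare CUDA index such as 0 into the string "cuda:0".

    Any other value (for example "cpu" or "cuda:1") is returned unchanged.
    """
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(device, int) and not isinstance(device, bool):
        return "cuda:{}".format(device)
    return device


print(as_device_string(0))      # cuda:0
print(as_device_string("cpu"))  # cpu
```

The resulting string can be passed to torch.device(...) or used directly as map_location.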

In motion_puzzle/trainer.py, lines 163 to 164 (commit 7f1eca9), replace

self.gen.load_state_dict(state_dict['gen'])
self.gen_ema.load_state_dict(state_dict['gen_ema'])

with:

        gen_dict = OrderedDict()
        for key, value in state_dict["gen"].items():
            if not key.startswith("module."):
                key = "module." + key
            gen_dict[key] = value
        self.gen.load_state_dict(gen_dict)
        gen_ema_dict = OrderedDict()
        for key, value in state_dict["gen_ema"].items():
            if not key.startswith("module."):
                key = "module." + key
            gen_ema_dict[key] = value
        self.gen_ema.load_state_dict(gen_ema_dict)
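The two loops above do the same thing for gen and gen_ema, so they could be folded into one function. A minimal sketch that can also strip the prefix, which would be the direction needed if the model is not wrapped in DataParallel; remap_prefix is a hypothetical name, not part of this repository:

```python
from collections import OrderedDict


def remap_prefix(state_dict, prefix="module.", add=True):
    """Return a copy of state_dict with `prefix` added to or stripped from keys.

    add=True  : prepend the prefix to keys that do not have it yet.
    add=False : remove the prefix from keys that start with it.
    Values and key order are left untouched.
    """
    remapped = OrderedDict()
    for key, value in state_dict.items():
        if add and not key.startswith(prefix):
            key = prefix + key
        elif not add and key.startswith(prefix):
            key = key[len(prefix):]
        remapped[key] = value
    return remapped


# Plain values stand in for tensors here; only the keys matter.
checkpoint = OrderedDict([("enc_content.edge_importance_j", 1)])
print(list(remap_prefix(checkpoint)))  # ['module.enc_content.edge_importance_j']
```

With this, the loading code would shrink to self.gen.load_state_dict(remap_prefix(state_dict["gen"])) and the same call for "gen_ema".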

In motion_puzzle/test.py, lines 128 to 131 (commit 7f1eca9), replace

rec = rec.numpy()*std + mean
tra = tra.numpy()*std + mean
con_gt = con_gt.numpy()*std + mean
sty_gt = sty_gt.numpy()*std + mean

with:

        rec = rec.cpu().numpy()*std + mean
        tra = tra.cpu().numpy()*std + mean
        con_gt = con_gt.cpu().numpy()*std + mean
        sty_gt = sty_gt.cpu().numpy()*std + mean
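The .cpu() call is needed because .numpy() cannot be called on a tensor that lives on the GPU. Since the same pattern repeats four times, it could be wrapped in one function. A minimal sketch, where denormalize is a hypothetical helper and the detach() call is an extra precaution I am assuming is harmless here (it only matters if the tensor requires grad):

```python
def denormalize(x, std, mean):
    """Move x to host memory if it is tensor-like, then compute x * std + mean."""
    # Duck-typed check: anything exposing detach/cpu/numpy (such as a
    # torch.Tensor) is converted; NumPy arrays and floats pass through as-is.
    if hasattr(x, "detach"):
        x = x.detach().cpu().numpy()
    return x * std + mean
```

The four lines would then become rec = denormalize(rec, std, mean), and likewise for tra, con_gt, and sty_gt.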

If you want to retrain the model, these changes must be reverted.

I had the same problem. Here is what I tried.

In motion_puzzle/trainer.py, lines 34 to 35 (commit 52af967), replace

self.gen = nn.DataParallel(self.gen).to(self.device)
self.gen_ema = nn.DataParallel(self.gen_ema).to(self.device)

with:

self.gen = self.gen.to(self.device)
self.gen_ema = self.gen_ema.to(self.device)

Then replace

state_dict = torch.load(model_path, map_location=self.device)

with:

state_dict = torch.load(model_path, map_location="cuda:0")

Finally, make the same changes to test.py as described in KosukeFukazawa's comment.