[Checkpoint Loading] Issues on loading s3dis ckpt weights when running 'demo_s3dis.ipynb'
Closed this issue · 14 comments
Hi, thanks again for your great project.
I have encountered a problem while attempting to load the pre-trained checkpoint file spt-2_s3dis_fold5.ckpt when running demo_s3dis.ipynb.
The detailed error report is as follows:
AttributeError Traceback (most recent call last)
Cell In[11], line 10
7 model = hydra.utils.instantiate(cfg.model)
9 # Load pretrained weights from a checkpoint file
---> 10 model = model.load_from_checkpoint(cfg.ckpt_path, net=model.net, criterion=None)
11 model.criterion = hydra.utils.instantiate(cfg.model).criterion
12 model = model.eval().cuda()
File /home/pai/envs/spt/lib/python3.8/site-packages/pytorch_lightning/core/module.py:1520, in LightningModule.load_from_checkpoint(cls, checkpoint_path, map_location, hparams_file, strict, **kwargs)
1440 @classmethod
1441 def load_from_checkpoint(
1442 cls,
(...)
1447 **kwargs: Any,
1448 ) -> Self:
1449 r"""
1450 Primary way of loading a model from a checkpoint. When Lightning saves a checkpoint
1451 it stores the arguments passed to ``__init__`` in the checkpoint under ``"hyper_parameters"``.
(...)
1518 y_hat = pretrained_model(x)
1519 """
-> 1520 loaded = _load_from_checkpoint(
1521 cls,
1522 checkpoint_path,
1523 map_location,
1524 hparams_file,
1525 strict,
1526 **kwargs,
1527 )
1528 return cast(Self, loaded)
File /home/pai/envs/spt/lib/python3.8/site-packages/pytorch_lightning/core/saving.py:90, in _load_from_checkpoint(cls, checkpoint_path, map_location, hparams_file, strict, **kwargs)
88 return _load_state(cls, checkpoint, **kwargs)
89 if issubclass(cls, pl.LightningModule):
---> 90 model = _load_state(cls, checkpoint, strict=strict, **kwargs)
91 state_dict = checkpoint["state_dict"]
92 if not state_dict:
File /home/pai/envs/spt/lib/python3.8/site-packages/pytorch_lightning/core/saving.py:156, in _load_state(cls, checkpoint, strict, **cls_kwargs_new)
154 # load the state_dict on the model automatically
155 assert strict is not None
--> 156 keys = obj.load_state_dict(checkpoint["state_dict"], strict=strict)
158 if not strict:
159 if keys.missing_keys:
File /mnt/user/superpoint_transformer/src/models/segmentation.py:551, in PointSegmentationModule.load_state_dict(self, state_dict, strict)
549 # Special treatment for MultiLoss
550 if self.multi_stage_loss:
--> 551 class_weight_bckp = self.criterion.weight
552 self.criterion.weight = None
554 # Recover the class weights from any 'criterion.weight' or
555 # 'criterion.*.weight' key and remove those keys from the
556 # state_dict
File /home/pai/envs/spt/lib/python3.8/site-packages/torch/nn/modules/module.py:1614, in Module.__getattr__(self, name)
1612 if name in modules:
1613 return modules[name]
-> 1614 raise AttributeError("'{}' object has no attribute '{}'".format(
1615 type(self).__name__, name))
AttributeError: 'MultiLoss' object has no attribute 'weight'
Additional information: I have successfully run the eval process using this checkpoint, so the problem is likely not in the ckpt or the dataset.
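For reference, the failure mode can be reproduced with a minimal stand-in (the `MultiLoss` below is a hypothetical simplification, not the project's actual class):

```python
class MultiLoss:
    """Hypothetical stand-in for the project's MultiLoss: it wraps several
    per-stage criteria but exposes no top-level `weight` attribute."""
    def __init__(self, criteria):
        self.criteria = criteria


criterion = MultiLoss(['ce_stage_1', 'ce_stage_2'])

# Direct attribute access fails exactly as in the traceback above
try:
    _ = criterion.weight
except AttributeError as e:
    print(e)  # 'MultiLoss' object has no attribute 'weight'

# A defensive read inside load_state_dict would avoid the crash
class_weight_bckp = getattr(criterion, 'weight', None)
```

The `getattr(..., None)` pattern is just one way a loader could tolerate a criterion that does not carry class weights.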
Hi, do you know how to obtain the processed dataset?
Hi, in my experience, if you have downloaded the full dataset (structured as below) and run eval.py as indicated in the README, the script will automatically build the processed dataset first.
└── data
└── s3dis # Structure for S3DIS
├── Stanford3dDataset_v1.2.zip # (optional) Downloaded zipped dataset with non-aligned rooms
├── raw # Raw dataset files
│ └── Area_{{1, 2, 3, 4, 5, 6}} # S3DIS's area/room/room.txt structure
│ └── Area_{{1, 2, 3, 4, 5, 6}}_alignmentAngle.txt # Room alignment angles required for entire floor reconstruction
│ └── {{room_name}}
│ └── {{room_name}}.txt
Hi @pynsigrid, thanks for using this project !
The error you are encountering seems related to #12. I had solved it but then recently made another change that might cause this. I will look into this and let you know soon.
Hi @pynsigrid, things seem to work fine on my end. Are you using the latest version of the project ?
In particular, I made some changes in this commit which could solve your problem.
Hi @drprojects, yes, I just re-cloned this repo and reinstalled the environment, but the same error occurs again.
Some information about this experiment:
notebook: demo_s3dis.ipynb
ckpt: spt-2_s3dis_fold{2 to 6}.ckpt
error message:
Lightning automatically upgraded your loaded checkpoint from v1.8.0 to v2.0.6. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file ../../superpoint_transformer_0728/checkpoints/spt-2_s3dis_fold6.ckpt`
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[4], line 10
7 model = hydra.utils.instantiate(cfg.model)
9 # Load pretrained weights from a checkpoint file
---> 10 model = model.load_from_checkpoint(cfg.ckpt_path, net=model.net, criterion=None)
11 # model = model.load_from_checkpoint(cfg.ckpt_path, net=model.net, criterion=None)
12 model.criterion = hydra.utils.instantiate(cfg.model).criterion
File /home/pai/envs/spt2/lib/python3.8/site-packages/pytorch_lightning/core/module.py:1520, in LightningModule.load_from_checkpoint(cls, checkpoint_path, map_location, hparams_file, strict, **kwargs)
1440 @classmethod
1441 def load_from_checkpoint(
1442 cls,
(...)
1447 **kwargs: Any,
1448 ) -> Self:
1449 r"""
1450 Primary way of loading a model from a checkpoint. When Lightning saves a checkpoint
1451 it stores the arguments passed to ``__init__`` in the checkpoint under ``"hyper_parameters"``.
(...)
1518 y_hat = pretrained_model(x)
1519 """
-> 1520 loaded = _load_from_checkpoint(
1521 cls,
1522 checkpoint_path,
1523 map_location,
1524 hparams_file,
1525 strict,
1526 **kwargs,
1527 )
1528 return cast(Self, loaded)
File /home/pai/envs/spt2/lib/python3.8/site-packages/pytorch_lightning/core/saving.py:90, in _load_from_checkpoint(cls, checkpoint_path, map_location, hparams_file, strict, **kwargs)
88 return _load_state(cls, checkpoint, **kwargs)
89 if issubclass(cls, pl.LightningModule):
---> 90 model = _load_state(cls, checkpoint, strict=strict, **kwargs)
91 state_dict = checkpoint["state_dict"]
92 if not state_dict:
File /home/pai/envs/spt2/lib/python3.8/site-packages/pytorch_lightning/core/saving.py:156, in _load_state(cls, checkpoint, strict, **cls_kwargs_new)
154 # load the state_dict on the model automatically
155 assert strict is not None
--> 156 keys = obj.load_state_dict(checkpoint["state_dict"], strict=strict)
158 if not strict:
159 if keys.missing_keys:
File /mnt/user/superpoint_transformer/src/models/segmentation.py:551, in PointSegmentationModule.load_state_dict(self, state_dict, strict)
549 # Special treatment for MultiLoss
550 if self.multi_stage_loss:
--> 551 class_weight_bckp = self.criterion.weight
552 self.criterion.weight = None
554 # Recover the class weights from any 'criterion.weight' or
555 # 'criterion.*.weight' key and remove those keys from the
556 # state_dict
File /home/pai/envs/spt2/lib/python3.8/site-packages/torch/nn/modules/module.py:1614, in Module.__getattr__(self, name)
1612 if name in modules:
1613 return modules[name]
-> 1614 raise AttributeError("'{}' object has no attribute '{}'".format(
1615 type(self).__name__, name))
AttributeError: 'MultiLoss' object has no attribute 'weight'
While changing the ckpt to fold1, the error message changes:
ckpt: spt-2_s3dis_fold1.ckpt
error message:
RuntimeError: [enforce fail at inline_container.cc:257] . file in archive is not in a subdirectory archive/: spt-2_dales.ckpt
Hi @pynsigrid
While changing the ckpt to fold1, the error message changes:
ckpt: spt-2_s3dis_fold1.ckpt
error message:
RuntimeError: [enforce fail at inline_container.cc:257] . file in archive is not in a subdirectory archive/: spt-2_dales.ckpt
Good catch ! It seems this file was corrupted and it is my mistake, I have to apologize... I contacted the server administrators to update the Zenodo record. I will let you know when it is fixed.
Other than the spt-2_s3dis_fold1.ckpt file, which indeed has a problem, I could successfully run demo_s3dis.ipynb, demo_kitti360.ipynb, and demo_dales.ipynb with all the other checkpoints provided in the Zenodo record.
So, if you are certain you are using the latest version of the code, maybe this has something to do with the warning message you have:
Lightning automatically upgraded your loaded checkpoint from v1.8.0 to v2.0.6. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file ../../superpoint_transformer_0728/checkpoints/spt-2_s3dis_fold6.ckpt`
I have not encountered this message from Lightning before, but I notice I am using pytorch-lightning==1.8 on all my machines. Could you please try again with spt-2_s3dis_fold5.ckpt after downgrading your version:
pip uninstall pytorch-lightning
pip install --upgrade pytorch-lightning==1.8
Hey @pynsigrid, the spt-2_s3dis_fold1.ckpt has been updated; I just successfully loaded and tested it using the demo_s3dis.ipynb notebook. So this should fix the second error you encountered.
Regarding the initial error, have you had the chance to retry with a downgraded pytorch-lightning==1.8 version ?
Hi @drprojects, apologies for my delayed response. Thank you very much for your update. I attempted to run the code using pytorch-lightning==1.8, but unfortunately, it still failed to execute. Hence, I suspect that this might not be the solution to the issue at hand. Currently, I am rerunning your repository on another device with the hope that it will work. I will keep you posted on the latest results in a few days.
Sorry to hear that, please keep me updated. If the problem persists, I will retry on my end with a fresh install on another machine, to see if I can reproduce your issue.
Hi @drprojects, unfortunately, I have to inform you that the same error has occurred again.
As in my previous attempt, I ran eval.py on S3DIS on another server with no error report; however, when I ran demo_s3dis.ipynb, the same problem occurred.
Please refer to the following traceback for more details.
Traceback (most recent call last):
File "notebooks/demo_s3dis.py", line 39, in <module>
model = model.load_from_checkpoint(cfg.ckpt_path, net=model.net, criterion=None)
File "/home/yining/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 136, in load_from_checkpoint
return _load_from_checkpoint(
File "/home/yining/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 179, in _load_from_checkpoint
return _load_state(cls, checkpoint, strict=strict, **kwargs)
File "/home/yining/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 237, in _load_state
keys = obj.load_state_dict(checkpoint["state_dict"], strict=strict)
File "/home/yining/codefield/superpoint_transformer/src/models/segmentation.py", line 551, in load_state_dict
class_weight_bckp = self.criterion.weight
File "/home/yining/anaconda3/envs/spt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'MultiLoss' object has no attribute 'weight'
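For now, a possible stopgap on my side (a sketch only, not the project's intended loading path; `load_ckpt_weights` is a hypothetical helper) is to drop the `criterion.*` keys and call the base `nn.Module` loader, bypassing the overridden `PointSegmentationModule.load_state_dict` where the AttributeError is raised:

```python
import torch
import torch.nn as nn


def load_ckpt_weights(model, ckpt_path):
    """Hypothetical workaround: filter out 'criterion.*' entries, then call
    the base-class loader so any custom load_state_dict override (where the
    AttributeError is raised) is bypassed entirely."""
    state = torch.load(ckpt_path, map_location='cpu')['state_dict']
    state = {k: v for k, v in state.items() if not k.startswith('criterion')}
    # Call nn.Module.load_state_dict unbound to skip the subclass override
    return nn.Module.load_state_dict(model, state, strict=False)
```

With strict=False the returned named tuple lists missing and unexpected keys; it is worth inspecting, since the class weights are deliberately discarded here and must be re-instantiated separately (as the notebook already does via hydra).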
Hi, I am currently out of office, will look into this when I come back, two weeks from now.
Hi @pynsigrid, apologies for the delay. I could not reproduce your error at first because I was using a slightly more recent (but not yet public) version of the code. I managed to reproduce the issue when using the released notebooks.
The error came from the fact that I had not updated the demo_*.ipynb notebooks after this commit.
This new commit should normally fix the error you encountered in the notebooks.
Please let me know if this solves the issue on your end.
Best,
Damien
Hi @pynsigrid can I consider this issue solved ?