[Checkpoint Loading] Issues on loading s3dis ckpt weights when running 'demo_s3dis.ipynb'
Closed this issue · 14 comments
Hi, thanks again for your great project.
I have encountered a problem while attempting to load the pre-trained checkpoint file spt-2_s3dis_fold5.ckpt when running demo_s3dis.ipynb.
The detailed error report is as follows:
AttributeError Traceback (most recent call last)
Cell In[11], line 10
7 model = hydra.utils.instantiate(cfg.model)
9 # Load pretrained weights from a checkpoint file
---> 10 model = model.load_from_checkpoint(cfg.ckpt_path, net=model.net, criterion=None)
11 model.criterion = hydra.utils.instantiate(cfg.model).criterion
12 model = model.eval().cuda()
File /home/pai/envs/spt/lib/python3.8/site-packages/pytorch_lightning/core/module.py:1520, in LightningModule.load_from_checkpoint(cls, checkpoint_path, map_location, hparams_file, strict, **kwargs)
1440 @classmethod
1441 def load_from_checkpoint(
1442 cls,
(...)
1447 **kwargs: Any,
1448 ) -> Self:
1449 r"""
1450 Primary way of loading a model from a checkpoint. When Lightning saves a checkpoint
1451 it stores the arguments passed to ``__init__`` in the checkpoint under ``"hyper_parameters"``.
(...)
1518 y_hat = pretrained_model(x)
1519 """
-> 1520 loaded = _load_from_checkpoint(
1521 cls,
1522 checkpoint_path,
1523 map_location,
1524 hparams_file,
1525 strict,
1526 **kwargs,
1527 )
1528 return cast(Self, loaded)
File /home/pai/envs/spt/lib/python3.8/site-packages/pytorch_lightning/core/saving.py:90, in _load_from_checkpoint(cls, checkpoint_path, map_location, hparams_file, strict, **kwargs)
88 return _load_state(cls, checkpoint, **kwargs)
89 if issubclass(cls, pl.LightningModule):
---> 90 model = _load_state(cls, checkpoint, strict=strict, **kwargs)
91 state_dict = checkpoint["state_dict"]
92 if not state_dict:
File /home/pai/envs/spt/lib/python3.8/site-packages/pytorch_lightning/core/saving.py:156, in _load_state(cls, checkpoint, strict, **cls_kwargs_new)
154 # load the state_dict on the model automatically
155 assert strict is not None
--> 156 keys = obj.load_state_dict(checkpoint["state_dict"], strict=strict)
158 if not strict:
159 if keys.missing_keys:
File /mnt/user/superpoint_transformer/src/models/segmentation.py:551, in PointSegmentationModule.load_state_dict(self, state_dict, strict)
549 # Special treatment for MultiLoss
550 if self.multi_stage_loss:
--> 551 class_weight_bckp = self.criterion.weight
552 self.criterion.weight = None
554 # Recover the class weights from any 'criterion.weight' or
555 # 'criterion.*.weight' key and remove those keys from the
556 # state_dict
File /home/pai/envs/spt/lib/python3.8/site-packages/torch/nn/modules/module.py:1614, in Module.__getattr__(self, name)
1612 if name in modules:
1613 return modules[name]
-> 1614 raise AttributeError("'{}' object has no attribute '{}'".format(
1615 type(self).__name__, name))
AttributeError: 'MultiLoss' object has no attribute 'weight'
Additional information: I have successfully run the eval process using this checkpoint, so the problem is likely not in the ckpt or the dataset.
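For reference, the failure mode can be reproduced with a minimal stand-in (the `MultiLoss` below is a hypothetical simplification, not the project's actual class):

```python
class MultiLoss:
    """Hypothetical stand-in for the project's MultiLoss: it wraps several
    per-stage criteria but exposes no top-level `weight` attribute."""
    def __init__(self, criteria):
        self.criteria = criteria


criterion = MultiLoss(['ce_stage_1', 'ce_stage_2'])

# Direct attribute access fails exactly as in the traceback above
try:
    _ = criterion.weight
except AttributeError as e:
    print(e)  # 'MultiLoss' object has no attribute 'weight'

# A defensive read inside load_state_dict would avoid the crash
class_weight_bckp = getattr(criterion, 'weight', None)
```

The `getattr(..., None)` pattern is just one way a loader could tolerate a criterion that does not carry class weights.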
Hi, do you know how to obtain the processed dataset?
Hi, in my experience, if you have downloaded the full dataset (structured as below) and run eval.py as indicated in the README, the script will automatically build the processed dataset first.
└── data
└── s3dis # Structure for S3DIS
├── Stanford3dDataset_v1.2.zip # (optional) Downloaded zipped dataset with non-aligned rooms
├── raw # Raw dataset files
│ └── Area_{{1, 2, 3, 4, 5, 6}} # S3DIS's area/room/room.txt structure
│ └── Area_{{1, 2, 3, 4, 5, 6}}_alignmentAngle.txt # Room alignment angles required for entire floor reconstruction
│ └── {{room_name}}
│ └── {{room_name}}.txt
Hi @pynsigrid, thanks for using this project !
The error you are encountering seems related to #12. I had solved it but then recently made another change that might cause this. I will look into this and let you know soon.
Hi @pynsigrid, things seem to work fine on my end. Are you using the latest version of the project ?
In particular, I made some changes in this commit which could solve your problem.
Hi @drprojects, yes, I just re-cloned this repo and reinstalled the environment, but the same error occurs again.
Some information about this experiment:
notebook: demo_s3dis.ipynb
ckpt: spt-2_s3dis_fold{2 to 6}.ckpt
error message:
Lightning automatically upgraded your loaded checkpoint from v1.8.0 to v2.0.6. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file ../../superpoint_transformer_0728/checkpoints/spt-2_s3dis_fold6.ckpt`
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[4], line 10
7 model = hydra.utils.instantiate(cfg.model)
9 # Load pretrained weights from a checkpoint file
---> 10 model = model.load_from_checkpoint(cfg.ckpt_path, net=model.net, criterion=None)
11 # model = model.load_from_checkpoint(cfg.ckpt_path, net=model.net, criterion=None)
12 model.criterion = hydra.utils.instantiate(cfg.model).criterion
File /home/pai/envs/spt2/lib/python3.8/site-packages/pytorch_lightning/core/module.py:1520, in LightningModule.load_from_checkpoint(cls, checkpoint_path, map_location, hparams_file, strict, **kwargs)
1440 @classmethod
1441 def load_from_checkpoint(
1442 cls,
(...)
1447 **kwargs: Any,
1448 ) -> Self:
1449 r"""
1450 Primary way of loading a model from a checkpoint. When Lightning saves a checkpoint
1451 it stores the arguments passed to ``__init__`` in the checkpoint under ``"hyper_parameters"``.
(...)
1518 y_hat = pretrained_model(x)
1519 """
-> 1520 loaded = _load_from_checkpoint(
1521 cls,
1522 checkpoint_path,
1523 map_location,
1524 hparams_file,
1525 strict,
1526 **kwargs,
1527 )
1528 return cast(Self, loaded)
File /home/pai/envs/spt2/lib/python3.8/site-packages/pytorch_lightning/core/saving.py:90, in _load_from_checkpoint(cls, checkpoint_path, map_location, hparams_file, strict, **kwargs)
88 return _load_state(cls, checkpoint, **kwargs)
89 if issubclass(cls, pl.LightningModule):
---> 90 model = _load_state(cls, checkpoint, strict=strict, **kwargs)
91 state_dict = checkpoint["state_dict"]
92 if not state_dict:
File /home/pai/envs/spt2/lib/python3.8/site-packages/pytorch_lightning/core/saving.py:156, in _load_state(cls, checkpoint, strict, **cls_kwargs_new)
154 # load the state_dict on the model automatically
155 assert strict is not None
--> 156 keys = obj.load_state_dict(checkpoint["state_dict"], strict=strict)
158 if not strict:
159 if keys.missing_keys:
File /mnt/user/superpoint_transformer/src/models/segmentation.py:551, in PointSegmentationModule.load_state_dict(self, state_dict, strict)
549 # Special treatment for MultiLoss
550 if self.multi_stage_loss:
--> 551 class_weight_bckp = self.criterion.weight
552 self.criterion.weight = None
554 # Recover the class weights from any 'criterion.weight' or
555 # 'criterion.*.weight' key and remove those keys from the
556 # state_dict
File /home/pai/envs/spt2/lib/python3.8/site-packages/torch/nn/modules/module.py:1614, in Module.__getattr__(self, name)
1612 if name in modules:
1613 return modules[name]
-> 1614 raise AttributeError("'{}' object has no attribute '{}'".format(
1615 type(self).__name__, name))
AttributeError: 'MultiLoss' object has no attribute 'weight'
While changing the ckpt to fold1, the error message changes:
ckpt: spt-2_s3dis_fold1.ckpt
error message:
RuntimeError: [enforce fail at inline_container.cc:257] . file in archive is not in a subdirectory archive/: spt-2_dales.ckpt
Hi @pynsigrid
While changing the ckpt to fold1, the error message changes:
ckpt: spt-2_s3dis_fold1.ckpt
error message:
RuntimeError: [enforce fail at inline_container.cc:257] . file in archive is not in a subdirectory archive/: spt-2_dales.ckpt
Good catch ! It seems this file was corrupted and it is my mistake, I have to apologize... I contacted the server administrators to update the Zenodo record. I will let you know when it is fixed.
Other than the spt-2_s3dis_fold1.ckpt file, which indeed has a problem, I could successfully run demo_s3dis.ipynb, demo_kitti360.ipynb, and demo_dales.ipynb with all the other checkpoints provided in the Zenodo record.
So, if you are certain you are using the latest version of the code, maybe this has something to do with the warning message you have:
Lightning automatically upgraded your loaded checkpoint from v1.8.0 to v2.0.6. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file ../../superpoint_transformer_0728/checkpoints/spt-2_s3dis_fold6.ckpt`
I have not encountered this message from Lightning before, but I notice I am using pytorch-lightning==1.8 on all my machines. Could you please try again with spt-2_s3dis_fold5.ckpt after downgrading your version:
pip uninstall pytorch-lightning
pip install --upgrade pytorch-lightning==1.8
Hey @pynsigrid, the spt-2_s3dis_fold1.ckpt has been updated; I just successfully loaded and tested it using the demo_s3dis.ipynb notebook. So this should fix the second error you encountered.
Regarding the initial error, have you had the chance to retry with a downgraded pytorch-lightning==1.8 version ?
Hi @drprojects, apologies for my delayed response. Thank you very much for your update. I attempted to run the code using pytorch-lightning==1.8, but unfortunately, it still failed to execute. Hence, I suspect that this might not be the solution to the issue at hand. Currently, I am rerunning your repository on another device with the hope that it will work. I will keep you posted on the latest results in a few days.
Sorry to hear that, please keep me updated. If the problem persists, I will retry on my end with a fresh install on another machine, to see if I can reproduce your issue.
Hi @drprojects, unfortunately, I have to inform you that the same error has occurred again.
As in my previous attempt, I ran eval.py on S3DIS on another server with no error report; however, when I ran demo_s3dis.ipynb, the same problem occurred.
Please refer to the following traceback for more details.
Traceback (most recent call last):
File "notebooks/demo_s3dis.py", line 39, in <module>
model = model.load_from_checkpoint(cfg.ckpt_path, net=model.net, criterion=None)
File "/home/yining/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 136, in load_from_checkpoint
return _load_from_checkpoint(
File "/home/yining/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 179, in _load_from_checkpoint
return _load_state(cls, checkpoint, strict=strict, **kwargs)
File "/home/yining/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 237, in _load_state
keys = obj.load_state_dict(checkpoint["state_dict"], strict=strict)
File "/home/yining/codefield/superpoint_transformer/src/models/segmentation.py", line 551, in load_state_dict
class_weight_bckp = self.criterion.weight
File "/home/yining/anaconda3/envs/spt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'MultiLoss' object has no attribute 'weight'
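For now, a possible stopgap on my side (a sketch only, not the project's intended loading path; `load_ckpt_weights` is a hypothetical helper) is to drop the `criterion.*` keys and call the base `nn.Module` loader, bypassing the overridden `PointSegmentationModule.load_state_dict` where the AttributeError is raised:

```python
import torch
import torch.nn as nn


def load_ckpt_weights(model, ckpt_path):
    """Hypothetical workaround: filter out 'criterion.*' entries, then call
    the base-class loader so any custom load_state_dict override (where the
    AttributeError is raised) is bypassed entirely."""
    state = torch.load(ckpt_path, map_location='cpu')['state_dict']
    state = {k: v for k, v in state.items() if not k.startswith('criterion')}
    # Call nn.Module.load_state_dict unbound to skip the subclass override
    return nn.Module.load_state_dict(model, state, strict=False)
```

With strict=False the returned named tuple lists missing and unexpected keys; it is worth inspecting, since the class weights are deliberately discarded here and must be re-instantiated separately (as the notebook already does via hydra).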
Hi, I am currently out of office, will look into this when I come back, two weeks from now.
Hi @pynsigrid, apologies for the delay. I could not reproduce your error at first because I was using a slightly more recent (but not yet public) version of the code. I managed to reproduce the issue when using the released notebooks.
The error came from the fact that I had not updated the demo_*.ipynb notebooks after this commit.
This new commit should normally fix the error you encountered in the notebooks.
Please let me know if this solves the issue on your end.
Best,
Damien
Hi @pynsigrid can I consider this issue solved ?