lavis-nlp/spert

Is the saved model (pytorch_model.bin) still the original pretrained model?

victorbai2 opened this issue · 11 comments

@markus-eberts Hi Markus,
After training, I checked the path data/save/conll04_train/2021-01-07_09:31:34.998954/final_model/pytorch_model.bin and realized that the model is still the original pretrained BERT model.

I am not an expert in PyTorch. Does the PyTorch code below mean that only the pretrained BERT model is saved?

        # save model (unwrap DataParallel first: it stores the actual
        # model in its .module attribute)
        if isinstance(model, DataParallel):
            model.module.save_pretrained(dir_path)
        else:
            model.save_pretrained(dir_path)

Hi,
this should not be the case and I currently do not have any explanation for this. The code snippet you posted is fine. Why do you think it is still the pretrained model? And can you post the library versions and the configuration you used?

@markus-eberts Thanks for your response. I first checked the model size in the data/save/... directory and found that pytorch_model.bin is the same size as the downloaded pretrained model (413 MB). Then I evaluated both models on the evaluation dataset, and the results were the same.

The configuration and everything else are the same.
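
A more direct check than file size or evaluation scores is to diff the two checkpoints tensor by tensor. Below is a minimal sketch; torch.load on a pytorch_model.bin returns the raw state dict, and the pretrained checkpoint path is a placeholder:

    import torch

    # Load both checkpoints as plain state dicts on the CPU.
    pretrained = torch.load("path/to/pretrained/pytorch_model.bin", map_location="cpu")
    finetuned = torch.load(
        "data/save/conll04_train/2021-01-07_09:31:34.998954/final_model/pytorch_model.bin",
        map_location="cpu")

    # Count shared tensors that are bit-identical; after real training,
    # (almost) none of them should be.
    shared = set(pretrained) & set(finetuned)
    identical = [k for k in shared if torch.equal(pretrained[k], finetuned[k])]
    print(f"{len(identical)}/{len(shared)} shared tensors unchanged")

    # Keys present only in the fine-tuned checkpoint (e.g. added task layers).
    print("extra keys:", sorted(set(finetuned) - set(pretrained)))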

@markus-eberts Please find below the example that I tested in Google Colab.

Spert.ipynb.zip

I just updated the repository (some changes due to the upgrade to a new 'transformers' version) and requirements.txt. Model saving works fine on my side. Could you please pull the newest changes, use the library versions in requirements.txt, and try again?

@markus-eberts Hi, I applied the changes in Google Colab, but the result is unfortunately the same and the weights are not saved; the number of epochs I set for testing is 3. I checked the size of the pytorch_model.bin saved in dir /save/data..../final_model.

Is it the same on your end? I wonder if model.save_pretrained(dir_path) only saves the pretrained weights, as the name suggests.

BTW, I even ran "python ./spert.py eval --config configs/example_eval.conf" with the purely pretrained weights (pytorch_model.bin) and, surprisingly, it could be evaluated? Were all the other layers and weights after the CLS layer used?
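
(For context: when 'transformers' loads a checkpoint that is missing some of the model's weights, it does not fail; it initializes the missing tensors randomly and logs a warning. That is why evaluation still runs with a plain BERT checkpoint, just with untrained task layers. A generic sketch, using a stock classification head as a stand-in for SpERT's own model class:)

    from transformers import BertForSequenceClassification

    # bert-base-cased contains no classifier weights, yet loading succeeds:
    # the missing 'classifier.*' tensors are initialized randomly and
    # 'transformers' logs a warning listing them.
    model = BertForSequenceClassification.from_pretrained("bert-base-cased")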

@markus-eberts I think I understand it now: I used the pytorch_model.bin that was trained by you, which I had downloaded to dir data/model/pytorch_model.bin.

But one thing that seems strange to me is that the trained, saved model (pytorch_model.bin) is the same size as the original pretrained model. After training, shouldn't the model become much larger, just like the TensorFlow one?

Is it the same on your end? I wonder if model.save_pretrained(dir_path) only saves the pretrained weights, as the name suggests.

The 'save_pretrained' method of 'transformers' definitely saves the whole model. I use the library a lot and it's also stated in the documentation.
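
A minimal round-trip illustrates this (a sketch with a plain BertModel and a scratch directory, both placeholders):

    import torch
    from transformers import BertModel

    model = BertModel.from_pretrained("bert-base-cased")
    with torch.no_grad():
        model.pooler.dense.weight.add_(1.0)  # deliberately change a weight

    # save_pretrained() writes the current weights, not the original ones.
    model.save_pretrained("/tmp/roundtrip")
    reloaded = BertModel.from_pretrained("/tmp/roundtrip")
    assert torch.equal(reloaded.pooler.dense.weight, model.pooler.dense.weight)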

But one thing that seems strange to me is that the trained, saved model (pytorch_model.bin) is the same size as the original pretrained model.

I'm not sure if you are comparing with the CoNLL04 model provided by us or with the bert-base-cased model downloaded via the 'transformers' library. The CoNLL04 'pytorch_model.bin' trained by us is already finetuned on the task of joint entity and relation extraction, so it should roughly match the size of your trained model and give good evaluation results. Regarding the bert-base-cased model (MLM pre-trained, but not finetuned on the target task), I also do not expect a large size difference compared to a finetuned model, since we only add shallow (relative to BERT) linear layers.
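
As a rough sanity check on the sizes: BERT-base has on the order of 10^8 parameters, while a couple of linear classification heads add only a few thousand. A sketch with made-up head dimensions (not SpERT's exact layer shapes):

    import torch.nn as nn
    from transformers import BertModel

    bert = BertModel.from_pretrained("bert-base-cased")
    bert_params = sum(p.numel() for p in bert.parameters())

    # Hypothetical stand-ins for task-specific heads added on top of BERT.
    entity_clf = nn.Linear(2 * 768 + 25, 5)
    rel_clf = nn.Linear(3 * 768 + 50, 5)
    head_params = sum(p.numel() for head in (entity_clf, rel_clf)
                      for p in head.parameters())

    print(f"BERT: {bert_params / 1e6:.1f}M parameters; added heads: {head_params}")
    # The heads amount to well under 0.1% of BERT's parameters, so the
    # checkpoint files end up nearly the same size.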

@markus-eberts I compared the model trained by you and bert-base-cased; the two are the same size.

BTW, was all the code written by yourself? It is very high-quality code.

I compared the model trained by you and bert-base-cased; the two are the same size.

This is reasonable. When you use your trained model for evaluation (e.g. 'python ./spert.py eval --config configs/example_eval.conf' with model_path/tokenizer_path set to your model), it should give you similar results as on the validation dataset (as output after training). In this case, everything works as expected and the model was saved correctly.
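
For instance, the relevant entries in configs/example_eval.conf would point at the saved directory. A sketch using the timestamped path from earlier in this thread (all other config keys omitted):

    model_path = data/save/conll04_train/2021-01-07_09:31:34.998954/final_model
    tokenizer_path = data/save/conll04_train/2021-01-07_09:31:34.998954/final_model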

BTW, was all the code written by yourself? It is very high-quality code.

Yes, and thank you. I try my best to make the code 'readable' and easy to follow. However, since this is just the code accompanying a research paper, its main purpose is to reproduce our evaluation results. I often wish I had done some parts of the code better (from a software architecture point of view) but lacked the time to do so. After all, the next paper deadline is usually right around the corner ;). Of course I'm glad that the code and the SpERT model itself are useful for the research community and beyond.

@markus-eberts You are really productive. I look forward to reading your next paper once it is published.

Thanks.