monologg/JointBERT

"JointBERT" vs. "BertForMaskedLM" in model config json file

Closed this issue · 7 comments

Hi,

This is a great repo! I was using your code to train BERT on my own data, but I noticed that the architectures field in the model config JSON file changed. The code and data are exactly the same, yet the architecture changed, which is weird.

Please see the attached screenshots.
Do you know why this changed over time?
Thanks!

(Two screenshots of config.json attached, showing the architectures field.)
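For reference, the field can also be checked directly from the saved config.json (a minimal sketch; the model directory path below is just a placeholder):

```python
# Inspect the architectures field recorded in the saved config.json
# (the path "path/to/model_dir" is only an illustrative placeholder).
import json

with open("path/to/model_dir/config.json") as f:
    config = json.load(f)

print(config.get("architectures"))  # e.g. ["JointBERT"] or ["BertForMaskedLM"]
```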

@monologg Same problem for me

Hi!

@haomiao-li Can you tell me which version of the transformers library you used? When I tried to reproduce this, my config.json shows JointBERT in the architectures attribute.


[Reproducing steps]

  1. transformers==2.4.1

  2. Create the dataset with the task name of wow

  3. Add the task name to the processors dict in data_loader.py (see the sketch after these steps)


  4. Then run the code as follows:
$ python main.py --task wow --model_type bert --model_dir wow_model --do_train
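A minimal sketch of the processors-dict change from step 3, assuming the existing JointProcessor class in data_loader.py is reused for the new task (treat the names as illustrative):

```python
# data_loader.py (excerpt, sketch only): register the new task name so that
# --task wow can be resolved. JointProcessor is assumed to be the processor
# class already defined in data_loader.py for atis/snips.
processors = {
    "atis": JointProcessor,
    "snips": JointProcessor,
    "wow": JointProcessor,   # newly added task
}
```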

It seems that the older version of the transformers library has this bug. When I tried a version below 2.4, the architectures field is saved as BertForMaskedLM by default. But with the new version (2.4.1), the architectures field is correctly saved as JointBERT.
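If it helps, a quick way to confirm the installed version before training (a minimal sketch; it assumes the packaging package is available):

```python
# Sanity-check the installed transformers version; the 2.4.1 threshold
# comes from the observation above about how architectures is saved.
from packaging import version
import transformers

assert version.parse(transformers.__version__) >= version.parse("2.4.1"), (
    f"transformers {transformers.__version__}: architectures may be saved as BertForMaskedLM"
)
```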

@haomiao-li @pttzty Can you try with transformers version 2.4.1?

@monologg Will try it ASAP. The error looks weird because the performance of the intent detection part did drop significantly with BertForMaskedLM.


I've just checked this. But even with version 2.2.2 (where the config is also saved as BertForMaskedLM), the performance looks the same. Below are the steps I tried. If my steps differ from yours, just let me know the steps you followed :)


[Reproducing steps]
0. transformers==2.2.2

  1. In main.py, change the code to evaluate the dev set when --do_eval is passed
    (I changed the code to load all three datasets at the beginning; see the sketch after these steps)

  2. Run the code

$ python main.py --task atis --model_dir atis_model --do_train --do_eval --num_train_epochs 1
  3. Check the difference between the accuracy shown during training and the accuracy shown during evaluation
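A minimal sketch of the main.py change from step 1, assuming the repo's load_and_cache_examples / Trainer pattern (the exact names and signatures here are assumptions):

```python
# main.py (sketch): load all three splits up front, then evaluate the dev set
# when --do_eval is passed.
train_dataset = load_and_cache_examples(args, tokenizer, mode="train")
dev_dataset = load_and_cache_examples(args, tokenizer, mode="dev")
test_dataset = load_and_cache_examples(args, tokenizer, mode="test")

trainer = Trainer(args, train_dataset, dev_dataset, test_dataset)

if args.do_train:
    trainer.train()

if args.do_eval:
    trainer.load_model()
    trainer.evaluate("dev")   # evaluate dev instead of test to compare against training accuracy
```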

@monologg Problem fixed after we upgraded to the latest version of transformers. Thanks!
Just another side note: I think you were trying to use ignore_index to exclude pads from the loss function and the final predicted output, and it actually worked well for the two sample datasets. However, in my case I was getting lots of "PAD" in the final prediction results where it should actually be the "O" category, so I think the pads were not actually excluded.
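For context, a minimal sketch of the ignore_index pattern being discussed (shapes and names are illustrative, not the repo's actual code): pads are excluded from the loss, but argmax still produces a prediction at every position, so pad positions also have to be masked out when collecting the final slot outputs.

```python
import torch
import torch.nn as nn

# Pad positions are labelled with ignore_index so the loss skips them.
pad_token_label_id = nn.CrossEntropyLoss().ignore_index  # -100 by default

num_labels = 7
logits = torch.randn(2, 5, num_labels)                          # (batch, seq_len, num_slot_labels)
labels = torch.full((2, 5), pad_token_label_id, dtype=torch.long)
labels[:, :3] = torch.randint(0, num_labels, (2, 3))            # only real tokens get real labels

loss_fct = nn.CrossEntropyLoss(ignore_index=pad_token_label_id)
loss = loss_fct(logits.view(-1, num_labels), labels.view(-1))   # pad positions ignored in the loss

# ...but every position still gets an argmax prediction, so pads must also be
# filtered out when building the final slot predictions.
preds = logits.argmax(dim=-1)
active = labels != pad_token_label_id
slot_preds = [p[m].tolist() for p, m in zip(preds, active)]
```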


Great! Thank you for reporting the bug:) I'll close this issue.