ncbi-nlp/bluebert

How did you pre-train the NCBI abstract data exactly?

zhouyunyun11 opened this issue · 5 comments

In your manuscript, you described it like this:
"We initialized BERT with pre-trained BERT provided by (Devlin et al., 2019). We then continue to pre-train the model, using the listed corpora".

Did you use the BERT code to completely re-train on the NCBI abstract corpora? Or did you use the initial BERT model and WordPiece strategy, as in the BioBERT method?

We used the initial BERT model and the WordPiece strategy.

We used the Google default vocab.txt
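For readers wondering what "continue to pre-train from the initial BERT model with the default vocab" looks like in practice: the sketch below is only a rough illustration of that idea, not the authors' actual pipeline (they used Google's original TensorFlow BERT code). It uses the Hugging Face `transformers`/`datasets` libraries, keeps the stock `bert-base-uncased` checkpoint and its WordPiece vocab, and runs masked-language-model training only (the original BERT recipe also includes next-sentence prediction). The file name `ncbi_abstracts.txt` and all hyperparameters are placeholders.

```python
# Hedged sketch (not the authors' pipeline): continue MLM pre-training
# from the original BERT checkpoint, reusing Google's default WordPiece vocab.
from datasets import load_dataset
from transformers import (
    BertTokenizerFast,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# The default vocab/tokenizer ships with the base checkpoint; no new vocab is built.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")  # initialize from Google's BERT

# "ncbi_abstracts.txt" is a placeholder: one abstract (or sentence) per line.
dataset = load_dataset("text", data_files={"train": "ncbi_abstracts.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking with the standard 15% MLM probability.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="continued-pretraining-sketch",
    per_device_train_batch_size=32,
    num_train_epochs=1,
    save_steps=10_000,
)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```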

I am not sure what you meant by "same strategy as Bio_BERT"