The count of docs and sents in PubMedDS
GanjinZero opened this issue · 1 comment
GanjinZero commented
I am using PubMedDS as a training corpus for my project.
I noticed that the document and sentence counts differ between the arXiv v1 and v2/v3 versions.
Did you add new documents to PubMedDS?
svjan5 commented
Yes, we made a few improvements to the dataset generation code for PubMedDS. Please use the latest dataset.