yaoxingcheng/TLM

How to download the CC-Stories corpus

sunyilgdx opened this issue · 1 comment

How can I download the CC-Stories corpus?

We haven't found a publicly available version of STORIES yet. In our work, we follow the methodology described in the original STORIES paper (Section 5.3) to collect a set of documents of similar size from the CommonCrawl corpus, and we use those collected documents as part of our general corpus.
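
For readers who want to attempt a similar collection themselves, here is a minimal sketch of one way it might look. It assumes the methodology amounts to ranking CommonCrawl documents by n-gram overlap with a set of reference texts and keeping the highest-ranked fraction. The function names, the whitespace tokenization, the weighting by n-gram length, and the `keep_fraction` value are all hypothetical choices for illustration, not the settings used in this repository or in the STORIES paper.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(text: str, n: int) -> Counter:
    """Count word-level n-grams using naive lowercase whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_score(doc: str, query_ngrams: Dict[int, Set[NGram]], max_n: int) -> float:
    """Sum the document's n-gram occurrences that also appear in the queries.

    Longer n-grams are weighted by their length n (a hypothetical weighting).
    """
    score = 0.0
    for n in range(1, max_n + 1):
        shared = sum(c for g, c in ngrams(doc, n).items() if g in query_ngrams[n])
        score += n * shared
    return score


def select_top_documents(docs: List[str], queries: List[str],
                         keep_fraction: float = 0.5, max_n: int = 3) -> List[str]:
    """Rank documents by overlap with the queries and keep the top fraction."""
    query_ngrams: Dict[int, Set[NGram]] = {n: set() for n in range(1, max_n + 1)}
    for q in queries:
        for n in range(1, max_n + 1):
            query_ngrams[n].update(ngrams(q, n))
    ranked = sorted(docs, key=lambda d: overlap_score(d, query_ngrams, max_n),
                    reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))  # always keep at least one
    return ranked[:keep]
```

In practice the `docs` list would be replaced by a streaming pass over extracted CommonCrawl text, and `keep_fraction` would be tuned until the selected subset reaches the target corpus size.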