monologg/KoBigBird

Specific information about this model.

midannii opened this issue · 2 comments

Checklist

  • [x] I've searched the project's issues

❓ Question

  • You mentioned the model was "trained on a variety of data, including Modu Corpus (모두의 말뭉치), Korean Wikipedia, Common Crawl, and news data," and I would like to know the total size of the corpus used for pre-training.

  • I would also like to know the vocab size of this model.

📎 Additional context

  • We used approximately 70GB of data for pretraining.
  • The vocab size is 32500. Check config.json for more details.
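Values like the vocab size can be read straight out of config.json. A minimal sketch (the JSON excerpt below is illustrative, not the model's full config file):

```python
import json

# Illustrative excerpt of a BigBird-style config.json (not the full file;
# field values other than vocab_size are assumptions for this example)
config_text = '{"model_type": "big_bird", "vocab_size": 32500}'

config = json.loads(config_text)
print(config["vocab_size"])  # 32500
```

In practice you would open the model's actual config.json (or load it via a library such as Hugging Face transformers) instead of an inline string.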

Thank you :)