Specific information about this model.
midannii opened this issue · 2 comments
midannii commented
Checklist
- [x] I've searched the project's issues
❓ Question
- You mentioned the model was "trained on a variety of data, such as the Modu Corpus (모두의 말뭉치), Korean Wikipedia, Common Crawl, and news data," and I'd like to know the total size of the corpus used for pre-training.
- I'd also like to know the vocab size of this model.
monologg commented
- We used approximately 70GB of data for pretraining.
- The vocab size is 32500. Check config.json for more details.
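For anyone who wants to verify this themselves, here is a minimal sketch that reads `vocab_size` straight from the model's config.json via 🤗 Transformers. Note that `monologg/kobigbird-bert-base` is an assumed repo id on my part (the thread doesn't name the checkpoint); substitute the actual model id if yours differs.

```python
# Minimal sketch: check the vocab size reported in config.json on the Hub.
# NOTE: "monologg/kobigbird-bert-base" is an assumed repo id, not confirmed
# by this thread -- replace it with the actual model id if it differs.
from transformers import AutoConfig, AutoTokenizer

MODEL_ID = "monologg/kobigbird-bert-base"  # assumption

config = AutoConfig.from_pretrained(MODEL_ID)
print(config.vocab_size)  # should print 32500 per the answer above

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
print(len(tokenizer))  # tokenizer size should match config.vocab_size
```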
midannii commented
Thank you :)