monologg/KoBigBird

Specific information about this model.

midannii opened this issue · 2 comments

Checklist

  • [x] I've searched the project's issues

❓ Question

  • You mentioned the model was "trained on a variety of data, including Modu Corpus (모두의 말뭉치), Korean Wikipedia, Common Crawl, and news data," and I would like to know the total size of the corpus used for pre-training.

  • I would also like to know the vocab size of this model.

📎 Additional context

  • We used approximately 70GB of data for pretraining.
  • The vocab size is 32500. Check config.json for more details.
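Values like the vocab size can be read straight out of config.json. A minimal sketch (the JSON excerpt below is illustrative, not the model's full config file):

```python
import json

# Illustrative excerpt of a BigBird-style config.json (not the full file;
# field values other than vocab_size are assumptions for this example)
config_text = '{"model_type": "big_bird", "vocab_size": 32500}'

config = json.loads(config_text)
print(config["vocab_size"])  # 32500
```

In practice you would open the model's actual config.json (or load it via a library such as Hugging Face transformers) instead of an inline string.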

Thank you :)