SangitaNLP/sangita

Implement Sangita in Bengali

djokester opened this issue · 3 comments

Task List for getting Started

  • Implement Tokeniser for Bengali

  • Find Bengali datasets to move forward with.
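
As a starting point for the tokeniser task, here is a minimal regex-based sketch. It is not Sangita's actual implementation: the names `tokenize` and `sent_tokenize` are hypothetical, and it assumes that runs of characters from the Bengali Unicode block (U+0980 to U+09FF, plus the zero-width joiners) form words, and that sentences end with a danda (।), `?`, or `!`.

```python
import re

# Assumption: a Bengali word is a run of code points from the Bengali block
# (U+0980-U+09FF), optionally containing ZWNJ/ZWJ (U+200C/U+200D).
BENGALI_WORD = r"[\u0980-\u09FF\u200C\u200D]+"

# Order matters: Bengali runs first, then other alphanumeric runs,
# then any single non-space symbol (this catches the danda, U+0964).
TOKEN_RE = re.compile(rf"{BENGALI_WORD}|\w+|[^\w\s]")

# Split on whitespace that follows a sentence-ending mark.
SENTENCE_END_RE = re.compile(r"(?<=[\u0964?!])\s+")


def tokenize(text):
    """Return a list of word and punctuation tokens (hypothetical helper)."""
    return TOKEN_RE.findall(text)


def sent_tokenize(text):
    """Return a list of sentences (hypothetical helper)."""
    return [s for s in SENTENCE_END_RE.split(text.strip()) if s]
```

Calling `tokenize` on a short sentence should give the words followed by the danda as its own token. Abbreviations, numbers with separators, and mixed-script text would need extra rules.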

A pretty good page by someone already working on the same thing: ISI - Bengali.
Not exactly what we want, but Kaggle has a bengaliAI group working on datasets for recognizing handwritten Bengali numeric digits.

@lavishsaluja good one
Let's connect at night.

Some other resources:

  1. http://docs.cltk.org/en/latest/bengali.html
  2. https://github.com/banglakit/awesome-bangla (It contains links to multiple datasets)
  3. Handwritten text in Bengali: https://data.mendeley.com/datasets/hf6sf8zrkc/2 (may not exactly fit the given specifications)
  4. https://github.com/AtikRahman/Bangla_ABSA_Datasets (contains 2 Excel files whose data has been used for sentiment analysis in another project; no license added on GitHub)