Implement Sangita in Bengali
djokester opened this issue · 3 comments
djokester commented
Task list for getting started:
- Implement a tokeniser for Bengali
- Find datasets for the language to move forward with
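As a starting point for the tokeniser task, here is a minimal regex-based sketch. It is not Sangita's actual implementation; the function name `tokenize_bengali` and the token pattern are assumptions made for illustration. It relies on two facts: the Bengali Unicode block spans U+0980 to U+09FF, and the danda (U+0964) used as the sentence terminator sits in the Devanagari block, so it comes out as a separate punctuation token.

```python
import re

# Hypothetical pattern, not taken from the Sangita codebase.
# 1. A run of characters from the Bengali block (U+0980-U+09FF), plus the
#    zero-width joiners (U+200C, U+200D) that can appear inside words.
# 2. A run of ASCII letters, digits, or underscores.
# 3. Any single remaining character that is neither whitespace nor a
#    word character, e.g. "," or the danda (U+0964).
_TOKEN_PATTERN = re.compile(
    r"[\u0980-\u09FF\u200C\u200D]+|[A-Za-z0-9_]+|[^\w\s]"
)


def tokenize_bengali(text):
    """Split text into Bengali words, ASCII words, and punctuation marks."""
    return _TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    # "\u0964" is the danda that ends a Bengali sentence.
    print(tokenize_bengali("আমি গান গাই\u0964"))
```

Known limitations of this sketch: Bengali digits and the few symbols inside the Bengali block (such as the taka sign) are merged into adjacent words, and no sentence splitting or normalisation is attempted.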
lavishsaluja commented
A pretty good page from someone already working on the same thing: ISI - Bengali.
Not exactly what we want, but the bengaliAI group on Kaggle is working on datasets for recognizing handwritten Bengali numeric digits.
djokester commented
@lavishsaluja good one
Let's connect at night.
ighosh98 commented
Some other resources:
- http://docs.cltk.org/en/latest/bengali.html
- https://github.com/banglakit/awesome-bangla (contains links to multiple datasets)
- Handwritten text in Bengali: https://data.mendeley.com/datasets/hf6sf8zrkc/2 (may not exactly fit the given specifications)
- https://github.com/AtikRahman/Bangla_ABSA_Datasets contains 2 Excel files whose data has been used for sentiment analysis in another project (no license added on GitHub)