pncnmnp/LuaNLP

Other languages sources

Closed this issue · 7 comments

Hi! Great project.
Where can I find resources for adding other languages? I would like to create a Portuguese-language version of this toolkit.

Hi there! Thank you for the compliment!
Can you tell me which NLP tasks you would like a Portuguese version of? That would be a good starting point for me.

Hi! Sorry for the delay.
So, performing "sentiment analysis" might be a bit difficult as this library uses VADER lexicons from VADER Sentiment. Unfortunately, these lexicons were only made for the English language.

Stemming is probably feasible. I found this resource from NLTK: Examples for Portuguese Processing. As you can see, NLTK has an RSLP stemmer, which works like this:

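>>> import nltk
>>> _ = nltk.download("rslp", quiet=True)  # one-time download of the stemmer's rule files
>>> stemmer = nltk.stem.RSLPStemmer()
>>> stemmer.stem("copiar")
'copi'
>>> stemmer.stem("paisagem")
'pais'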

The source code is available in the NLTK documentation under "Source code for nltk.stem.rslp". If you are not tied to a particular language, I recommend using NLTK. However, if you do require Lua, your best bet would be to port this code to Lua.

The RSLP stemmer's data can be found on the NLTK Corpora page: see entry 4, RSLP Stemmer (Removedor de Sufixos da Lingua Portuguesa) [ download | source ].
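To give an idea of what a port would involve, here is a minimal sketch of the core rule-application step. It assumes each rule has four parts: a suffix, a minimum stem length, a replacement, and a set of exception words. The `Rule` type, the `apply_rules` function, and the two sample rules are made up for illustration; they are not the actual RSLP rule files, and the real stemmer runs several such rule groups in sequence.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    suffix: str             # suffix to strip
    min_stem_len: int       # minimum length of what remains after stripping
    replacement: str        # text appended to the stem after stripping
    exceptions: frozenset   # words this rule must leave untouched


# Illustrative rules only; the real rule set ships with the NLTK "rslp" data.
PLURAL_RULES = [
    Rule("ns", 1, "m", frozenset()),
    Rule("s", 2, "", frozenset({"lápis", "país"})),
]


def apply_rules(word, rules):
    """Apply the first rule that matches; return the word unchanged otherwise."""
    for rule in rules:
        if not word.endswith(rule.suffix):
            continue
        if word in rule.exceptions:
            continue
        stem = word[: len(word) - len(rule.suffix)]
        if len(stem) >= rule.min_stem_len:
            return stem + rule.replacement
    return word


print(apply_rules("bons", PLURAL_RULES))
print(apply_rules("casas", PLURAL_RULES))
print(apply_rules("lápis", PLURAL_RULES))
```

The same loop should translate fairly directly to Lua, with rules stored as tables and `string.sub` used for the suffix check.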

I would love to include your code in this repository (if you can make it publicly accessible). Also, if you have any questions, please feel free to ask them in this issue thread. I am not closing this issue for now.

So, I found this notebook on Kaggle which performs sentiment analysis in Portuguese. I believe the author used Multinomial Naive Bayes (MNB) and obtained an F1 score of around 0.9 on this task. I suggest you try porting the code to Lua. There is some code in this repository which might help you with the MNB implementation/integration. Do check out:

  1. https://github.com/pncnmnp/LuaNLP/blob/main/tokenizer/supervised.lua
  2. https://github.com/pncnmnp/LuaNLP/blob/main/tokenizer/nb.lua

I believe the latter is more efficient, but I am not 100% sure.
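In case it helps before reading either file, here is a rough sketch of how Multinomial Naive Bayes classifies text, using add-one (Laplace) smoothing. This is not the Kaggle notebook's code, nor what `supervised.lua` or `nb.lua` do internally; the function names `train_mnb` and `predict_mnb` and the four toy training sentences are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, alpha=1.0):
    """Count labels and per-label word frequencies from (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"label_counts": label_counts, "word_counts": word_counts,
            "vocab": vocab, "alpha": alpha}


def predict_mnb(model, tokens):
    """Return the label with the highest log prior plus summed log likelihoods."""
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["label_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            if tok not in model["vocab"]:
                continue  # skip words never seen in training
            score += math.log((counts[tok] + alpha) /
                              (total_words + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data, invented for this sketch.
TOY_DOCS = [
    ("adorei o filme".split(), "pos"),
    ("filme ótimo".split(), "pos"),
    ("odiei o filme".split(), "neg"),
    ("filme péssimo".split(), "neg"),
]

model = train_mnb(TOY_DOCS)
print(predict_mnb(model, "adorei ótimo".split()))
print(predict_mnb(model, "filme péssimo".split()))
```

Working in log space avoids numeric underflow on longer texts, and the smoothing term keeps a word seen under only one label from zeroing out the other label's probability.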

Hi!

Sorry for the delay.
Thank you for finding these resources. I will review them and plan to port them in the future. At first glance, it looks easy. Thanks for the effort. 😄

Hi @impul-so, any update? Can I close the issue?

Sorry for the delay. I will close this issue.