rdamodharan/tamil-stemmer

stemmer for words in sentences

sanjanasri opened this issue · 3 comments

Hi,

First, I would like to thank you for your wonderful work and effort. I tried running the stemmer on a sentence:

i/p: தங்களது குழந்தைகளுக்கு எந்த வகையான கல்வியை அளிக்க வேண்டும் என்பது பற்றிய உரிமை பெற்றோர்களுக்கு இருக்கிறது.
o/p: தங்களது குழந்தைகளுக்கு எந்த வகையான கல்வியை அளிக்க வேண்டும் என்பது பற்றிய உரிமை பெற்றோர்களுக்கு இர்

How do I get the stem of each word in a sentence? And how can I pass in a file that contains sentences and get the corresponding stemmed words? Any suggestions would be appreciated.

The stemmer program expects one word per line. There are a couple of options here:
$ cat | tr ' ' '\n' | ./stemwords -l ta
(I have not tried this out, but it most probably works.)
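Note that splitting on spaces alone leaves the trailing full stop attached to the last word of the sentence. As a rough sketch of the same "one word per line" step in Python: the function name `words_per_line` is hypothetical, and stripping ASCII punctuation from each token is an assumption about what the stemmer should see, not something the stemmer requires.

```python
import string


def words_per_line(text):
    """Re-flow text so that each whitespace-separated word is on its own line.

    Assumption: leading/trailing ASCII punctuation (such as the final '.')
    should be removed before the word is handed to the stemmer.
    """
    words = []
    for token in text.split():
        word = token.strip(string.punctuation)
        if word:  # skip tokens that were punctuation only
            words.append(word)
    return "\n".join(words)
```

The result can be written to a file or piped into the stemmer in place of the `tr` step.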

Or you could use the libstemmer library to write your own program. There is a Python library, PyStemmer, but it seems to be stale and does not have Tamil support. The python-nltk library also has Snowball support, but it doesn't have Tamil support either.

from nltk.stem.snowball import SnowballStemmer

print(SnowballStemmer.languages)
# outputs ('danish', 'dutch', 'finnish', 'french', 'german', 'hungarian', 'italian', 'norwegian', 'portuguese', 'romanian', 'russian', 'spanish', 'swedish')

So I think you have to either use the C library or do something like stemmer-ui.py, which opens a pipe to the stemmer program.
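A minimal sketch of the pipe approach, assuming the `./stemwords -l ta` invocation from the command above, and assuming it reads one word per line on standard input and writes one stem per line on standard output. The function name `stem_words` and its `cmd` parameter are hypothetical, and this is not necessarily how stemmer-ui.py does it.

```python
import subprocess


def stem_words(words, cmd=("./stemwords", "-l", "ta")):
    """Send words to an external stemmer, one per line, and return its output lines.

    Assumption: `cmd` reads UTF-8 words from stdin and prints one result per line.
    """
    result = subprocess.run(
        list(cmd),
        input="\n".join(words) + "\n",  # the stemmer expects one word per line
        capture_output=True,
        encoding="utf-8",               # Tamil text needs an explicit UTF-8 pipe
        check=True,                     # raise if the stemmer exits with an error
    )
    return result.stdout.splitlines()


# Hypothetical usage, run from the directory that contains the stemwords binary:
#   stems = stem_words("உரிமை பெற்றோர்களுக்கு இருக்கிறது".split())
```

Sending all the words in a single call starts the stemmer process once, rather than once per word.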

Thanks for trying it out and be warned that it is a very naive stemmer :)

@sanjanasri - see open-tamil and its examples for tools that work with Tamil sentences, words, and characters
$ pip install tamil