vipranarayan14/aksharas

add support for word count

vipranarayan14 opened this issue · 1 comments


Counting words in Sanskrit does not work the same way as counting words in English.

For English, a word counter (in its simplest implementation) will consider "words" as the sequences of word characters (like 'a', 'b', '1', '2', etc.) separated by non-word characters (like '\s', '-', '.', ']', etc.).
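To make that concrete, here is a minimal sketch of such a counter in Python. The function name `count_words` is only illustrative and is not part of this app; it simply counts maximal runs of word characters as matched by the standard `re` module.

```python
import re


def count_words(text: str) -> int:
    """Count maximal runs of word characters (letters, digits, underscore).

    Anything else (whitespace, '-', '.', ']', ...) acts as a separator.
    """
    return len(re.findall(r"\w+", text))


# "quick-brown" counts as two words because '-' is a non-word character.
print(count_words("The quick-brown fox jumps."))
```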

But in Sanskrit, we have sandhis, which join adjacent words in writing. For example, the text "ग्राममायाति। ग्रामं गच्छति। ग्रामङ्गच्छति।" has 6 words: ग्रामम्, आयाति, ग्रामम्, गच्छति, ग्रामम्, गच्छति. Though "ग्राममायाति" appears as one word, it should be counted as two. Similarly, "ग्रामङ्गच्छति" should also be counted as two. Only a word counter that works in this manner can be a true word counter for Sanskrit. But this isn't easy to implement without a sandhi-splitter.
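The gap can be sketched in Python using the example text above. This is only an illustration under stated assumptions: `ILLUSTRATIVE_SPLITS` is a hypothetical, hand-written lookup that copies the two splits given above. It is not a sandhi-splitter and would not work on any other text. The tokenizer splits on whitespace and dandas instead of using `\w+`, because Python's `\w` does not match Devanagari combining signs such as the virama.

```python
import re

TEXT = "ग्राममायाति। ग्रामं गच्छति। ग्रामङ्गच्छति।"

# Hypothetical, hand-written lookup covering only the example above.
# A real implementation would need an actual sandhi-splitter here.
ILLUSTRATIVE_SPLITS = {
    "ग्राममायाति": ["ग्रामम्", "आयाति"],
    "ग्रामङ्गच्छति": ["ग्रामम्", "गच्छति"],
}


def surface_tokens(text: str) -> list[str]:
    """Split on whitespace and dandas (। and ॥), dropping empty pieces."""
    return [token for token in re.split(r"[\s।॥]+", text) if token]


def count_with_splits(text: str) -> int:
    """Count words, expanding tokens that appear in the lookup table."""
    return sum(
        len(ILLUSTRATIVE_SPLITS.get(token, [token]))
        for token in surface_tokens(text)
    )


print(len(surface_tokens(TEXT)))  # count as seen by an English-style counter
print(count_with_splits(TEXT))    # count after applying the known splits
```

On this text the surface count is 4, while the count after applying the two splits is 6.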

Moreover, users who want a word count like the one English word counters give (without sandhi-splitting) can use some of the tools available here, here and here. I don't want to duplicate that functionality in this app.

Also, I don't want to add any word-related functions to this app. It is a tool for aksharas and varnas, and I want to keep it simple and straightforward so that it does one thing well.