anubhav-chattoraj/indic-tools

Ignore numbers and non-Devanagari characters function

Closed this issue · 3 comments

Here is part of my Sanskrit reverse dictionary, after my PHP reversing script sorted it.


I अजिम्ःअ
H अचिन्तिअ
P इन्दिअ
N उत्सङ्ग्३अ

N पऋआ
N घूर्णिआ
फुम्फुआ

M आइ
रेउइ
M अच्छाइ
M दादाभाइ
M नाह्नाभाइ
राजसाइ

G हूङ्गराई
I सुभ्रुई
R विकृई
I धृतराष्टृई
G धोई

If I leave the "Ignore numbers and non-Devanagari characters" checkbox unchecked, I get:
I अजिम्ःअ
M अच्छाइ
H अचिन्तिअ
रेउइ
राजसाइ
फुम्फुआ

If I click the checkbox, I get:




G हूङ्गराई
I सुभ्रुई
R विकृई

Does the checkbox work as intended? If so, I don't understand what it ignores, or where. Thanks.

It's the space after the Roman letter which is giving you trouble. The option doesn't ignore spaces; type a space manually in the "ignore these characters" textbox, and they should sort fine. (I do recognize that the user interface could be more helpful in this regard.)
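To illustrate why the leftover space matters, here is a minimal Python sketch. It is not the tool's actual code: `sort_key`, `sort_lines`, and `extra_ignored` are hypothetical names, the checkbox is assumed to strip everything outside the Devanagari block plus Devanagari digits, and plain code-point order stands in for real Devanagari collation.

```python
import re

# Assumption: "non-Devanagari" means anything outside U+0900-U+097F, and
# "numbers" also covers the Devanagari digits U+0966-U+096F. Whitespace is
# deliberately kept, matching the behavior described above.
IGNORED_BY_CHECKBOX = re.compile(r"[^\u0900-\u097F\s]|[\u0966-\u096F]")


def sort_key(line, extra_ignored=""):
    """Return the text that is actually compared when sorting a line."""
    key = IGNORED_BY_CHECKBOX.sub("", line)
    # Characters typed into the "ignore these characters" textbox.
    for ch in extra_ignored:
        key = key.replace(ch, "")
    return key


def sort_lines(lines, extra_ignored=""):
    """Sort lines by their cleaned-up key (code-point order, not collation)."""
    return sorted(lines, key=lambda line: sort_key(line, extra_ignored))


lines = ["G हूङ्गराई", "फुम्फुआ", "M आइ"]

# Checkbox only: prefixed lines keep a leading space in their key.
print(sort_lines(lines))

# Checkbox plus a space in the textbox: only the Devanagari text is compared.
print(sort_lines(lines, extra_ignored=" "))
```

With the checkbox alone, the keys of the prefixed lines still start with a space, so those lines group together ahead of the unprefixed फुम्फुआ. Once the space is ignored as well, फुम्फुआ moves between आइ and हूङ्गराई.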

Getting closer. When I sort http://pastebin.com/eN25agx4 (which has already been sorted with Dhaval's reverser) in ascending retrograde order, I get "G bhaṛ" above "ūṃṃ". Shouldn't it end up closer to "M upaṛ"? I get http://pastebin.com/LKEWE8eu, which seems fishy.

Can't reproduce. This is the output I get for your input file: