Unidefy -- substitute ascii equivalents for unicode characters, if available

I wrote this primarily to normalize certain data for searching. The problem is that certain characters, like a u with an umlaut (ü), are hard to type on a normal keyboard, so most people don't bother. Running this module on the indexed strings lets either a u or a ü match: the u is left unchanged and the ü is changed to a u, so you can search both ways but only have to store one version.
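To make the idea concrete, here is a minimal sketch of folding both the indexed strings and the query through the same substitution table. The table, `fold`, and `search` are hypothetical names made up for this illustration; they are not unidefy's actual data or API.

```python
# Hypothetical illustration only: a tiny hand-written table, not unidefy's real one.
SUBSTITUTIONS = str.maketrans({
    "ü": "u",
    "Ü": "U",
    "é": "e",
    "ø": "o",
})


def fold(text):
    """Replace characters that have a table entry; leave everything else alone."""
    return text.translate(SUBSTITUTIONS)


# Store one folded key per title, then fold every query the same way.
titles = ["Über uns", "Menü"]
index = {fold(title): title for title in titles}


def search(query):
    """Look up a title whether or not the query was typed with the umlaut."""
    return index.get(fold(query))
```

With this setup, `search("Uber uns")` and `search("Über uns")` find the same stored title, and characters with no entry in the table (Chinese text, for example) pass through `fold` untouched.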

There are other modules that help with this, but none of them did quite what I needed. Python's built-in unicodedata only uses the defined Unicode normalizations. Something like unidecode works, but it's a little too eager: it gets rid of Unicode characters it doesn't recognize, and it turns Chinese characters into English, which I didn't really want. I wanted to keep any Unicode that there wasn't a good substitution for. Unidecode is definitely worth a look if you want more aggressive substitution.
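As a rough illustration of the unicodedata limitation, the sketch below uses only the standard library: it decomposes a string with NFKD and drops the combining marks. `strip_marks` is a hypothetical helper name used just for this example. Characters that have no decomposition defined in Unicode, such as ø or ß, come back unchanged.

```python
import unicodedata


def strip_marks(text):
    """Decompose with NFKD, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_marks("ü"))  # the umlaut decomposes, so the mark can be dropped
print(strip_marks("ø"))  # no decomposition is defined, so this stays as it is
```

A hand-maintained substitution table is one way to cover characters like these without going as far as transliterating everything.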

To install, use pip:

pip install git+https://github.com/Jaymon/unidefy#egg=unidefy

I got the data for the substitution table from these locations:

http://lexsrv3.nlm.nih.gov/LexSysGroup/Projects/lvg/current/docs/designDoc/UserContribution/asciiConversion.html

http://unicode.org/repos/cldr/trunk/common/transforms/Latin-ASCII.xml

http://lexsrv3.nlm.nih.gov/LexSysGroup/Projects/lvg/current/docs/designDoc/UDF/unicode/MapTables/CoreNormResults.html
