unicode-slugify-latin

A slugifier that generates unicode slugs and can optionally replace common Latin letters with their ascii representations.

Primary language: Python. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

Note

Please use unicode-slugify instead of this library. This was created before I merged the ascii representation feature into the source repo.

Links

PyPI: https://pypi.python.org/pypi/unicode-slugify-latin

GitHub: https://github.com/eminbugrasaral/unicode-slugify-latin

Unicode Slugify (with Latin Hack)

Unicode Slugify is a slugifier that generates unicode slugs. It was originally used in the Firefox Add-ons web site to generate slugs for add-ons and add-on collections. Many of these add-ons and collections had unicode characters and required more than simple transliteration.

Install

pip install unicode-slugify-latin

Usage

>>> import slugify

>>> slugify.slugify(u'Bän...g (bang)')
u'bäng-bang'
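
To make the behaviour above concrete, here is a minimal sketch of how a unicode-preserving slugifier could arrive at that output. It is written for Python 3, the name sketch_slugify is hypothetical, and it is not the library's actual source: it simply keeps letters and digits from any script, turns separators into hyphens, and drops everything else.

```python
import re
import unicodedata


def sketch_slugify(text, lower=True):
    """Illustrative stand-in, not the library's implementation."""
    kept = []
    for char in unicodedata.normalize('NFKC', text):
        major = unicodedata.category(char)[0]
        if major in 'LN' or char in '-_~':
            # Letters and numbers from any script survive unchanged.
            kept.append(char)
        elif major == 'Z':
            # Unicode separators (spaces) become word boundaries.
            kept.append(' ')
        # Everything else (punctuation, symbols, controls) is dropped.
    slug = re.sub(r'[-\s]+', '-', ''.join(kept).strip())
    return slug.lower() if lower else slug


print(sketch_slugify('Bän...g (bang)'))
```

Note that the non-ascii letter ä is kept as-is; only the Latin replacement described below would turn it into a.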

Latin Hack

  • Replaces special Latin chars with similar ascii representations.
  • Problem: I want users who speak Latin languages but type on English keyboards to be able to search through my Latin strings.
  • Solution: Slugify the Latin string with Latin replacement enabled, and match it against the slugified search word.
  • Example: Store "Sabancı Üniversitesi" as "sabanci-universitesi"; users will then be able to search with any variant such as "Sabanci", "Sabancı" or "SABANCI".
  • Note: Do not forget to slugify both strings with replace_latin=True.
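
The search scenario above can be sketched without the library installed. Everything in this Python 3 example is an assumption made for illustration: search_key is a hypothetical stand-in for slugify(..., replace_latin=True), and its table covers only the Turkish letters the example needs.

```python
import re

# Hypothetical, deliberately small table; the real replacement list is longer.
TURKISH_TO_ASCII = str.maketrans({
    'ı': 'i', 'İ': 'I', 'ü': 'u', 'Ü': 'U', 'ö': 'o', 'Ö': 'O',
    'ş': 's', 'Ş': 'S', 'ç': 'c', 'Ç': 'C', 'ğ': 'g', 'Ğ': 'G',
})


def search_key(text):
    """Stand-in for slugify(text, replace_latin=True)."""
    # Translate before lowercasing so that 'İ' becomes 'I' and then 'i'.
    ascii_text = text.translate(TURKISH_TO_ASCII).lower()
    return re.sub(r'[^a-z0-9]+', '-', ascii_text).strip('-')


# Store each name under its slug, computed once at write time.
names = ['Sabancı Üniversitesi', 'Boğaziçi Üniversitesi']
index = {search_key(name): name for name in names}


def find(query):
    """Slugify the query the same way, then match against stored slugs."""
    key = search_key(query)
    return [name for slug, name in index.items() if key in slug]


for query in ('Sabanci', 'Sabancı', 'SABANCI'):
    print(query, '->', find(query))
```

The important design point is symmetry: the stored value and the query go through the same transformation, so any spelling the user types collapses to the same key.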

Example

>>> from slugify import slugify

>>> string_without_latin_letters = slugify(u'ıspanaklı boğaz turşusu', replace_latin=True)
>>> string_without_latin_letters
u'ispanakli-bogaz-tursusu'

>>> slugify(u'Ispanakli Bogaz Tursusu') == string_without_latin_letters
True

>>> u'Bogazici'.lower() in slugify(u'boğaziçi', replace_latin=True)
True

>>> slugify(u'çiçek', replace_latin=True) in slugify(u'ÇİÇEK', replace_latin=True)
True

>>> u'cicek' in slugify(u'ÇİÇEK', replace_latin=True)
True

List of common latin letters to be replaced

  • ı, ì, í, î, ï -> i
  • İ, Ì, Í, Î, Ï -> I
  • ö, ó, ò, ô, õ, ø -> o
  • Ö, Ò, Ó, Ô, Õ, Ø -> O
  • ü, ù, ú, û -> u
  • Ü, Ù, Ú, Û -> U
  • à, á, â, ã, ä, å -> a
  • À, Á, Â, Ã, Ä, Å -> A
  • æ -> ae
  • Æ -> AE
  • è, é, ê, ë -> e
  • È, É, Ê, Ë -> E
  • ñ -> n
  • Ñ -> N
  • ý, ÿ -> y
  • Ý, Ÿ -> Y
  • ş -> s
  • Ş -> S
  • ç -> c
  • Ç -> C
  • ğ -> g
  • Ğ -> G

New parameters after this fork

  • replace_latin: Replaces common Latin letters with similar ascii representations (see the list above).
  • unicode_pairs: A dictionary mapping unicode characters to their replacement values. For example, {u'\xe9': u'e'} replaces é with e.
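
As a rough illustration of the unicode_pairs idea, the Python 3 sketch below applies a user-supplied dictionary of replacements to a string. The helper apply_unicode_pairs is hypothetical and assumes a plain dictionary from character to replacement; how the library combines such pairs with replace_latin is not shown here.

```python
def apply_unicode_pairs(text, unicode_pairs):
    """Replace every key of unicode_pairs found in text with its value."""
    for source, target in unicode_pairs.items():
        text = text.replace(source, target)
    return text


# Note the colon: {u'\xe9': u'e'} is a dictionary, whereas
# {u'\xe9', u'e'} would be a set and carry no mapping.
pairs = {u'\xe9': u'e', u'\xdf': u'ss'}
print(apply_unicode_pairs(u'caf\xe9 stra\xdfe', pairs))
```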
