SoundsLike provides functions that generate lists of similar-sounding words for a given search term. This general-purpose tool can be useful for matching similar strings of English-language text.
SoundsLike is for me. I'm interested in using it to deal with messy names, misspelled words, and bad transcriptions. I think it can be especially useful for resolving mismatches at the interface of typed text and spoken language. Some example applications include:
- Telephone Customer Service
- Immigration Research
- Database Entity Resolution
That said, it's mostly just a project to help guide my own learning journey. If it's useful for you too, that's even better!
- Finding alternate spellings of words.
- Handling mispronunciations and/or transcription errors in search functions.
- A songwriting or poem-writing aid.
pip install SoundsLike
- SoundsLike.py
- DictionaryTools.py
- FuzzyTerm.py
- Example.py
Example 1
from SoundsLike.SoundsLike import Search
Search.perfectHomophones('Jonathan')
['Johnathan', 'Johnathon', 'Jonathan', 'Jonathon', 'Jonothan']
Example 1
Search.perfectHomophones('Lucy')
['Lucey', 'Lucie', 'Lucy', 'Luisi']
Search.closeHomophones('Lucy')
['Lucey', 'Lucie', 'Lucy', 'Luisi']
Example 2
Search.perfectHomophones('Lou C')
[]
Search.closeHomophones('Lou C')
['Lucey', 'Lucie', 'Lucy', 'Luisi']
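One plausible reading of the "Lou C" result, sketched below: a multi-token term is reduced to a single run of phones, and that run differs from "Lucy" only in a stress marker. This is an illustration and not SoundsLike's actual code. The pronunciations are hand-copied in CMUdict notation and should be checked against the real dictionary, and `phones_for` and `strip_stress` are hypothetical helper names.

```python
# Hand-copied pronunciations in CMUdict notation (assumed; verify against CMUdict).
SAMPLE_PRONUNCIATIONS = {
    "lou": ["L", "UW1"],
    "c": ["S", "IY1"],
    "lucy": ["L", "UW1", "S", "IY0"],
}


def phones_for(term):
    """Concatenate the phones of every token in a search term into one list."""
    phones = []
    for token in term.lower().split():
        phones.extend(SAMPLE_PRONUNCIATIONS[token])
    return phones


def strip_stress(phones):
    """Drop the trailing stress digit (0, 1, or 2) that CMUdict puts on vowels."""
    return [phone.rstrip("012") for phone in phones]


lou_c = phones_for("Lou C")
lucy = phones_for("Lucy")

# Exact comparison fails because the final vowel carries a different stress.
print(lou_c == lucy)  # False
# Ignoring stress, the two phone sequences are identical.
print(strip_stress(lou_c) == strip_stress(lucy))  # True
```

Whether SoundsLike defines "close" homophones exactly this way is an assumption. The sketch only shows that an exact match and a stress-insensitive match can disagree for this pair.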
Other homophone and rhyming patterns are available in SoundsLike.py. Explore them using the help() function in your interactive interpreter.
Examples include:
- Vowel-class Homophones: Vowel phones are reduced to their ARPAbet classification.
- Phone-class Homophones: All phones are reduced to their ARPAbet classification.
- End-rhymes: Traditional rhyming. Takes optional arguments to find end-rhymes with same syllabic length and/or same first initial.
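The two class-based patterns above can be pictured with a small sketch. It is an illustration under stated assumptions and not the code in SoundsLike.py. The class table follows the phone list distributed with CMUdict, the sample pronunciations are hand-written, stress digits are simply discarded, and `phone_class_key` is a hypothetical name.

```python
# ARPAbet phone classes as listed in the CMUdict phones file.
ARPABET_CLASSES = {
    "AA": "vowel", "AE": "vowel", "AH": "vowel", "AO": "vowel", "AW": "vowel",
    "AY": "vowel", "EH": "vowel", "ER": "vowel", "EY": "vowel", "IH": "vowel",
    "IY": "vowel", "OW": "vowel", "OY": "vowel", "UH": "vowel", "UW": "vowel",
    "B": "stop", "D": "stop", "G": "stop", "K": "stop", "P": "stop", "T": "stop",
    "CH": "affricate", "JH": "affricate",
    "DH": "fricative", "F": "fricative", "S": "fricative", "SH": "fricative",
    "TH": "fricative", "V": "fricative", "Z": "fricative", "ZH": "fricative",
    "HH": "aspirate",
    "L": "liquid", "R": "liquid",
    "M": "nasal", "N": "nasal", "NG": "nasal",
    "W": "semivowel", "Y": "semivowel",
}


def phone_class_key(phones, vowels_only=False):
    """Reduce phones to their ARPAbet class.

    With vowels_only=True only vowel phones are reduced and consonants are
    kept as-is; otherwise every phone is replaced by its class.
    """
    key = []
    for phone in phones:
        base = phone.rstrip("012")  # discard the stress digit (an assumption)
        phone_class = ARPABET_CLASSES[base]
        if vowels_only and phone_class != "vowel":
            key.append(base)
        else:
            key.append(phone_class)
    return tuple(key)


# Hand-written sample pronunciations (assumed; verify against CMUdict).
lucy = ["L", "UW1", "S", "IY0"]
lacy = ["L", "EY1", "S", "IY0"]
rosie = ["R", "OW1", "Z", "IY0"]

# Vowel-class: "Lucy" and "Lacy" share consonants, so their keys match.
print(phone_class_key(lucy, vowels_only=True) == phone_class_key(lacy, vowels_only=True))
# Phone-class: "Lucy" and "Rosie" are both liquid-vowel-fricative-vowel.
print(phone_class_key(lucy) == phone_class_key(rosie))
```

Two words count as a match under a given pattern when their keys are equal, so the phone-class pattern is the looser of the two.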
Coming eventually!
For detailed instructions, try running help(SoundsLike) in your interactive Python interpreter.
You can also run help() on any of the individual modules contained in SoundsLike, though you may need to import them individually to do so. Keep in mind that the package is called SoundsLike and its primary module is also called SoundsLike, so make sure you specify the correct one.
SoundsLike uses the CMU Pronouncing Dictionary: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
It also offers some tools for working with dictionaries, if you prefer to use your own.
Phoneme generation, when enabled, is provided by g2p-en: https://github.com/Kyubyong/g2p
Similar string matching is provided by difflib: https://docs.python.org/3/library/difflib.html
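For readers unfamiliar with difflib, here is a minimal standalone example of its `get_close_matches` function. How SoundsLike calls difflib internally is not shown here; the word list and the cutoff value are chosen purely for illustration.

```python
from difflib import get_close_matches

# A small illustrative candidate list (not taken from any dictionary).
candidates = ["Lucey", "Lucy", "Luisi", "Jonathan"]

# Return up to three candidates whose similarity ratio to "Lucie" is at least 0.6.
# Matching is case-sensitive and works on spelling, not on pronunciation.
matches = get_close_matches("Lucie", candidates, n=3, cutoff=0.6)
print(matches)
```

Raising `cutoff` toward 1.0 keeps only near-identical spellings, while lowering it admits looser matches.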
- The CMU Pronouncing Dictionary
- cmudict python wrapper by David L. Day
- g2p-en python module by Kyubyong Park
- cmudict
- g2p-en
- json
- re
- While this module supports multi-token search terms, it always reduces them to one group of phones. This can lead to some unexpected, but still useful, results. As a consequence, multi-token results are not supported at this time.
- Support is not presently offered for multiple pronunciations of a given token.
- The English-language CMU dict can be swapped out for any other pronunciation dictionary by uncommenting DictionaryFilepath and setting it to point at a JSON file. This could be useful if you want to build and use a custom dictionary.
- Provide option to import CMUdict (or any other dict) from a JSON, so that functions can reference it directly (rather than it being imported anew each time a function is called).
- Create match pattern for same first and last syllable, and same number of syllables.
- Add multi-token results. Check each token in multi-token search terms, and concatenate all possible results if all tokens are found. e.g.: "Lee Ann" could return "Leigh Anne," "Lea An," "Lianne," etc.
- Develop module to figure out "smart selection" results for display.
- Dramatically speed up subsequent searches by front-loading rhyme-pattern generation and hashing the results.
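The last idea, front-loading pattern generation and hashing the results, could look roughly like the sketch below. It is one possible design and not something SoundsLike does today. The pronunciations are hand-written samples, and `build_index`, `lookup`, and `stressless_key` are hypothetical names.

```python
from collections import defaultdict

# Hand-written sample pronunciations (assumed; verify against CMUdict).
SAMPLE_PRONUNCIATIONS = {
    "Lucy": ["L", "UW1", "S", "IY0"],
    "Lucie": ["L", "UW1", "S", "IY0"],
    "Jonathan": ["JH", "AA1", "N", "AH0", "TH", "AH0", "N"],
}


def stressless_key(phones):
    """One example pattern: the phone sequence with stress digits removed."""
    return tuple(phone.rstrip("012") for phone in phones)


def build_index(pronunciations, key_func):
    """Compute the pattern for every word once and group words by pattern."""
    index = defaultdict(list)
    for word, phones in pronunciations.items():
        index[key_func(phones)].append(word)
    return index


def lookup(index, phones, key_func):
    """Find all words sharing a pattern with a single dictionary lookup."""
    return sorted(index.get(key_func(phones), []))


# Pay the cost of pattern generation once, up front.
index = build_index(SAMPLE_PRONUNCIATIONS, stressless_key)

# Later searches no longer scan the whole dictionary.
print(lookup(index, ["L", "UW1", "S", "IY1"], stressless_key))  # ['Lucie', 'Lucy']
```

One index would be needed per match pattern, which trades memory and start-up time for faster repeated searches.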
Licensed under the Apache License, Version 2.0
Enjoy!