[Notice] This repository was deprecated, please use wikt2pron instead.
A Python toolkit which looks up given words in Wiktionary and returns structured Python dict format. Support the following list at present,
- languages
- parts of speech
- pronunciations (IPA, CMUBET, enPR, audio link)
Written in pure Python, compatible with Python 2.6+ and 3.2+, no dependencies.
# download the latest version
$ git clone https://github.com/abuccts/wiktionary-lookup.git
$ cd wiktionary-lookup
# install and run test
$ python setup.py install
$ python setup.py -q test
First, create an instance of Wiktionary
class:
>>> from pywiktionary import Wiktionary
>>> wikidict = Wiktionary(lang="English", CMUBET=True, phoneme_only=False)
Lookup a word using lookup
method:
>>> word = wikidict.lookup("read")
The entry of word "read" is at https://en.wiktionary.org/wiki/read#English, and here is the lookup result:
>>> from pprint import pprint
>>> pprint(word)
{'English': {'Part of Speech': ['Verb', 'Noun'],
'Pronunciation': [{'CMUBET': ['R IY D .'],
'IPA': (['/ɹiːd/'], 'en'),
'enPR': 'rēd'},
{'Audio': ('En-uk-to read.ogg',
'Audio (UK)',
'en')},
{'Audio': ('en-us-read.ogg',
'Audio (US)',
'en')},
{'CMUBET': ['R EH D .'],
'IPA': (['/ɹɛd/'], 'en'),
'enPR': 'rĕd'},
{'Audio': ('en-us-read-past.ogg',
'Audio (US)',
'en')}]}}
To lookup a word in a different language, specify the lang
parameter (CMUBET
parameter is only available for lang="English"
at present):
>>> word = wikidict.lookup("читать", lang="Russian")
>>> pprint(word)
{'Russian': {'Part of Speech': ['Verb'],
'Pronunciation': [{'IPA': (['[t͡ɕɪˈtatʲ]'], 'ru')},
{'Audio': ('Ru-читать.ogg', 'Audio', 'ru')}]}}
Please note that the default language of wikidict
is "English"
which is set when the instance is created. To change the language of wikidict
permanently, create another instance of Wiktionary
class or use set_lang
function:
>>> wikidict.set_lang("French")
>>> word = wikidict.lookup("être")
>>> pprint(word)
{'French': {'Part of Speech': ['Verb', 'Noun'],
'Pronunciation': [{'IPA': (['/ɛtʁ/'], 'fr')},
{'Audio': ('Fr-être-fr-ouest.ogg',
'Audio (France, West)',
'fr')},
{'Accent': 'Quebec', 'IPA': (['[aɛ̯tʁ]'], 'fr')},
{'Audio': ('Qc-être.ogg',
'Audio (Quebec, Montreal)',
'fr')},
{'Accent': 'Louisiana',
'IPA': (['[ɛt(ɾ)]'], 'fr')}]}}
For phoneme only output without other information, set phoneme_only
parameter to True
:
>>> word_phoneme = wikidict.lookup("être", phoneme_only=True)
>>> pprint(word_phoneme)
{'IPA': ['/ɛtʁ/', '[aɛ̯tʁ]', '[ɛt(ɾ)]']}
More exmaples of different languages can be found at Example Index Wiki Page.
For command line interface, please refer to Command Line Usage Wiki Page.