lookup words and pronunciations in Wiktionary

wiktionary lookup

Licensed under the BSD 2-Clause "Simplified" License.

A Python toolkit that looks up words in Wiktionary and returns the results as a structured Python dict. It currently extracts the following:

  • languages
  • parts of speech
  • pronunciations (IPA, CMUBET, enPR, audio link)

Requirements

Written in pure Python with no dependencies; compatible with Python 2.6+ and 3.2+.

Installation

# download the latest version
$ git clone https://github.com/abuccts/wiktionary-lookup.git
$ cd wiktionary-lookup

# install and run test
$ python setup.py install
$ python setup.py -q test

Usage

First, create an instance of the Wiktionary class:

>>> from pywiktionary import Wiktionary
>>> wikidict = Wiktionary(lang="English", CMUBET=True, phoneme_only=False)

Look up a word using the lookup method:

>>> word = wikidict.lookup("read")

The entry for the word "read" is at https://en.wiktionary.org/wiki/read#English, and the lookup result is:

>>> from pprint import pprint
>>> pprint(word)
{'English': {'Part of Speech': ['Verb', 'Noun'],
             'Pronunciation': [{'CMUBET': ['R IY D .'],
                                'IPA': (['/ɹiːd/'], 'en'),
                                'enPR': 'rēd'},
                               {'Audio': ('En-uk-to read.ogg',
                                          'Audio (UK)',
                                          'en')},
                               {'Audio': ('en-us-read.ogg',
                                          'Audio (US)',
                                          'en')},
                               {'CMUBET': ['R EH D .'],
                                'IPA': (['/ɹɛd/'], 'en'),
                                'enPR': 'rĕd'},
                               {'Audio': ('en-us-read-past.ogg',
                                          'Audio (US)',
                                          'en')}]}}
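
The result is a plain nested dict, so it can be post-processed with ordinary Python. The following is a minimal sketch, assuming the structure shown above: each 'IPA' value is a (list of transcriptions, language code) tuple and each 'Audio' value is a (file name, label, language code) tuple. The helpers collect_ipa and collect_audio are hypothetical names chosen for illustration, not part of pywiktionary, and the sample dict is copied from the output above so the snippet runs without network access.

```python
# Sample data copied from the lookup result for "read" shown above.
result = {
    "English": {
        "Part of Speech": ["Verb", "Noun"],
        "Pronunciation": [
            {"CMUBET": ["R IY D ."], "IPA": (["/ɹiːd/"], "en"), "enPR": "rēd"},
            {"Audio": ("En-uk-to read.ogg", "Audio (UK)", "en")},
            {"Audio": ("en-us-read.ogg", "Audio (US)", "en")},
            {"CMUBET": ["R EH D ."], "IPA": (["/ɹɛd/"], "en"), "enPR": "rĕd"},
            {"Audio": ("en-us-read-past.ogg", "Audio (US)", "en")},
        ],
    }
}


def collect_ipa(result, language):
    """Return every IPA transcription listed for `language`, in order."""
    transcriptions = []
    for item in result.get(language, {}).get("Pronunciation", []):
        if "IPA" in item:
            # Assumed layout: (list of IPA strings, language code)
            ipa_list, _lang_code = item["IPA"]
            transcriptions.extend(ipa_list)
    return transcriptions


def collect_audio(result, language):
    """Return (file name, label) pairs for every audio item of `language`."""
    files = []
    for item in result.get(language, {}).get("Pronunciation", []):
        if "Audio" in item:
            # Assumed layout: (file name, label, language code)
            file_name, label, _lang_code = item["Audio"]
            files.append((file_name, label))
    return files


print(collect_ipa(result, "English"))
print(collect_audio(result, "English"))
```

Both helpers return an empty list when the requested language is missing from the result, which avoids a KeyError for words that have no entry in that language.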

To look up a word in a different language, specify the lang parameter (the CMUBET parameter is currently available only for lang="English"):

>>> word = wikidict.lookup("читать", lang="Russian")
>>> pprint(word)
{'Russian': {'Part of Speech': ['Verb'],
             'Pronunciation': [{'IPA': (['[t͡ɕɪˈtatʲ]'], 'ru')},
                               {'Audio': ('Ru-читать.ogg', 'Audio', 'ru')}]}}

Note that the default language of wikidict is "English", which is set when the instance is created. To change the language of wikidict permanently, create another instance of the Wiktionary class or call the set_lang method:

>>> wikidict.set_lang("French")
>>> word = wikidict.lookup("être")
>>> pprint(word)
{'French': {'Part of Speech': ['Verb', 'Noun'],
            'Pronunciation': [{'IPA': (['/ɛtʁ/'], 'fr')},
                              {'Audio': ('Fr-être-fr-ouest.ogg',
                                         'Audio (France, West)',
                                         'fr')},
                              {'Accent': 'Quebec', 'IPA': (['[aɛ̯tʁ]'], 'fr')},
                              {'Audio': ('Qc-être.ogg',
                                         'Audio (Quebec, Montreal)',
                                         'fr')},
                              {'Accent': 'Louisiana',
                               'IPA': (['[ɛt(ɾ)]'], 'fr')}]}}
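
Some pronunciation items in this output carry an 'Accent' key and others do not. As a rough sketch of how the transcriptions could be grouped by accent, assuming the structure shown above, the hypothetical helper ipa_by_accent (not part of pywiktionary) files items without an 'Accent' key under a placeholder label. The sample dict is copied from the output above.

```python
from collections import OrderedDict

# Sample data copied from the lookup result for "être" shown above.
result = {
    "French": {
        "Part of Speech": ["Verb", "Noun"],
        "Pronunciation": [
            {"IPA": (["/ɛtʁ/"], "fr")},
            {"Audio": ("Fr-être-fr-ouest.ogg", "Audio (France, West)", "fr")},
            {"Accent": "Quebec", "IPA": (["[aɛ̯tʁ]"], "fr")},
            {"Audio": ("Qc-être.ogg", "Audio (Quebec, Montreal)", "fr")},
            {"Accent": "Louisiana", "IPA": (["[ɛt(ɾ)]"], "fr")},
        ],
    }
}


def ipa_by_accent(result, language, default="unmarked"):
    """Group IPA transcriptions by the 'Accent' key of each item.

    Items without an 'Accent' key are stored under `default`, which is a
    label chosen for this example only.
    """
    grouped = OrderedDict()  # keeps the order in which accents first appear
    for item in result.get(language, {}).get("Pronunciation", []):
        if "IPA" not in item:
            continue  # skip audio-only items
        accent = item.get("Accent", default)
        grouped.setdefault(accent, []).extend(item["IPA"][0])
    return grouped


for accent, transcriptions in ipa_by_accent(result, "French").items():
    print(accent, transcriptions)
```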

For phoneme-only output without other information, set the phoneme_only parameter to True:

>>> word_phoneme = wikidict.lookup("être", phoneme_only=True)
>>> pprint(word_phoneme)
{'IPA': ['/ɛtʁ/', '[aɛ̯tʁ]', '[ɛt(ɾ)]']}
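
By IPA convention, slashes enclose a phonemic (broad) transcription and square brackets enclose a phonetic (narrow) one, and both kinds appear in the list above. The following sketch separates the two and strips the delimiters; split_transcriptions is a hypothetical helper, not part of pywiktionary, and the input is copied from the output above.

```python
# Sample data copied from the phoneme-only result for "être" shown above.
word_phoneme = {"IPA": ["/ɛtʁ/", "[aɛ̯tʁ]", "[ɛt(ɾ)]"]}


def split_transcriptions(ipa_list):
    """Split IPA strings into (phonemic, phonetic) lists without delimiters.

    Strings wrapped in /.../ are treated as phonemic, strings wrapped in
    [...] as phonetic; anything else is ignored.
    """
    phonemic, phonetic = [], []
    for ipa in ipa_list:
        if ipa.startswith("/") and ipa.endswith("/"):
            phonemic.append(ipa[1:-1])
        elif ipa.startswith("[") and ipa.endswith("]"):
            phonetic.append(ipa[1:-1])
    return phonemic, phonetic


phonemic, phonetic = split_transcriptions(word_phoneme["IPA"])
print(phonemic)
print(phonetic)
```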

More examples in different languages can be found at the Example Index Wiki Page.

For the command line interface, refer to the Command Line Usage Wiki Page.