
A Python library for uploading to Wiktionary


wdp: the Wiktionary Data Preparer


wdp (Wiktionary Data Preparer) is a small Python library that can help you get your language data onto Wiktionary. Formatting Wiktionary entries perfectly can be hard, and it's wdp's goal to take care of the tricky stuff for you.

Using the Word API, enter your data:

from wdp import Word

# use the Word class to represent our words
apple = Word("apple")
apple.add_pronunciation("/ˈæp.əl/", notation="IPA")
apple.add_definition("A common, round fruit", "Noun")
apple.add_definition("A tree of the genus Malus", "Noun")
apple.set_etymology("Old English æppel < Proto-Germanic *ap(a)laz < PIE *ab(e)l-")

pear = Word("pear")
# ...

# put all our words in a list
wdp_words = [apple, pear]  # ...and any other words

Use the format_entries function with your list of Word objects to produce Wiktionary markup:

from wdp import format_entries

# Generate Wiktionary markup from our entries
formatted_entries = format_entries(wdp_words, "en", "English")
# Produces an entry like the following:
"""
==English==

===Etymology===
Old English æppel < Proto-Germanic *ap(a)laz < PIE *ab(e)l-

===Noun===
{{head|en|noun}}

# A common, round fruit
# A tree of the genus Malus
"""

Perform the upload:

from wdp.upload import upload_formatted_entries
upload_formatted_entries(formatted_entries, "English")

(Note: wdp requires Python 3.6 or higher. If you do not have a Python installation, we recommend that you use Anaconda.)

pip install wdp

To use wdp, you will need to have your data available in a machine-readable format. The format does not matter, but you will need to be able to read it and turn it into a list of Word objects.

As in the example above, you will need to build a list of Word objects. A single Word object is defined by its canonical form. It is OK for two or more words to have the same form: this can happen when two words are homonyms or when they have separate etymologies.

from wdp import Word
bank_1 = Word("bank")
bank_1.add_definition("A place where people keep their money", "Noun")

bank_2 = Word("bank")
bank_2.add_definition("The edges of a river", "Noun")

Methods of the Word class which begin with add_ can be invoked multiple times (because e.g. a word can have many definitions), but methods which begin with set_ should only be called once (because e.g. you should only have one etymological note).
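As a rough sketch of turning machine-readable data into Word objects, the following assumes a hypothetical CSV layout with the columns entry_id, form, pos, definition and etymology; none of these names is prescribed by wdp. Rows that share an entry_id are treated as one word, so homonyms simply get different ids. The Word class is passed in as word_cls, and only the Word(form), add_definition and set_etymology calls shown above are used. The helper names build_words and read_rows are illustrative, not part of wdp.

```python
import csv
import io

# Hypothetical sample data; the column names are an assumption of this sketch.
SAMPLE_CSV = """entry_id,form,pos,definition,etymology
1,bank,Noun,A place where people keep their money,
2,bank,Noun,The edges of a river,
3,apple,Noun,"A common, round fruit",Old English æppel
3,apple,Noun,A tree of the genus Malus,
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def build_words(rows, word_cls):
    """Group rows by entry_id and build one word object per entry."""
    words = {}  # entry_id -> word object, in first-seen order
    has_etymology = set()
    for row in rows:
        entry_id = row["entry_id"]
        if entry_id not in words:
            words[entry_id] = word_cls(row["form"])
        word = words[entry_id]
        # add_ methods may be called repeatedly: one call per definition row
        word.add_definition(row["definition"], row["pos"])
        # set_ methods should be called once: keep only the first etymology
        etymology = (row.get("etymology") or "").strip()
        if etymology and entry_id not in has_etymology:
            word.set_etymology(etymology)
            has_etymology.add(entry_id)
    return list(words.values())
```

With wdp installed, this could be called as build_words(read_rows(SAMPLE_CSV), Word) after from wdp import Word, producing two separate "bank" words and one "apple" word with two definitions.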

Consult the Word class's documentation for a complete description of its methods. Currently, the following methods are available:

  • add_definition
  • add_alternative_form
  • add_pronunciation
  • set_etymology
  • set_description
  • set_references
  • set_usage_notes
  • set_conjugation
  • set_declension
  • set_inflection

For more information on how to use these methods, see Wiktionary's entry layout guidelines.

Once you have constructed your list of words, it is ready to be uploaded. Uploading to Wiktionary is a bit complicated, so we recommend that you export your data so someone else can upload it. You can do this with the export_words function:

from wdp import export_words
my_english_words = [bank_1, bank_2]
export_words(my_english_words, 'my_english_words.zip')

Once you've done this, please email the resulting file to Luke Gessler (lg876@georgetown.edu) or Aryaman Arora (aa2190@georgetown.edu) so we can help you perform your upload.

Section under construction

First, you will need to create an account on Wiktionary.

Next, in your working directory, create a user-config.py file with the following contents:

family = "wiktionary"
mylang = "en"

usernames["wiktionary"]["en"] = u"Ldgessler"  # change to your username

console_encoding = "utf-8"

minthrottle = 0
maxthrottle = 1

In your main Python file, you can now use wdp.upload.upload_formatted_entries to perform your upload:

from wdp import import_words
from wdp.upload import upload_formatted_entries

# load your list of Words, either by building it in code...
my_english_words = [...]
# ...or by importing a file created with export_words
my_english_words = import_words('my_english_words.zip')

# format the list of Words into entries
# you will need a language code from here:
# https://en.wiktionary.org/wiki/Wiktionary:List_of_languages
from wdp import format_entries
lang_code = "en"
lang_name = "English"
formatted_entries = format_entries(my_english_words, lang_code, lang_name)

# use the page_prefix argument to upload the data to your personal pages
# first for debugging, e.g. User:Ldgessler/chafe
upload_formatted_entries(formatted_entries, lang_name, page_prefix="User:Ldgessler/")

# Once you are CERTAIN your data is correct, you may remove the page_prefix
# argument to perform the upload for real:
upload_formatted_entries(formatted_entries, lang_name)
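The staging-then-live workflow above can be wrapped in a small helper so that a live upload only happens when explicitly requested. This is a minimal sketch: staged_upload is a hypothetical name, not part of wdp, and the upload function is passed in as upload_fn so that only the call shapes shown above (with and without page_prefix) are assumed.

```python
def staged_upload(upload_fn, formatted_entries, lang_name, username, live=False):
    """Upload to the user's personal pages unless live=True is passed."""
    if live:
        # Real upload: no page_prefix, entries go to the main namespace
        return upload_fn(formatted_entries, lang_name)
    # Staging upload: pages are created under User:<username>/
    prefix = f"User:{username}/"
    return upload_fn(formatted_entries, lang_name, page_prefix=prefix)
```

With wdp installed, upload_fn would be upload_formatted_entries, for example staged_upload(upload_formatted_entries, formatted_entries, lang_name, "Ldgessler"), and live=True would be added only once the staged pages look correct.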

Not on your own, but please open an issue on our GitHub page explaining what your data looks like, and someone may be available to help you.

Yes, wdp is agnostic as to the source format of your data.

In the future, we may add support for popular formats (like FLEx dictionary XML) to allow you to upload from them without writing any code. If there is a format you'd like us to support, please open an issue.

A new one can easily be created, but you will need to consult with an expert. Contact Aryaman Arora (aa2190@georgetown.edu) or a Wiktionary admin.

Not currently, but this is a feature we'd like to support if there's demand for it. Please open an issue if you would like this functionality.