A Java API for local access to data from Wiktionary, a collaboratively edited, free dictionary. It specifically targets the French Wiktionary.
The goal is to offer a complete graph database, and a Java API to access it, providing the following information about a word:
- Definitions and usage examples.
- Semantic relations with other words, such as synonyms.
- Lexical class (noun, adjective, verb…).
- Pronunciation.
import edu.unice.polytech.kis.semwiktionary.model.Word;

Word hello = Word.find("bonjour"); // database lookup
for (Word salutation : hello.getSynonyms()) {
    System.out.println(salutation + " world!"); // all variants of “hello world!”…
    System.out.print("Most usually used in the context of: "); // …with the domain (usage context, e.g. “sociology”)…
    System.out.println(hello.getDefinitions().get(0).getDomain()); // …of their most common meaning
}
Remember that we are currently offering support only for the French Wiktionary. This software has not been tested with any other language. You are most welcome to try and contribute support for other languages, though!
Download the latest build from the downloads page.
All necessary dependencies are in the lib folder, and the API itself is available as a JAR at the archive's root.
Downloading the prebuilt database is the preferred method, as it spares you the burden of parsing the Wiktionary yourself. As long as our servers can handle the load, you can download the full French Wiktionary database as an archive. Make sure the database version you download matches the version of your access library.
You will then have to move the contents of the archive into a data folder inside the extracted API archive, in order to get the following file hierarchy:
┲SemWiktionary (extracted API archive)
├ SemWiktionary.jar
├ wiktionary
├┬ lib
┋ (…many jars…)
├┬ data (extracted database archive)
┋ (…many "neostore" files…)
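The hierarchy above can also be checked programmatically before the first run. The following is a minimal sketch and not part of SemWiktionary: the `LayoutCheck` class and its method names are hypothetical, and it assumes only the entry names shown in the tree (`SemWiktionary.jar`, `wiktionary`, `lib`, `data`, and database files whose names start with `neostore`).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper (not shipped with SemWiktionary) that checks the expected folder layout. */
final class LayoutCheck {

    /** Returns the names of missing or incomplete entries under root; empty means the layout looks complete. */
    static List<String> missingEntries(Path root) throws IOException {
        List<String> missing = new ArrayList<>();
        if (!Files.isRegularFile(root.resolve("SemWiktionary.jar"))) {
            missing.add("SemWiktionary.jar");
        }
        if (!Files.isRegularFile(root.resolve("wiktionary"))) {
            missing.add("wiktionary");
        }
        if (!Files.isDirectory(root.resolve("lib"))) {
            missing.add("lib");
        }
        // The data folder must directly contain the database files, not a nested archive folder.
        if (!hasNeostoreFiles(root.resolve("data"))) {
            missing.add("data");
        }
        return missing;
    }

    /** True if dataDir is a directory holding at least one entry whose name starts with "neostore". */
    static boolean hasNeostoreFiles(Path dataDir) throws IOException {
        if (!Files.isDirectory(dataDir)) {
            return false;
        }
        try (Stream<Path> entries = Files.list(dataDir)) {
            return entries.anyMatch(e -> e.getFileName().toString().startsWith("neostore"));
        }
    }

    public static void main(String[] args) throws IOException {
        // Root defaults to the current directory; pass the SemWiktionary folder as first argument otherwise.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> missing = missingEntries(root);
        System.out.println(missing.isEmpty()
                ? "Layout looks complete."
                : "Missing or incomplete: " + missing);
    }
}
```

A `data` entry in the result most likely means the database archive folder itself was moved instead of its contents.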
For testing or basic usage, you can simply use the lookup interface this way:
./wiktionary # interactive, or:
./wiktionary [wordToLookUp [anotherWord [...]]]
To integrate SemWiktionary within your own application, or export the data in any format you wish, use the provided API. Its documentation is available in the doc/javadoc folder of the archive. You will need to include the SemWiktionary JAR and all those in the lib folder, and provide a data folder containing the database at the root of your project folder.
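As an illustration of the classpath requirement, here is a minimal sketch that assembles a classpath string from the SemWiktionary JAR and every JAR found in the lib folder. The `ClasspathBuilder` class is hypothetical and not part of the API; it assumes only the archive layout described earlier.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper (not shipped with SemWiktionary) that builds a classpath from the archive layout. */
final class ClasspathBuilder {

    /** Joins SemWiktionary.jar and all *.jar files directly under root/lib with the platform path separator. */
    static String classpath(Path root) throws IOException {
        List<String> entries = new ArrayList<>();
        entries.add(root.resolve("SemWiktionary.jar").toString());
        try (Stream<Path> files = Files.list(root.resolve("lib"))) {
            files.filter(f -> f.getFileName().toString().endsWith(".jar"))
                 .map(Path::toString)
                 .sorted() // stable order, independent of the file system listing order
                 .forEach(entries::add);
        }
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) throws IOException {
        // Root defaults to the current directory; pass the SemWiktionary folder as first argument otherwise.
        System.out.println(classpath(Paths.get(args.length > 0 ? args[0] : ".")));
    }
}
```

On the command line, the `lib/*` wildcard accepted by `java -cp` covers the lib folder without listing each JAR. Either way, the data folder is not a classpath entry: it has to sit at the root of the project folder, as stated above.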
If you are interested in modifying the parser, generating your own database and so on, download the source and read doc/Parser/How to parse a dump file.md.
- JWKTL. Not documented, and its authors did not allow access to the source code.
Several tools parse MediaWiki markup and create an AST from it. However, most of them are both overkill and not specific enough for the Wiktionary dialects (much more structured than Wikipedia, for which most tools are tailored).
Contact the authors to request a different license.
- Matti Schneider-Ghibaudo
- Fabien Brossier
- Dong Thinh
- Michel Gautero
- Carine Fédèle
- Neo4j graph database (GPL)
- JFlex Java lexer by Gerwin Klein (GPL)
- Markdown doclet documentation parser by Richard Nichols (GPLv2)
- JUnit unit-testing framework (CPL)
- Unitils extensions for JUnit (Apache 2)
- Gwtwiki wiki text to plain text converter (EPLv1)
- Javadoc stylesheet by the Apache Software Foundation.