This is a first version of the Wikimedia project etytree. The project aims to visualize, in an interactive web page, the etymological tree of any word in any language (that is, its etymology laid out as a tree of ancestors, cognates, derived words, and so on), using data extracted from Wiktionary.
This project was inspired by my interest in etymology, open source collaborative projects, and interactive visualizations.
The code and the data are distributed under the Creative Commons Attribution-ShareAlike 3.0 license.
Files contained in resources/data are imported from Wiktionary and updated when a new dump of the English Wiktionary is generated.
This code queries the wmflabs etytree-virtuoso SPARQL endpoint, which I set up and populated with RDF data produced with dbnary_etymology. The extracted data is refreshed each time a new Wiktionary dump is generated (it is currently a little behind: the data was extracted on 12/20/2016).
I have defined an ontology for etymologies here. In particular, I have defined the properties etymologicallyDerivesFrom, derivesFrom and descendsFrom (and also etymologicallyEquivalentTo) as subproperties of etymologicallyRelatedTo. All these properties are transitive; etymologicallyEquivalentTo is also reflexive.
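To illustrate what transitivity buys here, the following minimal sketch computes everything reachable from a node by following a relation one or more times, which is what a transitive property lets the inference engine conclude. The graph, the node names, and the function `transitive_closure` are hypothetical and are not part of the ontology or of etytree.

```python
from collections import deque


def transitive_closure(edges, start):
    """Return every node reachable from `start` by following edges one or more times."""
    seen = set()
    queue = deque(edges.get(start, ()))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(edges.get(node, ()))
    return seen


# Hypothetical toy graph: each key is related to the listed values.
toy_edges = {
    "word_c": ["word_b"],
    "word_b": ["word_a"],
    "word_a": [],
}

# word_c is related to word_b directly and to word_a by transitivity.
print(sorted(transitive_closure(toy_edges, "word_c")))
```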
Besides etymological relationships, the data also contains parts of speech, definitions, senses, and more, as extracted by dbnary. The ontology for dbnary is defined here.
An example query to the SPARQL endpoint follows:
PREFIX eng: <>
SELECT ?p ?o WHERE {
    eng:__ee_get ?p ?o .
}
Property is used to link to the Wiktionary page the etymological entry was extracted from. If you want to find all entries containing the string "door":
SELECT ?s ?label WHERE {
    ?s rdfs:label ?label .
    ?label bif:contains "door" .
}
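Queries such as the label search above can be sent to the endpoint over HTTP using the standard SPARQL protocol `query` parameter. The sketch below only builds the request and does not send it. The endpoint URL is a hypothetical placeholder (the real URL is not given here), the function name is made up for illustration, and the query assumes the server predefines the `rdfs:` prefix.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder; replace with the real endpoint URL.
ENDPOINT = "http://example.org/sparql"


def build_sparql_request(query, endpoint=ENDPOINT):
    """Build (but do not send) a GET request for a SPARQL SELECT query."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON results format via content negotiation.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


label_query = """
SELECT ?s ?label WHERE {
    ?s rdfs:label ?label .
    ?label bif:contains "door" .
}
"""

request = build_sparql_request(label_query)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would perform the actual call once `ENDPOINT` points at a live server.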
If you want to find ancestors of "door":
define input:inference "etymology_ontology"
PREFIX dbetym: <>
PREFIX eng: <>
SELECT ?o WHERE {
    eng:__ee_1_door dbetym:etymologicallyRelatedTo{1,} ?o .
}
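The ancestor query can be generated for any entry. This is a minimal sketch, assuming the caller supplies the two prefix IRIs (they are not given here) and the local name of the entry. The function name `ancestors_query` and the example IRIs in the usage line are hypothetical.

```python
def ancestors_query(entry, eng_prefix, dbetym_prefix):
    """Return the SPARQL text asking for every ancestor of `entry`.

    `entry` is a local name such as "__ee_1_door". The prefix IRIs are
    parameters because their real values are not known in this sketch.
    """
    return "\n".join([
        'define input:inference "etymology_ontology"',
        f"PREFIX dbetym: <{dbetym_prefix}>",
        f"PREFIX eng: <{eng_prefix}>",
        "SELECT ?o WHERE {",
        # {1,} follows the relation one or more times, as in the query above.
        f"    eng:{entry} dbetym:etymologicallyRelatedTo{{1,}} ?o .",
        "}",
    ])


# Hypothetical IRIs, for illustration only.
print(ancestors_query("__ee_1_door", "http://example.org/eng/", "http://example.org/etym#"))
```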
The RDF database of etymological relationships is periodically extracted when a new dump of the English Wiktionary is released. The code used to extract the data is dbnary_etymology.
dbnary_etymology is a Maven project.
cd dbnary_etymology/extractor/
mvn site
mvn javadoc:jar
cd dbnary_etymology/ontology
mvn install:install-file -Dfile=target/ontology-1.6-SNAPSHOT.jar -DgroupId=org.getalp.dbnary -DartifactId=ontology -Dversion=1.6-SNAPSHOT -Dpackaging=jar -DgeneratePom=true
cd dbnary_etymology/extractor
mvn package
rm ${OUT}
java -Xmx24G -cp ${EXEC} org.getalp.dbnary.cli.ExtractWiktionary -l en -x --frompage ${FPAGE} --topage ${TPAGE} -E ${ETY} -o ${OUT} ${DUMP} 3>&1 1>>${LOG} 2>&1
java -Xmx24G -cp ${EXEC} org.getalp.dbnary.cli.GetExtractedSemnet -x -l en --etymology ${DUMP} door
java -Xmx24G -cp $EXEC org.getalp.dbnary.cli.GetExtractedSemnet -l en --etymology $DUMP $WORD
java -Xmx24G -cp $EXEC org.getalp.dbnary.cli.GetExtractedSemnet -x -l en --etymology $DUMP $WORD
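When the extraction is scripted, it can help to assemble the command as an argument list rather than a shell string. The sketch below mirrors only the flags that appear in the ExtractWiktionary command above. The function name and every path in the usage line are hypothetical placeholders.

```python
import shlex


def extract_command(exec_jar, dump, out, ety, from_page, to_page, heap="24G"):
    """Return the argv list for the ExtractWiktionary run shown above."""
    return [
        "java", f"-Xmx{heap}", "-cp", exec_jar,
        "org.getalp.dbnary.cli.ExtractWiktionary",
        "-l", "en", "-x",
        "--frompage", str(from_page), "--topage", str(to_page),
        "-E", ety, "-o", out, dump,
    ]


# All file names and page bounds below are hypothetical placeholders.
argv = extract_command("extractor.jar", "enwiktionary.xml", "out.ttl", "ety.ttl", 0, 1000)
print(shlex.join(argv))
```

The list can be handed to `subprocess.run(argv, check=True)`, which avoids shell quoting problems with unusual paths.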
I would like to give the graph a preferred direction, running from left to right, that follows the evolution of a word from the past to the present. In terms of the force layout, this means adding something like a magnetic field that orients the arrows toward the preferred direction.
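One possible reading of this idea, sketched under simplifying assumptions: on every simulation tick, each edge whose descendant is not far enough to the right of its ancestor pushes the two nodes apart horizontally. The function name, the parameters `min_gap` and `strength`, and the two-node example are hypothetical. This is not the API of any layout library.

```python
def apply_direction_force(positions, edges, min_gap=50.0, strength=0.1):
    """Nudge nodes so that every edge points from left to right.

    positions: dict mapping node -> [x, y]
    edges: iterable of (ancestor, descendant) pairs
    """
    for ancestor, descendant in edges:
        gap = positions[descendant][0] - positions[ancestor][0]
        shortfall = min_gap - gap
        if shortfall > 0:
            # Split the correction between both ends of the edge.
            shift = strength * shortfall / 2
            positions[ancestor][0] -= shift
            positions[descendant][0] += shift
    return positions


# The descendant starts to the LEFT of its ancestor, i.e. pointing the wrong way.
positions = {"old": [100.0, 0.0], "new": [0.0, 0.0]}
for _ in range(200):
    apply_direction_force(positions, [("old", "new")])

print(positions["old"][0] < positions["new"][0])
```

Because the force acts only on the x coordinate, it can be combined with the usual link and charge forces, which remain free to arrange the nodes vertically.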
Add zoom to the tooltip, and enable zoom in Google Chrome and other browsers as well.
Add etymology controversies.
Currently, for some words the Virtuoso server returns no data because the query times out. I want to try a different query, like the following:
DEFINE input:inference "etymology_ontology"
PREFIX dbetym: <>
PREFIX owl: <>
PREFIX rdfs: <>
SELECT DISTINCT ?source ?p ?o ?cognate ?pcognate ?scognate
{
    ?source ?p ?o .
    FILTER (?p in (dbetym:etymologicallyDerivesFrom, dbetym:descendsFrom, dbetym:derivesFrom, dbetym:etymologicallyEquivalentTo))
    # {
    #     SELECT ?source
    #     {
    #         ?source dbetym:etymologicallyRelatedTo{1,} <> .
    #     }
    # }
    {
        SELECT ?source
        {
            <> dbetym:etymologicallyRelatedTo{1,} ?source .
        }
    }
    ?source dbetym:etymologicallyRelatedTo{1,} ?cognate .
    ?scognate ?pcognate ?cognate .
    FILTER (?pcognate in (dbetym:etymologicallyDerivesFrom, dbetym:descendsFrom, dbetym:derivesFrom, dbetym:etymologicallyEquivalentTo))
}
Query the server for data about a word when the word is clicked.
Support searching for words that contain spaces or accents.
Extract reconstructed words.
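For the search item about spaces and accents, two small helpers may be useful: one folds accents so that an unaccented query can match an accented label, the other percent-encodes a term for use in a URL. This is a minimal sketch using only the Python standard library. The function names are hypothetical, and whether accent folding is desirable is an open design question, since accents are meaningful in many languages.

```python
import unicodedata
from urllib.parse import quote


def fold_accents(text):
    """Strip combining marks so that "café" can match "cafe"."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def encode_search_term(text):
    """Percent-encode a term (spaces, non-ASCII characters) for use in a URL."""
    return quote(text.strip(), safe="")


print(fold_accents("café au lait"))        # cafe au lait
print(encode_search_term("café au lait"))  # caf%C3%A9%20au%20lait
```

Folding would apply only to matching; the original spelling should still be the one displayed.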
Maybe consider Dialects:
Module:da:Dialects ?
Module:en:Dialects This module provides labels to {{alter}}, which is used in the Alternative forms section.
Module:grc:Dialects This module translates from dialect codes to dialect names for templates such as {{alter}}. (e.g. aio -> link = 'Aeolic Greek', display = 'Aeolic')
Module:hy:Dialects ?
Module:la:Dialects (e.g.: aug -> link = Late Latin#Late and post-classical Latin, display = post-Augustan)
Maybe consider additional modules:
Module:families/data maps language code -> language name (e.g.: aav -> canonicalName = "Austro-Asiatic", otherNames = {"Austroasiatic"})
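The module data listed above amounts to lookup tables from codes to names. A minimal sketch of how such tables could be used once extracted follows. The dictionaries hold only the examples quoted in these notes (the real modules contain many more entries), and the function name `dialect_label` is hypothetical.

```python
# Entries copied from the examples in the notes above.
GRC_DIALECTS = {
    "aio": {"link": "Aeolic Greek", "display": "Aeolic"},
}
LA_DIALECTS = {
    "aug": {"link": "Late Latin#Late and post-classical Latin", "display": "post-Augustan"},
}


def dialect_label(code, table):
    """Return the display name for a dialect code, or the code itself if unknown."""
    entry = table.get(code)
    return entry["display"] if entry else code


print(dialect_label("aio", GRC_DIALECTS))
print(dialect_label("aug", LA_DIALECTS))
```

Falling back to the raw code keeps unknown dialects visible instead of silently dropping them.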