Primary language: XQuery. License: MIT.

HisTEI

A framework add-on for Oxygen XML Editor that allows researchers to transcribe historical documents in TEI. More information is available at http://www.histei.info/p/home.html.

Compilation

Compile the project using IntelliJ. Make sure to update the build.properties file with the correct locations of the various modules.

JDK

The preferred JDK is Oracle's. Installation on Ubuntu (via a PPA):

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer

Then make sure java -version outputs something along the lines of:

java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
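
The version check can also be scripted. The following is a minimal sketch, not part of HisTEI: the function names are hypothetical, and it assumes that "along the lines of" means any 1.7.x release is acceptable.

```python
import re
import subprocess


def parse_java_version(output):
    """Return the quoted version string from `java -version` output, or None."""
    match = re.search(r'version "([^"]+)"', output)
    return match.group(1) if match else None


def is_java7(version):
    """Assumption: any 1.7.x release counts as acceptable."""
    return version is not None and version.startswith("1.7.")


def installed_java_version():
    """Run `java -version` and parse its output; None if java is not on PATH."""
    try:
        result = subprocess.run(
            ["java", "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError:
        return None
    # `java -version` writes its report to stderr, so both streams are checked.
    return parse_java_version(result.stderr + result.stdout)


if __name__ == "__main__":
    version = installed_java_version()
    print(version, "OK" if is_java7(version) else "not Java 7")
```

Run as a script, it prints the detected version followed by either "OK" or "not Java 7".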

Python scripts

In the python folder you'll find two Python scripts:

  • extractglosses.py allows you to extract all elements from your HisTEI XML file (requires lxml).
  • xmltokenize.py allows you to train a sentence tokenizer (uses the NLTK platform).
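
To give an idea of what element extraction with lxml can look like, here is a minimal sketch. It is not the code of extractglosses.py: the function name, the sample document, and the choice of the TEI gloss element (guessed from the script's name) are all assumptions. The standard-library ElementTree module is used as a fallback when lxml is not installed.

```python
try:
    from lxml import etree  # the library the script is said to require
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with a similar API

# The TEI namespace; HisTEI files are assumed to use it.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample document, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>een <term>brief</term> <gloss>a letter</gloss> van
       <gloss>from <hi>the</hi> city</gloss></p>
  </body></text>
</TEI>"""


def extract_elements(xml_text, tag="gloss"):
    """Return the text content of every TEI element named `tag`."""
    root = etree.fromstring(xml_text.encode("utf-8"))
    qualified = "{%s}%s" % (TEI_NS, tag)
    # itertext() also collects text nested in child elements such as <hi>.
    return ["".join(el.itertext()).strip() for el in root.iter(qualified)]


if __name__ == "__main__":
    for text in extract_elements(SAMPLE):
        print(text)
```

Passing a different tag name, for example `extract_elements(SAMPLE, tag="term")`, collects other TEI elements in the same way.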