
Module for the `udapi-python` framework used to convert Perseus' Ancient Greek Dependency Treebanks into Universal Dependencies


Tb2UD

Convert Ancient Greek treebanks in the main formats supported by Arethusa and Perseids to UD. At the moment, it is designed to work with the treebanks compatible with:

Requirements

  • Python 3.6+
  • the package udapi-python, plus some additional scripts for working with the Perseus AGLDT formats; you can get them from my Udapy_AGLDT, which is installed automatically if you use the requirements.txt

Important: don't forget to add two folders to your $PYTHONPATH: tb2UD and tb2UD/tb2ud. For instance:

cd /path/to/tb2UD/
export PYTHONPATH="$(pwd):$(pwd)/tb2ud/${PYTHONPATH:+:$PYTHONPATH}"

Or, even better, create and configure a virtual environment (see the next section). In that case, you simply have to add a .pth file (e.g. env.pth) to the <ENV>/lib/<PYTHON-VERSION>/site-packages folder.

How to set up a virtualenv

If you don't know what a virtual environment is, you'll find a lot of good tutorials online, starting with this one. You may also want to consider virtualenvwrapper, which makes a lot of things easier to manage.

Follow these three steps:

  1. create and activate a virtual environment (Python 3.6+); see the link above if you don't know how to do it.

  2. install the required packages:

pip install -r requirements.txt

  3. create a .pth file and enter the full paths to the tb2UD and tb2UD/tb2ud folders; see here.

If you have virtualenvwrapper, you also have an add2virtualenv script, which takes care of step 3 for you:

add2virtualenv directory1 directory2 ...
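
Without virtualenvwrapper, step 3 can also be scripted. The following is only a minimal sketch, not part of tb2UD: the helper name write_pth is hypothetical, the file name env.pth is just the example used above, and the code assumes it is run with the virtual environment's own interpreter, so that sysconfig reports that environment's site-packages folder.

```python
import sysconfig
from pathlib import Path


def write_pth(repo_root, site_packages=None, name="env.pth"):
    """Write a .pth file listing the tb2UD and tb2UD/tb2ud folders.

    write_pth is a hypothetical helper, not something shipped with tb2UD.
    """
    root = Path(repo_root).resolve()
    # Default target: the site-packages folder of the interpreter running this code
    target_dir = Path(site_packages or sysconfig.get_paths()["purelib"])
    pth_file = target_dir / name
    # One absolute path per line; at startup Python adds each line that
    # names an existing directory to sys.path
    pth_file.write_text(f"{root}\n{root / 'tb2ud'}\n")
    return pth_file
```

With the environment activated, calling write_pth("/path/to/tb2UD") should leave an env.pth file in its site-packages folder; start a new Python session afterwards so that the file is picked up.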

How to use it

In the scripts folder, you'll find a few bash scripts that run some of the most frequently used commands.

You can test that everything is working fine by running the following script:

# test.sh <input-file.xml>
cd test # go to the tb2ud/test folder
./test.sh data/hdt-1-20-39-bu2.xml

(note that the script attempts to read an AGLDT XML file; it fails if the appropriate udapi blocks are not found)

If all goes well, you'll see a series of log entries, followed by the good old Hello, World! string.
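
If the test fails instead, a quick way to narrow things down is to check whether Python can locate the packages at all. This is a rough sketch under two assumptions: that udapi-python is importable as udapi, and that the tb2ud folder becomes importable as tb2ud once tb2UD is on the path. The function name missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(names=("udapi", "tb2ud")):
    """Return the top-level module names that Python cannot locate.

    The default names are assumptions about how the packages are imported.
    """
    # find_spec returns None when a top-level module is not found on sys.path
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from missing_modules() suggests that the path setup is fine and that the problem lies elsewhere, for example in the input file.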