Code to download a dataset and filter it according to its metadata

Primary language: Jupyter Notebook

ORACC-download

This code allows you to download data from the Open Richly Annotated Cuneiform Corpus (ORACC).

This project is code that I have remixed from Niek Veldhuis' Computational Assyriology (Compass) project. I did not write the original code, but I combined two parts of his code and added filters to produce the ORACC dataset I required for my project.

In the same folder where you download this file, you need a folder called "output". The code saves your data to this folder.
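If you prefer to create the folder from Python rather than by hand, a minimal sketch could look like the following. It assumes you run it from the folder that holds the notebook; the variable name `output_dir` is only illustrative and is not taken from the notebook itself.

```python
from pathlib import Path

# Assumption: the current working directory is the folder containing the notebook.
output_dir = Path("output")

# Create the folder if it is missing; do nothing if it already exists.
output_dir.mkdir(exist_ok=True)
```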

The result of the code is a table in which every row represents a text and every word is represented as a lemma in the form lemma[guideword]POS. The data is taken from ORACC, where you can find more information about these terms. The texts included are filtered according to their metadata, which is also provided by ORACC.
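To work with the lemma[guideword]POS strings afterwards, something like the sketch below may help. It is not part of the original notebook: the function name `split_lemma`, the regular expression, and the sample line are all illustrative assumptions. It also assumes that words in a text are separated by whitespace and that no word contains a space.

```python
import re

# Hypothetical pattern for one word: lemma, then [guideword], then the POS tag.
LEMMA_PATTERN = re.compile(
    r"^(?P<lemma>[^\[\]]+)\[(?P<guideword>[^\[\]]*)\](?P<pos>\S*)$"
)


def split_lemma(token):
    """Split one lemma[guideword]POS string into its three parts.

    Returns a dict with the keys "lemma", "guideword" and "pos",
    or None if the token does not follow the expected form.
    """
    match = LEMMA_PATTERN.match(token)
    if match is None:
        return None
    return match.groupdict()


# Illustrative sample only, not real output from the notebook.
text_line = "šarru[king]N rabû[great]AJ"
parsed = [split_lemma(token) for token in text_line.split()]
```

Each entry of `parsed` is then a small dictionary, so you can, for example, keep only the words with a particular POS tag.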

The code is commented throughout, but if you have any issues please contact me at eleanor.bennett@helsinki.fi.

Licensing

You are free to use, reuse, and remix this code, but please credit me (Ellie Bennett) and Niek Veldhuis.

(CC BY 4.0)