
Cuneiform corpora in Text-Fabric

Primary language: Jupyter Notebook. License: MIT.

Nino-cunei

sha Software Heritage Archive DOI

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Proto-cuneiform corpora in Text-Fabric

This repo is a research environment for the study of cuneiform tablets. You can run your own programs off-line, and publish your work in online notebooks.

Corpus

This repo contains images and transliterations of Uruk IV-III tablets (4000-3100 BC).

The data is obtained from CDLI, the Cuneiform Digital Library Initiative.

See also the about and images pages.

Software

The main processing tool is Text-Fabric. It is instrumental in turning the analysis of ancient data into computing narratives.

The ecosystem is Python and Jupyter notebooks.

Getting started

Start with the tutorial.

This app also contains machinery to deal with ATF transcriptions and CDLI photos and lineart.
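To give an idea of what dealing with ATF transcriptions involves, here is a minimal sketch in plain Python that reads the supra-line structure of a transliteration: tablets, faces, columns and numbered lines. It is not the converter used in this repo. The sample text is invented, the names `parse_supra_line`, `Tablet`, `Face` and `Column` are hypothetical, and the handling of ATF is deliberately simplified: metadata (`#`) and state (`$`) lines are skipped, `@tablet` is ignored, and any other `@` line except `@column` is treated as a face.

```python
import re
from dataclasses import dataclass, field

# Invented sample in the general shape of a CDLI ATF transliteration.
SAMPLE_ATF = """\
&P999999 = Invented tablet, for illustration only
#atf: lang qpc
@tablet
@obverse
@column 1
1. 2(N01) , SZE~a
2. 1(N14) , GAL~a
@column 2
1. 3(N01) , UDU~a
@reverse
$ blank space
"""

# A text line: a line number starting with a digit, a dot, then the content.
LINE_RE = re.compile(r"^(\d\S*)\.\s+(.*)$")


@dataclass
class Column:
    number: str
    lines: list = field(default_factory=list)  # (line number, content) pairs


@dataclass
class Face:
    name: str
    columns: list = field(default_factory=list)


@dataclass
class Tablet:
    pnum: str
    name: str
    faces: list = field(default_factory=list)


def parse_supra_line(atf_text):
    """Group ATF lines by tablet, face and column (simplified sketch)."""
    tablets = []
    tablet = face = column = None
    for raw in atf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "$")) or line == "@tablet":
            continue  # skip blanks, metadata, state lines and the object marker
        if line.startswith("&"):
            pnum, _, name = line[1:].partition("=")
            tablet = Tablet(pnum.strip(), name.strip())
            tablets.append(tablet)
            face = column = None
        elif line.startswith("@column"):
            if face is None:
                raise ValueError(f"column outside a face: {line!r}")
            column = Column(line[len("@column"):].strip())
            face.columns.append(column)
        elif line.startswith("@"):
            if tablet is None:
                raise ValueError(f"face outside a tablet: {line!r}")
            face = Face(line[1:])  # assumption: every other @-line is a face
            tablet.faces.append(face)
            column = None
        else:
            match = LINE_RE.match(line)
            if match is None or column is None:
                raise ValueError(f"unexpected line: {line!r}")
            column.lines.append((match.group(1), match.group(2)))
    return tablets


if __name__ == "__main__":
    for t in parse_supra_line(SAMPLE_ATF):
        for f in t.faces:
            for c in f.columns:
                print(t.pnum, f.name, "column", c.number, len(c.lines), "line(s)")
```

The content of each line is kept as an uninterpreted string here. Breaking it down into cases, signs and numerals is a separate and larger task.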

Authors

N.B.: Releases of this repo have been archived at Zenodo. Click the DOI badge to be taken to the archive. There you will find ways to cite this work.

Permissions

The software in this repo is Open Source, as per the MIT license. The data (text and images of the Uruk tablets) have been derived from CDLI. If you use them, take care to acknowledge them, which you can do by citing this repository by means of its DOI at Zenodo, or its URL at the Software Heritage Archive (see the badges above).

Status

  • 2018-03-07 More pleasant functions to call up imagery. Improvements in docs. Archived at Zenodo.
  • 2018-03-06 Definitive data version 1.0 imported. No errors or diagnostics. Added over 5000 low resolution CDLI photos for tablets. Photos and linearts are always linked to the online data on CDLI.
  • 2018-03-05 Reorganized functionality and added lineart. The repos have moved from Dans-labs to Nino-cunei. Tutorials and primers have been split off from the data repo, which is now called uruk.
  • 2018-02-27 More work on clustering notebook.
  • 2018-02-27 Work on collocation methods has started in the collocation notebook.
  • 2018-02-26 The tutorial is taking shape. It is a full tour of the TF-API and most traits of the data in the Uruk corpus.
  • 2018-02-23 The TF data has been rigorously checked. All aspects of the encoding into ATF can be reproduced exactly from the TF source.
  • 2018-02-14 Text-Fabric data generated, but not thoroughly tested. Added a very basic starter tutorial notebook.
  • 2018-02-09 Conversion coding has just started. We only parse supra-line units. We do not yet generate any Text-Fabric data. The sub-line parsing will be the most work.