TAPI Topic Models

Notebooks for the NEH TAPI Workshop, "How to Do Things with Topic Models."

Code

The code for these workshops is in a collection of Jupyter Notebooks hosted on the Constellate Binder hub. To access them, click on the button below:

Corpus Data

Corpus data files are CSV files in a specific format: machine learning corpus format. Files in this format have essentially three columns: (1) a document identifier, called doc_key, (2) a document label, called doc_label, and (3) the document content, called doc_content. Each row contains one complete "document," defined as an analytically useful unit of discourse, such as a paragraph or chapter.
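As a rough illustration of the format described above, the following Python sketch reads a corpus CSV and checks that the three columns are present. The function name read_corpus and the two sample rows are hypothetical and are not taken from the workshop notebooks; only the column names come from the description above.

```python
import csv
import io

# Column names taken from the format description; everything else is illustrative.
REQUIRED_COLUMNS = ("doc_key", "doc_label", "doc_content")


def read_corpus(handle):
    """Read a corpus CSV from an open text handle and return a list of row dicts.

    Raises ValueError if any of the three expected columns is missing.
    Extra columns, if present, are ignored.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")
    return [{name: row[name] for name in REQUIRED_COLUMNS} for row in reader]


# Hypothetical two-document corpus, used only for illustration.
sample = io.StringIO(
    "doc_key,doc_label,doc_content\n"
    '1,review,"Crisp, dry, and mineral."\n'
    '2,review,"Soft tannins with dark fruit."\n'
)
docs = read_corpus(sample)
print(len(docs), docs[0]["doc_label"])
```

To check a real file, the same function could be called with an open file handle, for example open("./corpora/some_file.csv", newline="", encoding="utf-8"), where the file name is a placeholder.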

A collection of these files has been prepared for this workshop. Download them and then upload them to the ./corpora folder in your Binder repository. Sorry that the process is not more direct!

Corpus data may be downloaded from the following shared Dropbox link:

TAPI Corpora Directory

Additionally, these files may be downloaded individually:

Wine Reviews — A collection of terse wine reviews.
JSTOR Hyperparameter — Abstracts from a JSTOR search for "hyperparameter."
Tamilnet — A sample of news stories from the website Tamilnet.
Anphoblacht — A sample of news stories from the website Anphoblacht.

Each link goes to a Dropbox item that has a download link. Download the file to your desktop and then upload it to the ./corpora directory.

Sample Output Data

The notebooks in this workshop will generate a digital analytical edition from a given source corpus file. The results of the various analytical processes will be put in the ./db directory. A demonstration edition is provided for one of the notebooks. To get the demo data, download the files with the prefix jstor_hyperparameter_demo and then upload them into your ./db directory.

TAPI Editions Directory
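After uploading, one way to confirm that the demo files landed in the right place is to list the files in ./db that carry the demo prefix. This is a minimal sketch, not part of the workshop notebooks; the function name find_demo_files is hypothetical, and the directory and prefix are taken from the instructions above.

```python
from pathlib import Path


def find_demo_files(db_dir="./db", prefix="jstor_hyperparameter_demo"):
    """Return the sorted names of files in db_dir whose names start with prefix.

    Returns an empty list if db_dir does not exist.
    """
    folder = Path(db_dir)
    if not folder.is_dir():
        return []
    return sorted(
        p.name for p in folder.iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )


found = find_demo_files()
print(f"{len(found)} demo file(s) found in ./db")
for name in found:
    print(" ", name)
```

An empty result suggests the files were uploaded to a different folder or were renamed during download.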

kspicer80/TAPI_Topic_Models
