/TAPI_Topic_Models

Notebooks for the NEH TAPI Workshop, "How to Do Things with Topic Models."

Primary language: Jupyter Notebook. License: MIT.

TAPI Topic Models

Code

The code for these workshops is found in a collection of Jupyter Notebooks hosted on the Constellate Binder hub. To access them, click the button below:

Binder

Corpus Data

Corpus data files are CSV files in a specific format called machine learning corpus format. A file in this format has three columns: (1) a document identifier, called doc_key, (2) a document label, called doc_label, and (3) the document content, called doc_content. Each row contains a complete "document," defined as an analytically useful unit of discourse, such as a paragraph or chapter.
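As a rough illustration of the format described above, the following sketch reads a corpus file and checks that the three columns are present. It assumes the file is comma-delimited and begins with a header row naming the columns, which the description does not state explicitly. The function name read_corpus and the two sample rows are hypothetical and are not taken from the workshop notebooks.

```python
import csv
import io

# Column names taken from the format description above.
EXPECTED_COLUMNS = ["doc_key", "doc_label", "doc_content"]


def read_corpus(handle):
    """Read a corpus CSV from an open text handle and return its rows as dicts.

    Assumes a header row is present; raises ValueError if any of the
    three expected columns is missing.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Hypothetical two-document corpus, invented for illustration only.
sample = io.StringIO(
    "doc_key,doc_label,doc_content\n"
    "1,review,Crisp and dry with a hint of citrus.\n"
    '2,review,"Bold, tannic, and long on the finish."\n'
)
docs = read_corpus(sample)
print(len(docs), docs[1]["doc_content"])
```

To check a real file, pass an open file handle instead of the in-memory sample, for example `open("./corpora/your_file.csv", newline="", encoding="utf-8")`, where the file name is a placeholder.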

A collection of these files has been prepared for this workshop. Download them and then upload them to the ./corpora folder in your Binder repository. Sorry that the process is not more direct!

Corpus data may be downloaded from the following shared Dropbox link:

Additionally, these files may be downloaded individually:

  • Wine Reviews: A collection of terse wine reviews.
  • JSTOR Hyperparameter: Abstracts from a JSTOR search for "hyperparameter."
  • Tamilnet: A sample of news stories from the website Tamilnet.
  • Anphoblacht: A sample of news stories from the website Anphoblacht.

Each link goes to a Dropbox item that has a download link. Download the file to your desktop and then upload it to the ./corpora directory.

Sample Output Data

The notebooks in this workshop will generate a digital analytical edition from a given source corpus file. The results of the various analytical processes will be put in the ./db directory. A demonstration edition is provided for one of the notebooks. To get the demo data, download the files with the prefix jstor_hyperparameter_demo and then upload them to your ./db directory.
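After uploading, one way to confirm that the demo files landed in the right place is to list everything in ./db whose name starts with the demo prefix. This is a minimal sketch using only the Python standard library. The helper name find_edition_files is hypothetical, and nothing is assumed about the files beyond the shared prefix given above.

```python
from pathlib import Path


def find_edition_files(db_dir, prefix):
    """Return the sorted names of files in db_dir whose names start with prefix.

    Returns an empty list if db_dir does not exist.
    """
    db_path = Path(db_dir)
    if not db_path.is_dir():
        return []
    return sorted(
        entry.name
        for entry in db_path.iterdir()
        if entry.is_file() and entry.name.startswith(prefix)
    )


demo_files = find_edition_files("./db", "jstor_hyperparameter_demo")
if demo_files:
    for name in demo_files:
        print(name)
else:
    print("No demo files found in ./db yet.")
```

An empty result suggests the files were uploaded to a different directory or have not been uploaded yet.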

Workshop Slides