Code for creating topic modeling curation files, which annotators use to label and describe the topics in topic modeling output (producing human-usable codes for content analysis).

To run the scripts, first set up a conda environment and install the required libraries by running conda env create --file environment.yml, then activate the environment with conda activate topic_curation.

The main base script is, and generated output files that would be used for curation are in example_data/outputs/.

To add custom columns for getting annotations and ratings other than just getting topic labelings, the code in can be modified and repurposed according to the use case (example output is in example_data/outputs_with_custom_col_in_topic_word_file).
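
As a rough illustration of what adding custom annotation columns can look like, the sketch below appends empty rating columns to a topic-word table. It assumes the table is CSV text; the function name add_columns and the column names topic_id, top_words, label and coherence_rating are hypothetical and are not taken from this repository's scripts or output files.

```python
import csv
import io


def add_columns(csv_text, extra_columns):
    """Append extra annotation columns (hypothetical helper).

    csv_text      -- contents of a CSV table with a header row
    extra_columns -- dict mapping new column name -> default cell value
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    existing = list(reader.fieldnames or [])
    # Keep the original column order and add only columns that are new.
    fieldnames = existing + [c for c in extra_columns if c not in existing]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for name, default in extra_columns.items():
            # Do not overwrite a value if the column already exists.
            row.setdefault(name, default)
        writer.writerow(row)
    return out.getvalue()


# Assumed example table; real output files may use different columns.
sample = "topic_id,top_words\n0,cat dog\n"
result = add_columns(sample, {"label": "", "coherence_rating": ""})
```

Leaving the new cells empty gives annotators a blank column to fill in; a non-empty default could instead pre-populate a rating scale.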

NOTE: To run the scripts with the example data in example_data/, a raw_documents.txt file (one document text per line) must be present. It is not included because the file that goes with the rest of the example files is too large.
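
If you need to build a raw_documents.txt from your own texts, a minimal sketch follows. It assumes the file belongs directly under example_data/ and that UTF-8 is the expected encoding; the helper name write_raw_documents is hypothetical. Whitespace inside each document, including newlines, is collapsed so that every document occupies exactly one line.

```python
from pathlib import Path


def write_raw_documents(documents, path="example_data/raw_documents.txt"):
    """Write one document per line (hypothetical helper).

    The default path is an assumption about where the scripts look for
    the file; adjust it to match your setup.
    """
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    # str.split() with no arguments splits on any whitespace run, so
    # embedded newlines and tabs cannot break the one-per-line format.
    lines = [" ".join(doc.split()) for doc in documents]
    out.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

For example, write_raw_documents(["first doc", "second\ndoc"]) writes two lines, with the second one reading "second doc".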


  • Further documentation.
  • More command line options and flexibility.
  • Custom rating columns provided as input to the script, likely via a file listing the names and values for those columns.