
Code to reproduce the analysis in the article Computational Grounded Theory: A Methodological Framework

Primary language: Jupyter Notebook. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

Computational Grounded Theory

This code replicates the analysis in the article Computational Grounded Theory: A Methodological Framework, published online in Sociological Methods and Research in November 2017.

  • 01-Step1-PatternDetection replicates Tables 3 and 4
  • 02-Step2-PatternRefinement replicates Table 5
  • 03-Step3-PatternConfirmation replicates the analysis used in Step 3 of the paper

To run this code you will need:

  • Python 3.5.2, and the Python libraries sklearn, pandas, nltk, numpy, scipy, matplotlib, and json
  • R 3.3.1, and the R library stm
  • Jupyter, and the rpy2 extension rpy2.interactive.ipython

**** See the requirements.txt file for specific versions of the libraries used ****
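Before opening the notebooks, you may want to confirm that the Python side of the environment is in place. The snippet below is a minimal sketch and not part of the repository. The list REQUIRED_MODULES and the helper missing_modules are hypothetical names chosen for illustration. The list includes rpy2 on the assumption that the bridge to R is installed as a Python package. The snippet only checks that each module can be found. It does not compare versions against requirements.txt, and it does not check R or the stm package. It avoids f-strings so that it also runs on Python 3.5.

```python
import importlib.util
import sys

# Hypothetical list: import names of the libraries listed above.
# "sklearn" is the import name of scikit-learn; json ships with Python itself.
REQUIRED_MODULES = ["sklearn", "pandas", "nltk", "numpy", "scipy",
                    "matplotlib", "json", "rpy2"]


def missing_modules(names):
    """Return the top-level module names that cannot be found.

    Uses importlib.util.find_spec, so modules are located but not imported.
    This also means rpy2 is not initialised and R is not started.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The README was tested with Python 3.5.2; other versions may behave differently.
    print("Python " + sys.version.split()[0])
    absent = missing_modules(REQUIRED_MODULES)
    if absent:
        print("Missing: " + ", ".join(absent))
    else:
        print("All listed Python libraries were found.")
```

An empty "Missing" list only means the packages are installed. Matching the exact versions in requirements.txt is still the safest way to reproduce the published tables.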

The data file (in the data folder) contains text that has been sorted alphabetically, because some of it is still under copyright. As a result, the code and dataset in this repository replicate around 90% of the complete analysis. The remaining 10% requires the unsorted text and cannot be fully reproduced with the data provided here. This includes the Named Entity Recognition calculation and Step 2, the guided deep reading step.
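Why does sorting leave most of the analysis intact? This sketch assumes that the words within each document were sorted alphabetically. The exact preprocessing used for the data file is not described here. The helper sort_words is hypothetical and lowercases its input, which the real data may not do. The point it illustrates is that sorting keeps per-document word counts, which bag-of-words methods rely on. It discards word order, which Named Entity Recognition and close reading need.

```python
from collections import Counter


def sort_words(text):
    """Hypothetical helper: lowercase, split on whitespace, sort tokens alphabetically."""
    return " ".join(sorted(text.lower().split()))


original = "Workers in New York marched while police in New York watched"
scrambled = sort_words(original)
print(scrambled)  # in in marched new new police watched while workers york york

# Per-document word counts, the input to bag-of-words methods, survive the sort...
print(Counter(original.lower().split()) == Counter(scrambled.split()))  # True

# ...but word order does not, so the two-word entity "New York" is no longer contiguous.
print("new york" in scrambled)  # False
```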

You may email me (L dot Nelson at northeastern dot edu) for access to the complete, unsorted data, or with any questions or comments.