Measuring Founding Strategy

This repo is the main replication repository for the paper "Measuring Founding Strategy", by Jorge Guzman [www.jorgeguzman.co] and Aishen Li. The paper uses Doc2Vec in Python and the Wayback Machine to measure the differentiation between startups and incumbents at the time of founding. The repo is split into four folders.

  • crawler/ contains the code for crawling the Wayback Machine.
  • download/ contains scripts that run the crawler and download all files.
  • text_analysis/ estimates similarity across firms using doc2vec.
  • utils/ contains a set of programs used to run the text analysis.
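
To give a feel for the crawling step, here is a minimal sketch of how one might look up the archived snapshot of a firm's website closest to its founding date, using the public Wayback Machine availability endpoint. It is not the code in crawler/. The function names (`availability_url`, `closest_snapshot`, `lookup_snapshot`) and the example domain are hypothetical and chosen only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(domain, timestamp):
    """Build the query URL for the snapshot of `domain` closest to `timestamp`.

    `timestamp` is a string such as "20100101" (YYYYMMDD).
    """
    query = urlencode({"url": domain, "timestamp": timestamp})
    return AVAILABILITY_ENDPOINT + "?" + query


def closest_snapshot(payload):
    """Extract (snapshot_url, snapshot_timestamp) from a decoded JSON response.

    Returns None when the response reports no available snapshot.
    """
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def lookup_snapshot(domain, timestamp):
    """Query the endpoint over the network and return the closest snapshot."""
    with urlopen(availability_url(domain, timestamp), timeout=30) as response:
        return closest_snapshot(json.load(response))
```

Calling `lookup_snapshot("example.com", "20100101")` needs network access. It returns either a `(url, timestamp)` pair or `None` if the site was never archived.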

The repo focuses on the Python code to build the dataset.
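
As a rough illustration of the similarity step, the sketch below compares a startup's document vector against a set of incumbent vectors using cosine similarity. It is an assumption-laden toy: the vectors are made up, the function names are hypothetical, and the paper's exact differentiation measure is not reproduced here. In practice the vectors would come from a trained doc2vec model.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for this sketch: zero vectors match nothing
    return dot / (norm_u * norm_v)


def closest_incumbent(startup_vec, incumbent_vecs):
    """Return (name, similarity) of the incumbent most similar to the startup.

    `incumbent_vecs` maps a hypothetical firm name to its document vector.
    """
    scored = {
        name: cosine_similarity(startup_vec, vec)
        for name, vec in incumbent_vecs.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]


# Made-up three-dimensional vectors, for illustration only.
startup = [1.0, 0.0, 0.0]
incumbents = {
    "incumbent_a": [2.0, 0.0, 0.0],  # same direction as the startup
    "incumbent_b": [0.0, 1.0, 0.0],  # orthogonal to the startup
}
name, similarity = closest_incumbent(startup, incumbents)
```

Under this toy setup, a startup whose closest incumbent has low similarity would be read as more differentiated. How the paper aggregates similarities across incumbents is not specified here.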

The underlying data, the trained doc2vec models, and the Stata code replicating the paper's regressions are available in the Harvard Dataverse.

https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/NRYTAA

Authors

Jorge Guzman. Assistant Professor. Columbia Business School and Data Science Institute. Email: jag2367@gsb.columbia.edu

Aishen Li. Doctoral Student. Tsinghua University. Email: las21@mails.tsinghua.edu.cn