This code is part of the paper:

Arts S, Cassiman B, Hou J. (2023). Position and Differentiation of Firms in Technology Space. Management Science 69 (12): 7253-7265.

All data is available from

If you use the code or data, please cite the paper above.

We include three batches of code:

  • The first batch (tfidf.R) shows how to use TF-IDF to characterize a firm's technology portfolio and calculate the similarity between firm technology portfolios
  • The second batch ( shows how to train doc2vec to create patent-level embeddings
  • The third batch (doc2vec.R) shows how to use doc2vec to characterize a firm's technology portfolio and calculate the similarity between firm technology portfolios
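
To illustrate the idea behind the first batch, here is a minimal base-R sketch of TF-IDF weighting followed by cosine similarity between firm portfolios. It is not the code in tfidf.R. The toy data, the column names (firm, text), and the choice to treat each firm's pooled patent text as one document when computing the inverse document frequency are all assumptions made for illustration.

```r
# Toy data (hypothetical): one row per patent, with a firm id and cleaned text.
patents <- data.frame(
  firm = c("A", "A", "B", "C"),
  text = c("battery electrode lithium",
           "battery cell cooling",
           "lithium electrode coating",
           "antibody protein binding"),
  stringsAsFactors = FALSE
)

# Split each patent text into terms and build a firm-by-term count matrix.
tokens <- strsplit(patents$text, " ", fixed = TRUE)
long <- data.frame(
  firm = rep(patents$firm, lengths(tokens)),
  term = unlist(tokens),
  stringsAsFactors = FALSE
)
counts <- unclass(table(long$firm, long$term))

# Term frequency: share of each term within a firm's portfolio.
tf <- counts / rowSums(counts)

# Inverse document frequency, with each firm treated as one document
# (an assumption of this sketch).
idf <- log(nrow(counts) / colSums(counts > 0))

# TF-IDF weights: scale each term column by its IDF.
tfidf <- sweep(tf, 2, idf, `*`)

# Cosine similarity between every pair of firm vectors.
norms <- sqrt(rowSums(tfidf^2))
sim <- tcrossprod(tfidf) / outer(norms, norms)

print(round(sim, 3))
```

In this toy example, firms A and B share the terms "lithium" and "electrode" and so have a positive similarity, while firm C shares no terms with either and has a similarity of zero. The same cosine step could in principle be applied to any firm-level vectors, including ones built from embeddings.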

We suggest running the code locally and creating an additional folder called ./data/ into which the data from the Zenodo repository can be downloaded.
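
The folder can be created from within R before downloading the data. This is a minimal sketch, assuming the R working directory is the folder that contains the scripts; the variable name data_dir is only illustrative.

```r
# Path to the data folder, relative to the current working directory
# (assumed here to be the folder containing the scripts).
data_dir <- file.path(".", "data")

# Create the folder only if it does not exist yet.
if (!dir.exists(data_dir)) {
  dir.create(data_dir, recursive = TRUE)
}

# List the folder contents to check which files have been downloaded.
print(list.files(data_dir))
```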