
Working with public datasets

Primary language: Jupyter Notebook. License: MIT.


This repository will contain code for working with publicly available datasets. The intention is to help students who want to write an empirical thesis get started with empirical work on some good datasets.

External recommendations

  • A wiki listing many papers that come with replication materials (and therefore include data), categorized by topic or by the methods used.
  • A database with "automatic Stata reproductions".


A short list of datasets that I eventually want to have code for in this repository:

  • Danish Motor Register: Code written by Sebastian Dyrby when I supervised his master's thesis. The data contains all Danish cars, including odometer readings (total distance driven).
  • NBER manufacturing industry database (NBER): an industry-level panel (not firm-level).
  • Spectrum Auctions: Data is available, e.g., at Penn or from the FCC.
  • Danish pharmaceutical prices at medicinpriser.dk.
  • Procurement auctions in Europe: data.europe.eu. Browse around the site; there is quite a lot of data. Typically, each auction has a final price and some notion of an "expected price" (or a value). The Danish Competition and Consumer Authority also maintains a dataset that covers the Danish auctions only, with some additional variables and quality control: kfst.dk/udbud/data-og-cases/udbudsdata/.
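
As a first exercise with the procurement data, one natural starting point is to compare the final price to the "expected price" for each auction. The snippet below is only a minimal sketch: the three rows are made-up toy values, not real records, and the column names `tender_id`, `expected_price` and `final_price` are hypothetical, so check the header of the file you actually download. It uses only the Python standard library; the same steps carry over to pandas in a notebook.

```python
import csv
import io

# Toy rows for illustration only; these are NOT real procurement records.
# The column names are hypothetical; check the header of the actual file.
raw = io.StringIO(
    "tender_id,expected_price,final_price\n"
    "A,100,90\n"
    "B,250,275\n"
    "C,80,80\n"
)

# Ratio of the final price to the expected price, per auction.
ratios = {}
for row in csv.DictReader(raw):
    expected = float(row["expected_price"])
    final = float(row["final_price"])
    if expected > 0:  # skip rows where the ratio is undefined
        ratios[row["tender_id"]] = final / expected

# Share of auctions that ended strictly below the expected price.
share_below = sum(r < 1 for r in ratios.values()) / len(ratios)

print(ratios)
print(f"Share of auctions below expected price: {share_below:.2f}")
```

To run this on a downloaded file, replace the `io.StringIO(...)` object with `open("your_file.csv", newline="")`, where the file name is a placeholder, and adjust the column names.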