Data Science Portfolio/Projects


Books

  • Econometrics (Hansen 2018) - A great introduction to graduate econometrics [pdf]

  • Econometric Analysis of Cross Section and Panel Data, Second Edition (Wooldridge 2010) - Standard reference that should be on every shelf [Book Description]

  • Regression Modeling Strategies (Harrell 2001) - The first three chapters are required reading -- Frank Harrell knows his statistics. [Book Description]

  • Applied Nonparametric Econometrics (Henderson and Parmeter 2015) - Start-to-finish coverage of nonparametric econometrics, with applications and R code [Book Website] [Personal Bookdown Notes]

  • Introduction to Statistical Learning (James et al. 2017) - Perfect introduction to statistical learning and predictions [Book Website] [pdf] [Personal Notes] [Python Code]

  • (In Progress) Fluent Python (Ramalho 2015) - [Book Website]

  • (In Progress) The Elements of Statistical Learning (Hastie et al. 2009) - [Book Website] [pdf]

  • (In Progress) Hands-On Machine Learning with Scikit-Learn and TensorFlow (Géron 2017) - [Book Description] [Personal Notes] [Github]

Courses


I find the best way to learn a specific algorithm or statistical model is to build it from scratch. The following files contain classes and functions that implement the most common statistical learning methods in a limited form.

  • Keywords(R, Python, Statistical Modeling, Algorithms)
  • Builds daily gridded weather data for the continental United States from 1900 to 2013.

  • A relative anomaly spline interpolation technique calculates daily weather data for 460,000 2.5 km x 2.5 km grid cells in the US. [Tech. Example]

  • Aggregates the gridded data down to county-level weather data.

  • Keywords(R, Economics, Climate Change, Weather)
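The gridding and aggregation steps above can be illustrated with a minimal Python sketch. It is not the project's code: it assumes the relative anomaly is the ratio of a station's daily observation to that station's monthly mean (so values must be positive, e.g. precipitation or temperature in Kelvin), and it swaps in inverse-distance weighting for the spline interpolation to stay dependency-free. All function names, station IDs, and data layouts here are hypothetical.

```python
import math
from collections import defaultdict


def idw(value_by_station, station_xy, target_xy, power=2.0):
    """Inverse-distance-weighted average of station values at a target point.

    Assumption: a simple stand-in for the spline interpolation named above.
    """
    num = 0.0
    den = 0.0
    for sid, value in value_by_station.items():
        d = math.dist(station_xy[sid], target_xy)
        if d == 0.0:
            return value  # target sits exactly on a station
        w = d ** -power
        num += w * value
        den += w
    return num / den


def daily_grid_values(daily_obs, monthly_station_mean, station_xy,
                      grid_xy, monthly_grid_mean):
    """Scale each grid cell's monthly mean by the interpolated daily anomaly.

    Assumption: relative anomaly = daily observation / station monthly mean.
    """
    anomalies = {sid: daily_obs[sid] / monthly_station_mean[sid]
                 for sid in daily_obs}
    return {cell: monthly_grid_mean[cell] * idw(anomalies, station_xy, xy)
            for cell, xy in grid_xy.items()}


def county_means(grid_values, county_of_cell):
    """Aggregate grid cells to counties with an unweighted mean."""
    groups = defaultdict(list)
    for cell, value in grid_values.items():
        groups[county_of_cell[cell]].append(value)
    return {county: sum(vals) / len(vals) for county, vals in groups.items()}
```

A real pipeline would likely weight cells by area or cropland share when aggregating; the unweighted mean is only the simplest choice.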


Nonlinear Temperature Distributions [R package] [Python Package]

  • Calculate nonlinear temperature distributions: degree days and time in each degree.

  • The measure accounts for the rise and fall of temperatures during the day.

  • Degree days measure time above a specified temperature threshold (e.g. degree days above 30C), and time in each degree measures time within a specified temperature interval (e.g. time in 30C).

  • Keywords(R, Python, Economics, Climate Change, Agronomy)
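The two measures above can be sketched in a few lines of Python. This is an illustration, not the package's implementation: it assumes the within-day temperature path follows a single sine curve between the daily minimum and maximum, and it integrates numerically rather than using a closed form. The function names and the `steps` parameter are hypothetical.

```python
import math


def temperature_at(day_fraction, tmin, tmax):
    """Temperature at a fraction of the day in [0, 1), on an assumed sine curve."""
    mean = (tmax + tmin) / 2.0
    amplitude = (tmax - tmin) / 2.0
    return mean + amplitude * math.sin(2.0 * math.pi * day_fraction - math.pi / 2.0)


def degree_days(tmin, tmax, threshold, steps=1440):
    """Average excess of temperature over the threshold across one day."""
    total = 0.0
    for i in range(steps):
        t = temperature_at((i + 0.5) / steps, tmin, tmax)
        total += max(t - threshold, 0.0)
    return total / steps


def time_in_degree(tmin, tmax, degree, steps=1440):
    """Share of the day spent in the interval [degree, degree + 1)."""
    hits = 0
    for i in range(steps):
        t = temperature_at((i + 0.5) / steps, tmin, tmax)
        if degree <= t < degree + 1:
            hits += 1
    return hits / steps
```

For example, `degree_days(20, 30, 25)` gives the degree days above 25C for a day ranging from 20C to 30C, and `time_in_degree(20, 30, 29)` gives the share of that day spent between 29C and 30C. Summing `time_in_degree` over every degree the day touches returns one full day.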


  • Predict wine quality based on biophysical characteristics.

  • Model using multinomial logit, linear discriminant analysis, random forest, and extreme gradient boosting.

  • Keywords(R, Classification, Economics)