
Lecture notes for EC 607

Data science for economists

Lectures | Details | FAQ | License

Lectures

Note: While I have provided PDF versions of the lectures, they are best viewed in the original HTML format.

  1. Introduction [.html | .pdf | .Rmd]
  2. Version control with Git(Hub) [.html | .pdf | .Rmd]
  3. Learning to love the shell [.html | .pdf | .Rmd]
  4. R language basics [.html | .pdf | .Rmd]
  5. Data wrangling & tidying
  6. Webscraping: (1) Server-side & CSS [.html | .pdf | .Rmd]
  7. Webscraping: (2) Client-side & APIs [.html | .pdf | .Rmd]
  8. Regression analysis in R [.html | .pdf | .Rmd]
  9. Spatial analysis in R [.html | .pdf | .Rmd]
  10. Functions in R: (1) Introductory concepts [.html | .pdf | .Rmd]
  11. Functions in R: (2) Advanced concepts [.html | .pdf | .Rmd]
  12. Parallel programming [.html | .pdf | .Rmd]
  13. Docker [.html | .pdf | .Rmd]
  14. Google Compute Engine
  15. HPC (UO Talapas cluster) [Guest lecture]
  16. Databases [.html | .pdf | .Rmd]
  17. Spark [.html | .pdf | .Rmd]
  18. Machine learning
  19. Workflow & project management

Details

This is a graduate course taught by Grant McDermott at the University of Oregon. Here is the course description, taken from the syllabus:

This seminar is targeted at economics PhD students and will introduce you to the modern data science toolkit. While some material will likely overlap with your other quantitative and empirical methods courses, this is not just another econometrics course. Rather, my goal is to bring you up to speed on the practical tools and techniques that I feel will most benefit your dissertation work and future research career. This includes seemingly mundane skills, generally excluded from the core graduate curriculum, which are nevertheless essential to any scientific project. We will cover topics like version control (Git) and project management; data acquisition, cleaning and visualization; efficient programming; and tools for big data analysis (e.g. relational databases, cloud computation and machine learning). In short, we will cover things that I wish someone had taught me when I was starting out in graduate school.

Please do read the rest of the syllabus before you go through the lectures. This will detail software requirements and installation, and give you a better sense of the full aims and scope of the course. I also have an "FAQ" section at the end that covers frequently asked questions (or, at least, potentially asked questions). Speaking of which, here follow answers to some questions that are more specifically related to this repo.

FAQ

How do I download this material and keep up to date with any changes?

Please note that this is a work in progress, with new material being added every week.

If you just want to read the lecture slides or HTML notebooks in your browser, then you should simply scroll up to the Lectures quicklinks section at the top of this page. Completed lectures will be hyperlinked as soon as they have been added. Remember to check back in regularly to get any updates. Or, you can watch or star the repo to get notified automatically.

If you actually want to run the analysis and code on your own system (highly recommended), then you will need to download the material to your local machine. The best way to do this is to clone the repo via Git and then pull regularly to get updates. Please take a look at these slides if you are unfamiliar with Git or are unsure how to do any of that. Once that's done, you will find each lecture contained in a numbered folder (e.g. 01-intro). The lectures themselves are written in R Markdown and then exported to HTML format. Click on the HTML files if you just want to view the slides or notebooks.
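
For anyone unsure what "clone, then pull regularly" looks like in practice, here is a minimal sketch. It is an illustration only: to stay runnable offline it builds a throwaway local repo as a stand-in for this one, so the `UPSTREAM` path, the `02-git` folder name and the `.Rmd` file names are all hypothetical. On your own machine you would only need the `git clone` and `git pull` steps, with the clone URL copied from this repo's page in place of `$UPSTREAM`.

```shell
#!/bin/sh
set -eu

# ASSUMPTION: a throwaway local repo stands in for the real course repo.
# Replace "$UPSTREAM" with the actual clone URL when doing this for real.
WORKDIR=$(mktemp -d)
UPSTREAM="$WORKDIR/upstream"

# Build the stand-in "remote" with one numbered lecture folder.
git init --quiet "$UPSTREAM"
mkdir "$UPSTREAM/01-intro"
echo "# Introduction" > "$UPSTREAM/01-intro/01-intro.Rmd"   # hypothetical file name
git -C "$UPSTREAM" add .
git -C "$UPSTREAM" -c user.name="Example" -c user.email="example@example.com" \
    commit --quiet -m "Add lecture 1"

# Step 1 (done once): clone the repo to your local machine.
git clone --quiet "$UPSTREAM" "$WORKDIR/lectures"

# Simulate the instructor adding a new lecture upstream.
mkdir "$UPSTREAM/02-git"                                     # hypothetical folder name
echo "# Version control" > "$UPSTREAM/02-git/02-git.Rmd"    # hypothetical file name
git -C "$UPSTREAM" add .
git -C "$UPSTREAM" -c user.name="Example" -c user.email="example@example.com" \
    commit --quiet -m "Add lecture 2"

# Step 2 (repeat regularly): pull to fetch and merge any new material.
git -C "$WORKDIR/lectures" pull --quiet

# The local copy now contains both numbered lecture folders.
ls "$WORKDIR/lectures"
```

Cloning once and pulling afterwards means you only download what has changed, and Git will warn you about conflicts rather than silently overwriting files you have edited locally.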

I've spotted a mistake or would like to contribute

Please open a new issue. Better yet, please fork the repo and submit an upstream pull request. I'm very grateful for any contributions, but may be slow to respond while this course is still being developed. Similarly, I am unlikely to help with software troubleshooting or conceptual difficulties for non-enrolled students. Others may feel free to jump in, though.

Can I use/adapt your material for a similar course that I'm teaching?

Sure. That's partly why I have made everything publicly available. I only ask two favours. 1) Please let me know (email/Twitter) if you do use material from this course, or have found it useful in other ways. 2) An acknowledgment somewhere in your own syllabus or notes would be much appreciated.

Are you willing to teach a (condensed) version of this course at my institution?

Possibly. Please contact me if you would like to discuss further.

What are you using to produce these lecture slides/notebooks?

All of the lecture material is written in R Markdown. For the slide decks (lectures 1-5), I'm using xaringan. For the notebooks (lecture 6 and onwards), I'm using my lecturenotes template.

Do you plan to turn these lecture notes into a book?

Yes! My friend and colleague, Ed Rubin, and I are slowly porting our combined lecture material to a book under the tentative title: "Data science for economists and other animals".

License

The material in this repository is made available under the MIT license.