Introduction to Data Analysis with Stata - lecture materials by László Tőkés (CUB) with Ágoston Reguly (Georgia Tech) and Gábor Békés (CEU, KRTK, CEPR)
This course material is a supplement to Data Analysis for Business, Economics, and Policy by Gábor Békés (CEU) and Gábor Kézdi (U. Michigan), Cambridge University Press, 2021.
Textbook information: see the textbook's website gabors-data-analysis.com or visit Cambridge University Press
To get a copy: request an inspection copy (for instructors), buy from Amazon, or order online around the globe.
We thank the CEU Department of Economics and Business for financial support.
This is version 1.0. (2022-10-03)
Comments are very welcome by email or as a GitHub issue.
This series of lectures offers a brief introduction to Stata. It contains 13+1 lectures, including a summary lecture. The course serves as an introduction to the Stata programming language and software environment for data exploration, data wrangling, data analysis, and visualization. The structure tries to follow that of the textbook, although there are of course some differences: the main organizing principle of the lectures is the logic of Stata, not necessarily the logic of the book. After going through the lectures, students will be able to reproduce the results of the first two parts of the textbook (Data Exploration and Regression Analysis) in Stata. Moreover, they will hopefully understand the Stata language well enough to continue with the textbook and do the exercises in the last two parts on their own.
Note that I use Stata 14 in the lectures. However, all the elements discussed here are forward compatible (and in most cases backward compatible as well).
Lectures 1 to 11 - complementing Part I: Data Exploration (Chapters 1-6) - focus on the logic of the Stata language, data preparation and wrangling, exploratory data analysis, and hypothesis testing. Please note that the first lecture is boring, but unfortunately unavoidable. I tried to be as brief as possible there.
Lectures 12 to 14 - complementing Part II: Regression Analysis (Chapters 7-12) - focus on the basics of regression analysis, the presentation of regression results, and visualization.
We believe in learning by doing. Although the lectures offer a detailed introduction to each topic with many explanations and examples, the more important part is the homework assignments, which help students practice. We also recommend that students work through the data exercises at the end of the textbook chapters.
This is not a hardcore coding course, but a course to supplement the material of the textbook. The lectures focus on the commands that are needed to reproduce the case studies and to solve the data exercises of the textbook.
The structure of the material reflects these principles. On the one hand, the lectures include pre-written code as an introduction to each topic. On the other hand, the homework assignments and the data exercises of the textbook help students gain experience in coding. In most cases, the pre-written code and the homework assignments reproduce case study results that can be found in the textbook.
These lectures can serve as a basis for a course on Stata programming for data wrangling and basic regression analysis. Although the series is structured and comprehensive enough to stand alone, we recommend teaching (and learning) it hand in hand with the textbook, since almost all examples come from the textbook.
This series of lectures requires no prior knowledge of Stata programming.
The material is based on years of experience teaching coding and empirical courses at Corvinus University of Budapest, on working as a research assistant and later as a researcher, and of course on advice from many great resources, such as:
- Getting Started with Stata for Windows by Stata Press
- Economics Lesson with Stata by Data Carpentry
- UCLA's Stata Learning Modules
- Kurt Schmidheiny's brief intro document
- Fundamentals of data analysis and visualization from a group of instructors
- A huge collection of advanced Stata stuff on the Medium site
- A great online training by SSCC
- A four-piece tutorial by Germán Rodríguez from Princeton University
and many others, listed in the lectures' READMEs.
The following table gives a brief summary of the lectures: the type of each lecture, its expected learning outcome, and how it relates to the textbook's case studies and datasets.
We know there are errors and bugs, or simply much better ways to do a procedure.
To make a suggestion, please open a GitHub issue here with a title containing the case study name. You may also contact us directly.