Coding for Data Analysis with Stata

Introduction to Data Analysis with Stata - lecture materials by László Tőkés (CUB) with Ágoston Reguly (Georgia Tech) and Gábor Békés (CEU, KRTK, CEPR)

This course material is a supplement to Data Analysis for Business, Economics, and Policy by Gábor Békés (CEU) and Gábor Kézdi (U. Michigan), Cambridge University Press, 2021.

Textbook information: see the textbook's website gabors-data-analysis.com or visit Cambridge University Press

To get a copy: request an inspection copy (for instructors), buy from Amazon, or order online around the globe.

Acknowledgments

We thank the CEU Department of Economics and Business for financial support.

Status

This is version 1.0 (2022-10-03).

Comments are very welcome by email or as a GitHub issue.

About this lecture series

This series of lectures offers a brief introduction to Stata. It contains 13+1 lectures, including a summary lecture. The course serves as an introduction to the Stata programming language and software environment for data exploration, data wrangling, data analysis, and visualization. The structure follows that of the textbook where possible, although there are some differences: the main organizing principle of the lectures is the logic of Stata, not necessarily the logic of the book. After going through the lectures, students will be able to reproduce the results of the first two parts of the textbook (Data Exploration and Regression Analysis) in Stata. They will hopefully also understand the Stata language well enough to continue with the textbook and do the exercises in the last two parts on their own.

Note that in the lectures I use Stata 14; however, all the elements discussed here are forward compatible (and in most cases backward compatible) as well.

Lectures 1 to 11 - complementing Part I: Data Exploration (Chapters 1-6) - focus on the logic of the Stata language, data preparation and wrangling, exploratory data analysis, and hypothesis testing. Please note that the first lecture is boring, but unfortunately unavoidable. I tried to be as brief as possible there.

Lectures 12 to 14 - complementing Part II: Regression Analysis (Chapters 7-12) - focus on the basics of regression analysis, the presentation of regression results, and visualization.

Teaching philosophy

We believe in learning by doing, so although the lectures offer a detailed introduction to the topic with many explanations and examples, the more important part is the homework assignments, which give students practice. We also recommend that students work through the data exercises at the end of the textbook chapters.

This is not a hardcore coding course, but a course to supplement the material of the textbook. The lectures focus on the commands that are needed to reproduce the case studies and to solve the data exercises of the textbook.

The structure of the material reflects these principles. On the one hand, the lectures include pre-written code as an introduction to each topic; on the other hand, the homework assignments and the data exercises of the textbook help students gain experience in coding. In most cases, the pre-written code and homework assignments reproduce case study results that can be found in the textbook.

How to use

These lectures can serve as a basis for a course on Stata programming for data wrangling and basic regression analysis. Although the series is structured and comprehensive enough to stand alone, we recommend teaching (and learning) it hand in hand with the textbook, since almost all examples are from the textbook.

This series of lectures requires no prior knowledge of Stata programming.

Sources

The material is based on experience from years of teaching coding and empirical courses at Corvinus University of Budapest, from working as a research assistant and later as a researcher, and of course on advice from many great resources, listed in the lectures' READMEs.

Lectures, contents, and case-studies

The following table gives a brief summary of the lectures: what each lecture covers, and how it relates to the textbook's case studies and datasets.

| Lecture | Content | Case study (at least partly) covered | Dataset |
|---|---|---|---|
| PART I. | | | |
| lecture01-boring_stuff | Introduction to the Stata interface and communication. Basics of .do files and the logic of syntaxes. | - | - |
| lecture02-open_save | Opening and saving datasets. | - | football, hotels-vienna, wms |
| lecture03-preparation | Basics of data wrangling. | Chapter 01, 1.A1: Finding a Good Deal among Hotels: Data Collection; Chapter 02, 2.A1: Finding a Good Deal among Hotels: Data Preparation | hotels-vienna |
| lecture04-reshape | Reshaping multi-dimensional data. Wide and long formats. | Chapter 02, 2.B1: Displaying Immunization Rates across Countries | worldbank-immunization |
| lecture05-eda | Exploratory data analysis. | Chapter 03, 3.A1 and 3.A2: Finding a Good Deal among Hotels: Data Exploration; Chapter 03, 3.B1: Comparing Hotel Prices in Europe: Vienna vs. London | hotels-vienna |
| lecture06-subsamples | Dealing with subsamples using the if condition, the in range, and the bysort prefix. | Chapter 03, 3.A1 and 3.A2: Finding a Good Deal among Hotels: Data Exploration; Chapter 03, 3.B1: Comparing Hotel Prices in Europe: Vienna vs. London | hotels-vienna |
| lecture07-graphs | Making graphs. | Chapter 07, 7.A1 and 7.A2: Finding a good deal among hotels with simple regression | hotels-vienna |
| lecture08-moredatasets | Combining datasets: adding observations (append) and variables (merge). | Chapter 02, 2.C1: Identifying Successful Football Managers | football |
| lecture09-datamanipulation | Manipulating data: producing new variables and changing existing ones. Deleting variables and observations. | Chapter 04, 4.A1: Management Quality and Firm Size: Describing Patterns of Association | wms |
| lecture10-macro_loop | Working with local and global macros, applying loops, and using stored results. | - | wms, football |
| lecture11-hypothesis_testing | Testing hypotheses. | Chapter 06, 6.A1, 6.A2, and 6.A3: Comparing online and offline prices: testing the difference | billion-prices |
| PART II. | | | |
| lecture12-regression_basics | Basics of regressions: fitting, predicting, dummy variables, and interaction terms. | Chapter 07, 7.A1, 7.A2, and 7.A3: Finding a good deal among hotels with simple regression | hotels-vienna |
| lecture13-presenting_regresults | Presenting regression results nicely and compactly. | Chapter 10, 10.A1: Understanding the gender difference in earnings | cps-earnings |
| lecture14-TSdata | Basics of time series data commands. | Chapter 12, 12.A1: Returns on a company stock and market returns | sp500 |

Found an error or have a suggestion?

Awesome, we know there are errors and bugs, or simply much better ways to do a procedure.

To make a suggestion, please open a GitHub issue with a title containing the case study name. You may also contact us directly.