Causal-2-Pedagogy

This repository aims to organize all the materials and teaching resources for PH243A, Advanced Topics in Causal Inference. The main goal is to create a version-controlled repository of resources for graduate student instructors (GSIs) so that, as labs, simulations, and other resources change between GSIs, there is a running history that tracks these changes. Overall, this repository aims to provide information that future GSIs can use to incrementally move PH243A toward the teaching methods detailed in the book Building Thinking Classrooms in Mathematics. Because of the amount of material covered in the course, it is difficult to use pedagogical methods other than simply reading directly from the labs and having students do problems individually. This README gives an overview of each lab taught by the GSI and suggests how each lab can be changed to encourage more thinking in the classroom, collective learning, discussions of equity and inclusion, and more engagement with notation.

Lab 0

Lab Restructuring: Historically, this lab has been an R programming refresher. This can be useful for students who do not have R experience; however, it seems more important to have students work through this lab individually and use class time to review the notation used in the class. This is particularly important for establishing a foundation for the advanced material taught in the course. Experience teaching the course shows that, midway into the curriculum, there is a division between students who can continue to understand more complicated estimators and those who get stuck. As such, this first lab session should be used to review a glossary of notation used in the Targeted Learning books. For example, students are often confused about the difference between P_0 and P_{U,X} - that is, P_0 = Function(P_{U,X}, \bar{A}), so you can think of the function as a one-to-many mapping and P_0 as a coarsening of P_{U,X}. Or more simply, P_0 is an instance of P_{U,X}. It is important to connect this point for students when simulating data using structural equations to create data-generating systems.

My suggestion for this section is to have students read the first three chapters of Causal Inference for Complex Longitudinal Studies, starting with the abbreviations and notation. Before class, have a worksheet prepared that lists notation used in the class with no definitions. Break students into groups and have them use the white board to define each notational principle. For example, given the prompt "Define P_0", students would discuss together in their group and write on the white board: "P_0 is the true data-generating distribution; O ~ P_0". Additionally, it may help students connect these conceptual principles practically by asking for an example from simulations, such as "Define P_0" followed by "When we simulate a data-generating system, what is P_0 in this context?". A simple worksheet should be made with 15-20 of these questions. I suggest randomly splitting students into groups of 3-4 and giving each student space on the vertical white board to write out the notation and definitions.

Pedagogy principles: Getting students together and engaged in a more transparent environment (the vertical white board) avoids stalling and faking behavior. It also gets students to engage more actively with notation rather than passively reading it or typesetting it in LaTeX. The main goal of this first section is to create an activity that sets all students on the same path with respect to notation, establishing a trajectory that will help students stay in a "flow" state later on: the state between frustration, where the activity is too hard, and boredom, where it is too easy.

Anecdotally, I have found that students midway through the course do not know the fundamental notation, so the description of the estimators and questions that require writing out an estimator's notation are too hard for them. Therefore, this first class should establish a strong foundation in notation. This should be the inaugural section because most students are scared of notation, avoid it, and do not study it independently at home. Conversely, the original lab, on R programming basics, can easily be studied at home: most students already have some programming experience and can easily find resources online. Reading the first three chapters (32 pages) and doing the programming refresher should not take students more than 4 hours. Overall, the principles aimed for in this restructuring are collaborative thinking and building a foundation that will increase the likelihood of students being in a flow state later in the class.

Goal: Engage students with statistical notation used in the course in a more confident and active way. Get students to work with each other collaboratively to establish a notational foundation.

To-do: Write the worksheet that will be used for Lab 0. Write a simple R script for a random group generator that will be used in each class.
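The random group generator mentioned in the to-do could be sketched as follows. This is only a rough sketch; the function name, roster, and default group size are placeholders, not part of the existing course materials:

```r
# Hypothetical helper: randomly split a roster into groups of roughly
# `group_size` students each (sizes differ by at most one).
make_groups <- function(students, group_size = 3, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  shuffled <- sample(students)
  n_groups <- ceiling(length(shuffled) / group_size)
  # cycle group labels over the shuffled roster so groups stay balanced
  split(shuffled, rep(seq_len(n_groups), length.out = length(shuffled)))
}

roster <- c("Ana", "Ben", "Chris", "Dana", "Eli", "Fay", "Gus")
make_groups(roster, group_size = 3, seed = 2024)
```

Calling this once at the start of each class (with a fresh seed) gives the visibly random groups discussed above.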

Visual Aids: I have provided the JPEGs for a GIF I created to help students understand \mathbb{M} - the non-parametric statistical model - and how P_0, P_{\epsilon}, and P_{U,X} relate to it. This is an important resource for students thinking about what perturbs empirical data away from the truth, P_0. For example, in the setting of the average treatment effect, because our counterfactuals are on A, the treatment, what pushes our estimate away from the true ATE is the propensity score P(A|W). Here, it is important for students to see that, if we had a true randomized controlled trial, P_n and P_0 would be the same - it is the difference in the likelihood of receiving treatment/exposure given covariates that moves us away from the true target estimand. Therefore, we can use estimation of this nuisance parameter to fluctuate our initial estimates in the direction of the truth. This visual should be used at the end of class to tie together the students' notational discussion and link that notation to targeted learning in the point-treatment setting. This is the first image in the GIF, which wraps the notation discussed into fluctuations of the system toward the ATE target parameter:

Non-Parametric Model and the ATE

Lab 1

Lab Restructuring: This lab focuses on simulating longitudinal data. There are two large improvements that can be made: using a real-world example instead of an arbitrary research question, and giving more guidance on how to simulate data-generating systems. Students are required to simulate data that imitates the functional forms they believe are present in their real-world data project, and most, if not all, struggle to do this properly by the end of the course. Students need to investigate how variables are associated, determine a temporal ordering, think about interactions in the data, and, once they have laid out the general structure, begin to program these as sets of equations to simulate from. Therefore, I believe this lab can be restructured to use a simple real-world data example from the beginning; this will help students visualize the directed acyclic graph for longitudinal data and also teach them how to think about simulating data from actual data, which they will need at the end of the course.

Similarly, I believe this is an excellent opportunity to engage students on how statistics can be used to investigate/estimate issues around social inequities and how racism is a public health crisis. As such, I propose using existing data from the Center for the Study of Racism, Social Justice, and Health as a representative data source that is more substantive and reflective of the current public health environment in the U.S. (and in the world) than the current example, which is a simple story about the amount of sleep students get each day before a statistics exam and the amount of sleep students get before a class where they report being sick or not. Although this will require some work in identifying a dataset that works for the class (one with multiple $A$ nodes and time-varying covariates $\bar{L}$), I think this task is worth the time because it will provide students with a real-world question that feels important - which will motivate them to want to create accurate estimators in the future. Using such a dataset also feels more meaningful; it will teach students how to translate real-world data into simulations (we can have students read the study description and variable metadata and, in groups, work out the DAGs that would represent the longitudinal data system). This knowledge would directly help them simulate data based on their final projects of interest.

Generally, I think this class should be restructured such that the first hour is spent showing this new dataset and describing why it's important, then getting students up on the white board to think about the data-generating system and write out the functional forms more abstractly. In the second hour of class, students will be randomly split into pairs and do paired programming to simulate the exogenous variables and the functional forms for the endogenous variables under various situations of censoring and multiple outcomes. I suggest using a real-world data example that has both time-varying covariates and censoring from the beginning (the more complicated system) and then discussing how simpler systems would just lack certain functional forms (we can simply remove them).
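As a concrete (and entirely hypothetical) illustration of what the paired-programming task might produce, here is a minimal two time-point data-generating system. All variable names and coefficients are invented for the sketch; the actual lab would use functional forms worked out from the new dataset:

```r
# Hypothetical structural equation system: baseline covariate W,
# time-varying covariate L1, treatments A1/A2, binary outcome Y.
simulate_system <- function(n, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # exogenous variables U: the random inputs to the system
  U_W <- rnorm(n)
  U_L <- rnorm(n)
  # endogenous variables, generated in temporal order
  W  <- U_W
  A1 <- rbinom(n, 1, plogis(0.4 * W))                        # A(1) depends on W
  L1 <- 0.5 * W - 0.3 * A1 + U_L                             # L(1) depends on W, A(1)
  A2 <- rbinom(n, 1, plogis(0.4 * W + 0.6 * L1 - 0.2 * A1))  # A(2) depends on past
  Y  <- rbinom(n, 1, plogis(-0.5 + 0.3 * W + 0.4 * L1 + 0.5 * A1 + 0.5 * A2))
  data.frame(W, A1, L1, A2, Y)
}

head(simulate_system(1000, seed = 1))
```

Censoring nodes or additional outcomes would be added as further equations in the same temporal ordering.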

Pedagogy principles: Using a real-world example that feels meaningful is a way to get students more engaged with the actual research questions. These data-generating systems are used in almost every lab of the course so if it feels like we are answering a pertinent societal question, it will help motivate students to be active with the material. Likewise, getting students up and on the white board together to draw out the DAGs helps avoid students being passive and disengaged. This also helps the students develop skills in simulating data based on real data that is not currently taught but is needed for the final project.

The second part of the lab, paired programming, starts by visibly assigning random groups - the goal of this is to mobilize empathy. Because the groups are randomized, if one student isn't as good at R programming as another, the more experienced student will show more empathy in trying to help the one who is struggling. Paired programming involves a navigator (who focuses on the big-picture aspects of the task) and a driver (who does the actual programming of the immediate steps). This video gives a nice breakdown (by kids!) of how to do paired programming. Why paired programming? I noticed that even when randomly assigning groups of 3-4 students to work on the labs together, the students invariably work on the labs together but independently (each person just programming), and they only engage when somebody has a bug/error in their code. By doing paired programming with navigator and driver roles, I think students will engage with the material as a unit, which helps with collaborative learning.

Goal: Motivate students to care about a real-world data problem. Teach students how to simulate data from real world data. Engage students as a group in collaboratively thinking about data-generating systems. Use paired-programming to help students engage with programming together rather than individually.

To-do: This will require rewriting a majority of this lab. We will need to identify a new dataset that is longitudinal and addresses an issue related to racism and social justice, from which a simulation can be created and used during the rest of the class; write sections that introduce the data and give the story; write sections that have students work in groups to draw out the system; and simulate these new data-generating systems with paired programming. Students can then submit the labs as their paired units, rather than individually.

Visual Aids: No visual aids for this section.

Lab 2

Lab Restructuring: This lab focuses on calculating true values of target causal parameters under longitudinal interventions. Now that functions have been made to simulate data, the students can intervene on nodes (such as simulating the outcome if all individuals were exposed/treated vs. not exposed/not treated). In the previous section, we restructured the lab to use a real-world example related to social/health inequity and gave students thinking tasks on how to simulate data from it before going into the actual simulated data provided. We formed visibly random groups for paired programming, established navigator and driver roles in each group to keep students engaged, and used vertical surfaces as a class to write out the data-generating process for this real-world data example. This lab can be rewritten to follow our new data example but will remain mostly programming. In this lab, it's important to convey that, by simulating a large number of observations, we are approximating P_{U,X} - that is, we are drawing observations from all the different combinations of values from the joint distribution shown in the Lab 0 figure. Because this lab is all programming - intervening on the previously made functions and interpreting results - visibly random groups for paired programming will keep students engaged.
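To make the "simulate many observations to approximate P_{U,X}" point concrete, a hedged sketch of computing a true parameter value by intervening on the treatment nodes might look like the following. The structural equations and coefficients are invented for illustration; they are not the course's actual data-generating system:

```r
# Hypothetical sketch: approximate E[Y_{abar}] by deterministically setting
# the treatment nodes to abar and averaging the outcome over a large sample.
simulate_intervened_mean <- function(n, abar, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  W  <- rnorm(n)
  A1 <- rep(abar[1], n)                          # intervene: set A(1)
  L1 <- 0.5 * W - 0.3 * A1 + rnorm(n)
  A2 <- rep(abar[2], n)                          # intervene: set A(2)
  Y  <- rbinom(n, 1, plogis(-0.5 + 0.3 * W + 0.4 * L1 + 0.5 * A1 + 0.5 * A2))
  mean(Y)                                        # approximates E[Y_{abar}]
}

# With n very large we are effectively drawing from P_{U,X}; the contrast
# below approximates the ATE of "always treat" vs. "never treat".
psi_1 <- simulate_intervened_mean(1e6, c(1, 1), seed = 1)
psi_0 <- simulate_intervened_mean(1e6, c(0, 0), seed = 1)
psi_1 - psi_0
```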

Currently, the GSI introduces the lab, students understand the goals, and, even when put into groups of three to four, they end up working silently together. To establish more engagement, the room should be structured with smaller tables so that each pair can sit together and feel like they can discuss the problems (navigator) while the driver talks through what is being programmed without distracting other groups; this also makes the pairs feel more comfortable communicating. This defronts the classroom. This lab is generally easy for those who are skilled in programming but harder, of course, for those who are not as familiar. It is important to stress that the point of the paired programming is to create an environment where skilled students learn how to teach programming and work with collaborators who may not be as confident with R.

A large block of time in this lab should be dedicated to marginal structural models as this concept generates many questions later in the course during final projects. Most students have many time points and need to summarize results into some easily interpretable measure that also has more power by borrowing information across the treatment regimes.

Overall, this lab will continue to focus on intervening on the data-generating systems, but now these systems are based on a different example, and we are using paired programming from random group assignment. It is more difficult to answer students' questions conceptually in this lab because most questions are about programming problems. The GSI could ask the whole class whether others have had problem X and how they solved it, but I would use this method with discretion as it distracts all the other groups. It is better for the GSI to work with the pair with the question, but instead of answering directly, to ask, "What would you Google to answer this question?" or "What is it you are trying to do, and where do you think the problem is?".

Pedagogy principles: Team-based collaborative programming. Defronting the classroom. Consolidating from the bottom - if our goal is to have students fluently program these estimators, then the class works at a pace set by those who are less advanced in programming: when they are the drivers, the navigator will help them. Of course this slows things down, but it will hopefully push the students who are less skilled in R to practice outside of class so that the dynamic feels more productive when working with a team member.

Goal: Teach students who are more skilled in programming how to help others. Facilitate more empathy and help when programming together rather than default into the silent individual programming that invariably happens in these labs that are programming-heavy. Help students collaborate in solving programming problems related to the statistical questions of interest.

To-do: The lab will need to be rewritten to use the more socially inspired real-world example, but most of the programming/functions will remain the same. We will need to create an R script for a random paired-programming assigner with navigator/driver roles. We could also make a timer such that, when it goes off, the navigator and driver in each group switch.
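A possible sketch of the pair assigner (function and student names are hypothetical; the timer itself is left to the GSI):

```r
# Hypothetical helper: randomly assign navigator/driver pairs.
assign_pairs <- function(students, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  shuffled <- sample(students)
  pair_id  <- ceiling(seq_along(shuffled) / 2)
  lapply(split(shuffled, pair_id), function(p) {
    # with an odd roster, the last "pair" is a single navigator
    setNames(p, c("navigator", "driver")[seq_along(p)])
  })
}

# Swap who holds each role (call this when the timer goes off).
swap_roles <- function(pairs) {
  lapply(pairs, function(p) setNames(rev(p), names(p)))
}

pairs <- assign_pairs(c("Ana", "Ben", "Chris", "Dana"), seed = 7)
pairs
swap_roles(pairs)
```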

Lab 3

Lab Restructuring: This lab focuses on understanding time-dependent confounding and identifiability in the longitudinal context. This is presented such that the standard randomization and positivity assumptions used in the point-treatment setting do not hold, so we set up a number of straw-man situations to show that there is no set of confounders we can adjust for simultaneously that blocks the backdoor path and contains no descendants of our treatment nodes. Therefore, it is important for students to have a firm understanding of the randomization assumption in the point-treatment setting. Originally, examples were given by the lecturer of the class, but I believe these examples should be moved to the lab (with additional examples) so that students can engage with the content more actively. Students are normally given directed acyclic graphs and asked if there is any set of variables that can be adjusted for to isolate the causal path of interest. I think students would engage more if they were given situations where they had to draw from a description. For example, "Draw a directed acyclic graph with three endogenous nodes and two exogenous nodes where the null set is true (no adjustment necessary)". This type of question forces students to think more deeply about these causal graphs rather than simply saying yes or no to the question, "Can we isolate the causal path of interest?".

This type of teaching will take more time, so this lab should be done in two sessions. It's particularly important to slow down in this section: the beginning of the lab sequence was an introduction to non-parametric statistics in the longitudinal setting and the associated R coding, but now we are getting into new statistical principles related to the sequential randomization and positivity assumptions, which students have not seen in the Causal 1 class. Because of the new concepts introduced in this lab, the first hour of lecture can be used to introduce the sequential randomization/positivity assumptions in the longitudinal setting for a simple example. Students are then familiarized with how this differs from the point-treatment setting.

In the second hour, students are randomized to groups of 4 and each group works on the board to answer questions like "Write out the randomization assumption for a study with four treatments, four time varying covariates, baseline covariates, and a final outcome measured at the end". This flips the material so, instead of just showing the students the equations, they are instead given a situation which they need to translate into mathematical terms. The same can be done for the target causal parameters.
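For reference, one conventional way to write an answer to the whiteboard prompt above - assuming counterfactual outcome Y_{\bar{a}} and observed data O = (W, L(1), A(1), L(2), A(2), L(3), A(3), L(4), A(4), Y) - is the sequential randomization assumption:

```latex
% Sequential randomization: at each time point, treatment is independent of
% the counterfactual outcome given the observed past.
Y_{\bar{a}} \perp A(t) \mid \bar{A}(t-1) = \bar{a}(t-1),\ \bar{L}(t),\ W
\qquad \text{for } t = 1, \dots, 4.
```

This is a sketch of the expected solution shape, not the only valid notation; groups should be encouraged to defend whatever equivalent form they write.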

This lab would need to be rewritten to present the material around the new causal question tied to the dataset introduced in Lab 1. Generally, the goal here is to make students think more about which variables are being intervened on and how this changes the sequential randomization equations. Flipping the way this is taught may help students build more intuition around these initially somewhat convoluted equations, which are hard to break down.

Pedagogy principles: Get students engaged with new statistical equations by getting them up on the white board in groups. This helps students take a simple example for a target estimand and expand on it in a lighter, more dynamic setting (an erasable white board) rather than trying to write it in LaTeX, which is more laborious. The idea is to get students to feel like they can play with notation the same way they program - through a bit of trial and error. As of now, these new concepts are essentially presented to the students, which leads to passive acceptance. The group work instead supports consolidation from the bottom: because students work together to draw out the data structures and answer these questions, it ensures that all students are engaging with the new information.

Goal: Engage students with new statistical concepts and equations. Make students feel more confident with this material.

To-do: The lab will need to be rewritten to use the more socially inspired real-world example. This should be done while keeping the various statistical parameters and structures the same - that is, using more meaningful real-world situations but keeping the number of treatments/time-varying covariates/outcomes the same.

Visual Aids: For this lab, I think students should use the white board to draw data structures similar to these DAG examples.

Lab 4

Lab Restructuring: At this point in the curriculum, students should be comfortable with the background material needed in the previous steps of the roadmap before estimation. This means that, to really transfer learning to the individual level, students need to begin taking on more responsibility for their own learning. In the previous labs, aside from Lab 0, there was not much work students needed to do before the lab. Now, however, we should begin priming students to come in with some background on what will be covered.

The current lab covers the inverse probability of treatment weighted (IPTW) estimator, how it is equivalent to the g-computation estimator, and the various performance measures for estimators. Again, rather than just presenting students with the equations in the PDF and having them program, we should begin by handing students a lab principles worksheet, randomly splitting them into groups, and having them work on the white board to answer questions such as "Write out the equation for MSE based on the bias and variance of an estimator", "Write out the IPTW estimator for two treatment time points under no censoring", and "Draw a histogram of propensity weights that is indicative of positivity violations for the IPTW estimator". This strategy at the beginning of each class - the lab principles worksheet done by groups on the whiteboard - is particularly important for consolidation from the bottom, which must start with the presentation of solutions from all students. In this way, before coding the estimators, everyone is starting from the same place. Teaching the whole class then becomes a scenario where the GSI leads a detailed discussion of the tasks and solutions, using student work on the whiteboard to work through the different layers of the solution. Then, with those solutions, the class can work on applying the estimators to the real-world data simulation and assessing the bias/variance/MSE of the estimator under resampling.
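For the first worksheet question, the expected answer is the standard bias-variance decomposition, written here for a generic estimator \hat{\psi}_n of a parameter \psi_0:

```latex
\mathrm{MSE}(\hat{\psi}_n)
  = E\big[(\hat{\psi}_n - \psi_0)^2\big]
  = \underbrace{\big(E[\hat{\psi}_n] - \psi_0\big)^2}_{\text{Bias}^2}
  + \underbrace{E\big[(\hat{\psi}_n - E[\hat{\psi}_n])^2\big]}_{\text{Variance}}
```

This is the quantity students will later estimate empirically under resampling from the simulation.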

Pedagogy principles: Active engagement by using the white board in randomly assigned groups. Putting responsibility on the students for their own learning a little more. Consolidation from the bottom. Because of the structure of the class, what students wrote on the white board now becomes their own meaningful notes for the subsequent section of implementing/programming the estimators. Students are used to being told exactly what to write down in their notes, or in our case, exactly what to write in the solutions/programs. Restructuring the classroom to start with group work of the lab principles worksheet allows students to simultaneously think actively on solutions which then become the meaningful notes for the next sections when they are programming.

Goal: Consolidation from the bottom, meaningful notes, managing flow by creating worksheets that are incrementally harder.

To-do: Lab principles worksheets will need to be made. All the equations and programming in the current Lab 4 can be retained, but the general problem to be answered will need to reflect the new real-world example.

Lab 5

Lab Restructuring: This lab focuses on using the ltmle package for longitudinal targeted learning. At this point, students are expected to know the programming necessary to create estimators in the longitudinal setting, and, therefore, this lab can be mostly self-directed by the students. I propose this be another paired programming lab: students are randomly assigned to navigator/driver pairs and follow the lab to implement LTMLE and run a non-parametric bootstrap of the estimator. Because this lab is programming-heavy, there aren't many other ways to change the teaching style. Historically, this lab was taught by simply scrolling through the PDF to show students the various equations for estimating variance etc., the parameters that are passed to ltmle for estimation, and some skeleton code for the non-parametric bootstrap. This lends itself naturally to passivity and disengagement until the student just goes back to watch the lab recording and finishes it individually, which doesn't facilitate collaborative learning. The paired programming is an attempt to solve this issue.

Pedagogy principles: Random assignment of pairs. Collaborative learning through paired programming.

Goal: Teach students how to use LTMLE in a more engaged and active way rather than passively listening to the description of it.

To-do: Once the data-generating scripts are updated to represent a real-world example most of this lab will remain the same.

Lab 6

Lab Restructuring: This lab focuses on estimation using the longitudinal g-computation formula, the iterative conditional expectations (ICE) representation of the g-computation formula, and the ICE-based TMLE. It is difficult to give intuition for these estimators because it's really just math. However, certain building blocks can help. For example, lectures by Ben Lambert can be very helpful for students to understand the law of iterated expectations. It would be useful for students to spend about an hour reviewing these concepts before coming to class.
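A quick base-R demonstration of the law of iterated expectations, E[Y] = E[E[Y | X]], could be handed to students alongside those lectures. The example distribution below is made up purely for illustration:

```r
# Numeric check of E[Y] = E[E[Y | X]] on a simulated example.
set.seed(1)
n <- 1e5
X <- rbinom(n, 1, 0.3)
Y <- rnorm(n, mean = 2 * X)   # E[Y | X] = 2X, so E[Y] = 2 * 0.3 = 0.6
inner <- ave(Y, X)            # empirical E[Y | X] attached to each observation
c(overall = mean(Y), iterated = mean(inner))  # the two averages coincide
```

Seeing the two averages agree numerically may help students accept the ICE trick of computing the g-comp formula as a sequence of nested conditional expectations.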

Some lecture materials should be developed to help students connect the longitudinal g-comp formula to the programming of conditional densities. Historically, students seem to have difficulty tracing the programming and application back to the math. Likewise, students have a hard time in this lecture understanding how the longitudinal g-comp formula can be represented as iterated expectations - which is where I think the Ben Lambert lectures can help. This will also help with the LTMLE approach, as it's basically just a TMLE update at each step going backwards through the conditional expectations. Generally, I would say the material at the beginning of this lab is too advanced for group work, and supplementary material should be developed to help students get better insight into what is happening. Once exposition of the problem is given in the first half hour, the second half of the section should be directed toward paired programming of each of the estimators - again with a transparent random assignment of navigator/driver pairs who switch roles after X time.

The typical approach to this lab is to go over the estimator math very lightly, show how to program each of the estimators, break into groups for about 15 minutes of group work, then return together, go over the answers, rinse and repeat. This does give students some time to work, but it's not enough for thorough understanding. Therefore, I suggest the beginning of the lab be restructured to give students more insight into the notation - for example, for the g-comp formula, why is the conditional expectation of Y marginalized over the product of conditional densities of the non-intervention nodes? I'd start with these very hard questions and try to find ways to explain things to students in words rather than just saying, "Here is the formula for longitudinal g-comp" and moving on.

Pedagogy principles: Collaborative learning through paired programming.

Goal: Engage students more with the mathematical notation of the estimators and how it connects to what they subsequently program. Make the programming of these more difficult estimators feel more collaborative through paired programming.

To-do: Most of the existing lab can be used but the beginning should be rewritten with some longer form descriptions of the math to try and give more intuition, where possible.

Lab 7

Lab Restructuring: This is the final lab of the semester and again goes over longitudinal estimation for binary exposures using LTMLE. The current lab just loads the data structures for simulated data about the amount of sleep and test scores/probability of being sick. So far, I have proposed using simulated data based on a real-world example from an open-access data source related to health/resource inequity to simultaneously teach these estimators. This is a good place for the course to align more with an anti-racism approach and connect with the inherently unjust system that we live in and the impacts of racism on public health. Given the scope of the class, a data example based on real-world scenarios is really the only place this topic can be slotted in. As such, rather than just using LTMLE again on the same data-generating systems, we should use LTMLE to get estimates for our target parameters of interest in both the simulated data we made in Lab 1 and the actual data the simulation was based on. This way students will be more energized to see how estimates from our simulated data compare to the actual data and to see whether we can answer a real longitudinal question based on data that feels meaningful - rather than the accessible yet slightly boring examples used to date. I think this material will help keep the class engaged, especially since it's the last lab. This lab will also work well with paired programming as it is focused on application of the ltmle package to data. Once students have finished their estimates for the simulated and real data, they can put their answers on the board. Once all the groups are finished, they can compare answers, check code, and discuss what the real-world estimates mean and what could cause differences between the estimates for the simulated and real data.

Pedagogy principles: Ends the class on a real-world example that shows how understanding non-parametric estimators in longitudinal data can be used to advocate for anti-racism. Paired programming aids in collaborative learning.

Goal: This last lab now mirrors exactly what students are expected to do for their final projects. Changing the data material to a real-world example will help keep students engaged - rather than tuning out because of repetitiveness. Paired programming and comparing answers help students work more interdependently.

To-do: Will need to rewrite sections of this lab to include the real data example and load data from the new simulations.