program-evaluation-with-matching

Project Overview: Program evaluation without random assignment (with matching)

Walkthrough of a hypothetical program evaluation: we evaluate a training program's effect on performance in a setting where participation was not randomized, using matching to control for differences between participants and non-participants.

  • Used the Kaggle IBM attrition sample dataset to mirror real employee data relationships
  • Simulated participation in the program
  • Explored differences between the participant group and the non-participant population
  • Attempted and evaluated different types of matching methods and specifications
  • Obtained matches using our best matching specification
  • Estimated the marginal effect of participation on performance
  • Briefly explained the limitations of matching

Sources

Kaggle IBM attrition dataset
Getting started with MatchIt
Matching methods

Data Simulation

The IBM attrition dataset has a feature called 'TrainingTimesLastYear'. To preserve any inherent relationships in the data, I used this feature to create a training-participation flag: if an employee attended training more than 4 times last year, I treat that as a signal that this person would likely opt into a training program if invited. To simulate the invitation, I decided this particular training was of most benefit to early-career R&D employees and restricted the flag to turn on only for those groups (job level 1-3, department = R&D).
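As a minimal sketch, the flag logic might look like this in R. The column names follow the Kaggle IBM dataset, but the data frame here is a made-up toy stand-in, not the real file:

```r
# Toy stand-in for the IBM attrition data (same column names, made-up rows).
hr <- data.frame(
  TrainingTimesLastYear = c(6, 2, 5, 6),
  JobLevel              = c(1, 1, 4, 2),
  Department            = c("Research & Development", "Research & Development",
                            "Research & Development", "Sales")
)

# Flag employees who trained often (opt-in signal) AND fall in the invited
# group: early-career (job levels 1-3) R&D employees.
hr$participant <- as.integer(
  hr$TrainingTimesLastYear > 4 &
    hr$JobLevel <= 3 &
    hr$Department == "Research & Development"
)

hr$participant  # 1 0 0 0
```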

Exploring Group Differences

Participants had slightly lower performance ratings on average than non-participants (3.12 vs 3.16), but because participants skew toward lower job levels and R&D, it is possible these groups had lower ratings to begin with and that performance actually increased due to participation. We will attempt to tease out the true relationship using matching.

As expected for a training targeting early-career employees, participants were younger and had less work experience than non-participants.

*(figure: unnamed-chunk-5-1)*

R&D employees indexed higher on education than other employees, and our restriction to lower job levels is borne out in the data.

*(figure: unnamed-chunk-6-1)*

Participants were also more likely to be female than non-participants, perhaps reflecting the composition of R&D roles relative to other departments, but potentially also a difference in the likelihood of opting in to more training.

*(figure: unnamed-chunk-7-1)*

Matching Types

Exact Matching:

With our set of variables, only 5 of the 185 participant observations received matches.

Coarsened Exact Matching (CEM):

A default model improved matters: 34 of 185 participants received matches. With custom binning of the continuous variables, we improve to 177 of 185 participants matched, and those matches do a very good job of minimizing the standardized mean differences across variables.
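The exact and CEM specifications can be sketched with the MatchIt package. The covariates and cutpoints shown are illustrative assumptions (not the exact spec used above), and the block runs on random toy data with the IBM column names rather than the real file:

```r
library(MatchIt)

# Toy stand-in for the IBM HR data (same column names, random values).
set.seed(42)
n  <- 400
hr <- data.frame(
  Age               = sample(20:55, n, replace = TRUE),
  TotalWorkingYears = sample(0:25,  n, replace = TRUE),
  Education         = sample(1:5,   n, replace = TRUE),
  Gender            = sample(c("Female", "Male"), n, replace = TRUE),
  JobLevel          = sample(1:5,   n, replace = TRUE),
  Department        = sample(c("Research & Development", "Sales"), n, replace = TRUE)
)
hr$participant <- rbinom(n, 1, ifelse(hr$JobLevel <= 3 &
  hr$Department == "Research & Development", 0.4, 0))

f <- participant ~ Age + TotalWorkingYears + Education +
  Gender + JobLevel + Department

# Exact matching: a participant is matched only to non-participants that
# agree on every covariate, so many participants go unmatched.
m_exact <- matchit(f, data = hr, method = "exact")

# CEM: coarsen the continuous covariates into bins, then match exactly on
# the bins; `cutpoints` controls the coarsening per variable.
m_cem <- matchit(f, data = hr, method = "cem",
                 cutpoints = list(Age = 4, TotalWorkingYears = 4))

summary(m_cem)  # match counts and standardized mean differences
```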

*(figure: unnamed-chunk-14-1)*

Propensity Score Matching (PSM):

A default model finds matches for every participant, but the quality of the matches is slightly worse than our CEM model with custom binning.

*(figure: unnamed-chunk-16-1)*

Exact + PSM:

A model using exact matching to create strata and PSM to assign matches within each stratum retains all participant observations and slightly outperforms our CEM model. By altering the PSM matching method and the link function of the propensity model, we improve match quality slightly again.
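The PSM and combined specifications can be sketched with MatchIt as follows. Nearest-neighbor matching, the probit link, and the set of exact-match variables are illustrative assumptions, and the block again runs on random toy data with the IBM column names:

```r
library(MatchIt)

# Toy stand-in for the IBM HR data (same column names, random values).
set.seed(42)
n  <- 400
hr <- data.frame(
  Age               = sample(20:55, n, replace = TRUE),
  TotalWorkingYears = sample(0:25,  n, replace = TRUE),
  Education         = sample(1:5,   n, replace = TRUE),
  Gender            = sample(c("Female", "Male"), n, replace = TRUE),
  JobLevel          = sample(1:5,   n, replace = TRUE),
  Department        = sample(c("Research & Development", "Sales"), n, replace = TRUE)
)
hr$participant <- rbinom(n, 1, ifelse(hr$JobLevel <= 3 &
  hr$Department == "Research & Development", 0.4, 0))

f <- participant ~ Age + TotalWorkingYears + Education +
  Gender + JobLevel + Department

# PSM: nearest-neighbor matching on a logistic-regression propensity score.
m_psm <- matchit(f, data = hr, method = "nearest", distance = "glm")

# Exact + PSM: require exact agreement on the categorical variables
# (the strata) and match on a probit propensity score within each stratum.
m_combo <- matchit(f, data = hr, method = "nearest",
                   distance = "glm", link = "probit",
                   exact = ~ Gender + Education + JobLevel + Department)

summary(m_combo)
```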

*(figure: unnamed-chunk-18-1)*

Our final matching specification does well at minimizing the distance between the participant and non-participant distributions across our variables, especially (not surprisingly) the variables we specified to match exactly (gender, education, job level, department).

*(figures: unnamed-chunk-19-2, unnamed-chunk-19-3, unnamed-chunk-19-4)*

Estimate Marginal Effects

I use a linear regression on the matched sample to control for any remaining differences in covariates and to estimate the marginal effect of training participation on mean performance rating. The estimated effect is a decrease of 0.0402 in performance rating, but with a p-value of 0.253 we cannot be confident that the true effect is different from zero.

Said differently, this test cannot reject our default assumption that participating in the training doesn't affect performance ratings.
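The estimation step can be sketched as a weighted regression on the matched data, where `match.data()` returns the matched rows plus a `weights` column. The toy data, matching spec, and control variables here are illustrative assumptions:

```r
library(MatchIt)

# Toy stand-in for the IBM HR data (same column names, random values).
set.seed(42)
n  <- 400
hr <- data.frame(
  Age               = sample(20:55, n, replace = TRUE),
  TotalWorkingYears = sample(0:25,  n, replace = TRUE),
  Education         = sample(1:5,   n, replace = TRUE),
  Gender            = sample(c("Female", "Male"), n, replace = TRUE),
  JobLevel          = sample(1:5,   n, replace = TRUE),
  Department        = sample(c("Research & Development", "Sales"), n, replace = TRUE),
  PerformanceRating = sample(3:4,   n, replace = TRUE)
)
hr$participant <- rbinom(n, 1, ifelse(hr$JobLevel <= 3 &
  hr$Department == "Research & Development", 0.4, 0))

m <- matchit(participant ~ Age + TotalWorkingYears + Education +
               Gender + JobLevel + Department,
             data = hr, method = "nearest", distance = "glm")

md <- match.data(m)  # matched rows plus `weights` and `distance` columns

# Outcome model on the matched sample; the `participant` coefficient is the
# estimated marginal effect of participation on performance rating.
fit <- lm(PerformanceRating ~ participant + Age + TotalWorkingYears +
            Education + JobLevel, data = md, weights = weights)

coef(summary(fit))["participant", ]  # estimate, std. error, t value, p value
```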

Limitations

The possibility of an unobserved confounding variable remains a threat to causal claims even after matching well on everything we can measure, so causal claims obtained via matching should be made more cautiously than ones made with random assignment. That said, where random assignment is not possible, practical, ethical, or legal, matching is a great alternative for providing some level of confidence about causation when evaluating the effectiveness of your programs.