kmatch

Multivariate-distance and propensity-score matching, including entropy balancing, inverse probability weighting, (coarsened) exact matching, and regression adjustment

kmatch matches treated and untreated observations with respect to covariates and, if outcome variables are provided, estimates treatment effects based on the matched observations, optionally including regression-adjustment bias correction. Both multivariate (Mahalanobis) distance matching and propensity-score matching are supported, each using kernel matching, ridge matching, or nearest-neighbor matching. For kernel and ridge matching, several methods for data-driven bandwidth selection, such as cross-validation, are offered. In addition, several alternative matching and reweighting methods are supported (coarsened exact matching, inverse probability weighting, entropy balancing). The package also includes various commands for evaluating balancing and common-support violations.

To install kmatch from the SSC Archive, type

. ssc install kmatch, replace

in Stata. Stata version 11 or newer is required. Furthermore, the moremata package is required. To install moremata from the SSC Archive, type

. ssc install moremata, replace

Installation from GitHub:

. net install kmatch, replace from(https://raw.githubusercontent.com/benjann/kmatch/master/)
. net install moremata, replace from(https://raw.githubusercontent.com/benjann/moremata/master/)
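
Basic usage (a minimal sketch, not taken from the kmatch documentation; it assumes the cattaneo2 example dataset available through webuse, with treatment mbsmoke, outcome bweight, and covariates mmarried, mage, fbaby, and medu). The first command loads the data, the second runs propensity-score matching and estimates the ATT for the outcome given in parentheses, and the third reports balancing statistics for the last kmatch model:

. webuse cattaneo2, clear
. kmatch ps mbsmoke mmarried mage fbaby medu (bweight), att
. kmatch summarize

Replacing ps with md requests multivariate-distance matching instead; type help kmatch for the full syntax.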

Main changes:

12aug2020
- kmatch now uses mm_density() for density estimation and no longer requires
  -kdens- to be installed

05may2020
- kmatch could fail if weights were specified and the variable containing the
  weights was abbreviated; this is fixed
- new (undocumented) option to suppress the list of generated variables in 
  the output

09mar2020
- survey estimation is now supported through options -svy- and -subpop()-
- ifgen() now stores the IFs even if nose is specified
- options that generate variables are no longer allowed with vce(bootstrap) 
  or vce(jackknife)

19jan2020
- parsing of variable list failed if parentheses were used in factor variable
  specifications; this is fixed
- options were allowed within outcome equations, but these options were 
  ignored; error is now returned if options are specified within outcome
  equations
- vce() was only allowed if outcome variables had been specified; vce(analytic)
  and vce(cluster ...) are now also allowed without outcome variables
- the strata variable stored by generate() (_KM_strata) was also filled in
  for observations outside of the estimation sample; this is fixed, i.e. the 
  variable is now set to missing for these observations

30jul2019
- in case of weighted data, balancing weights returned by -kmatch eb- were scaled
  in terms of sample size instead of sum of sampling weights; this did not
  affect treatment effect estimation, but led to erroneous balancing 
  diagnostics in case of ATE; this is fixed

29may2019
- in case of multiple outcome equations, only the first equation was displayed
  in the output header; this is fixed
- outcome equations in the header are now numbered
- in case of duplicate outcomes only the duplicates were prefixed by a number 
  in the coefficient vector; this has been changed; now all outcomes receive a 
  prefix if there are duplicates
- improved documentation (more examples)

08may2019
- kmatch now computes approximate standard errors based on influence functions
  (assuming the matching weights to be fixed); corresponding vce() is -analytic-
  (default) or -cluster clustvar-; option -nose- suppresses SE estimation; option
  ifgenerate() stores the influence functions
- option -comsup- without arguments now restricts obs to minimum PS range;
  returns by comsup changed
- option -wor- added (nearest-neighbor matching without replacement)
- option -keepall- added
- coarsened exact matching now supported
- new ebalance option to apply entropy balancing after matching
- new -kmatch eb- command for entropy balancing
- new -kmatch ipw- command for Inverse Probability Weighting
- new -kmatch em- command for Exact Matching (just for convenience)
- new -kmatch ra- command for Regression adjustment (just for convenience)
- dy() now supported by all subcommands
- generate() now stores an additional variable containing strata ID (matching 
  subcommands only)
- caliper() is now allowed as a synonym for bwidth()
- new -idgenerate()- option to store IDs of matched controls
- new -bwadjust()- option to adjust bandwidth by specified factor
- new -maxiter()- option to restrict the maximum number of iterations for propensity
  score estimation; maxiter() calls -set maxiter-; the original value is restored
  after running the PS command; default is maxiter() = min(50,c(maxiter))
- outcome variables no longer have to be unique
- column "bandwidth" no longer displayed in matching table in cases where 
  there is no bandwidth; title of column is "Caliper" in case of nn-matching;
  formatting of matching table now takes linesize into account; also revised
  some other aspects of results display
- kmatch summarize/csummarize now also support skewness
- kmatch csummarize used iweights for computation of variances/standard 
  deviations; this was appropriate for frequency weighted data but not in other
  cases; this is fixed
- kmatch density and kmatch box could crash in case of negative matching weights
  (which are possible with ridge matching); the commands now treat negative
  weights as zero
- exact matching returned error if there were no matches; this is fixed
- -kmatch md, nn()- could crash under some data constellations if -bwidth()- 
  was specified; this is fixed
  [explanation: this was due to select() returning 0x0 if the input is 1x1 and 
  no elements are selected; subsequent code expected 0x1, as is returned in all
  other cases, i.e. if input is rx1 with r!=1]
- kmatch could crash if the treatment variable had no variance (i.e. if one 
  of the groups was empty); this should now be fixed
- kmatch now returns error if the variable names requested by generate() or dy()
  are not unique
- kmatch md was not running in Stata 11 and 12 because it made use of Mata 
  function selectindex() that was not available prior to Stata 13; this is fixed

22jun2017
- kmatch returned error if no covariates and no ematch() variables were 
  specified; this is fixed
- csummarize used the standard deviation of the matched sample (instead of the 
  standard deviation of the total sample) to compute the standardized 
  differences; this is fixed

13jun2017
- bw(cv over, weighted) had a bug so that wrong weights were used; this is fixed
- matching algorithms now handle ties in the data more efficiently
- matching weights with fweights now the same as in expanded data
- kmatch csummarize did not account for weights when computing statistics 
  for the unmatched; this is fixed

09jun2017
- kmatch csummarize had a wrong label in the rightmost column of the variance
  table; this is fixed
- kmatch density, kmatch cdensity, and kmatch cbox did not always include labels
  for the variables; this is fixed

08jun2017
- penalty added for large bandwidths in bwidth(cv); suboption -nolimit- 
  deactivates the penalty

07jun2017
- results from -kmatch ps, nn(1)- were not always equal to results from
  -teffects psmatch, nn(1)-; this is fixed
- results from -kmatch md, nn(#)- are not always equal to results from
  -teffects nnmatch, nn(#)-; this has to do with the fact that 
  -teffects nnmatch- treats controls as tied if their distance to the treatment
  observation does not differ by more than -dtolerance()-; setting 
  -dtolerance()- to a very small value, e.g. to -smallestdouble()-, should
  make results from -kmatch md- and -teffects nnmatch- equal
- results from -kmatch md, nn(#)- with bias adjustment are not always equal to
  the results of -teffects nnmatch, nn(#)- with bias adjustment; this is because
  collinear variables in the treatment or control group are handled differently;
  results should be equal if only non-collinear variables are included in the
  bias adjustment
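  [illustration of the dtolerance() note above; a sketch only, not taken from
  the kmatch documentation, assuming the cattaneo2 example data and an
  arbitrary very small tolerance of 1e-20:
  . webuse cattaneo2, clear
  . teffects nnmatch (bweight mmarried mage fbaby medu) (mbsmoke), atet nneighbor(3) dtolerance(1e-20)
  . kmatch md mbsmoke mmarried mage fbaby medu (bweight), att nn(3)
  the two ATT estimates can then be compared; whether they coincide may
  depend on the data and on the Stata and kmatch versions used]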

02jun2017
- PM: now using 90% quantile of nonzero differences

30may2017
- there was a bug with how ties were handled in the cv-outcome algorithm so that
  results were wrong (and unstable); this is fixed  
- changed some of the labeling/naming in output and returns
- changed how CV results are returned
- options noatt and noatc in cvplot are now called notreated and nountreated
- -kmatch md- crashed if there were no covariates; this is fixed
- bandwidth selection is now skipped if there are no covariates
- notes about over category when computing BW displayed counter instead of over
  value; this is fixed

23may2017
- -cvplot, sort range()- did not connect all displayed points; this is fixed

22may2017
- fweights: results from CV are now the same as in the expanded data
- CV with respect to outcome in -kmatch md- did not work with weights; this 
  is fixed

20may2017
- PM algorithm now takes account of weights when computing the minimum distance quantile

19may2017
- MD: added epsilon(h2) to h2 to compensate for possible roundoff error

18may2017
- option -sharedbw- added
- -kmatch md- used the original X instead of the normalized X for 
  no-outcome-CV (unless mdmethod(1) was specified); this is fixed