Using Inverse Probability Weighting with SVMs to Address Confounding

by Kristin A. Linn

June 25, 2015

Here we provide an example of how to implement inverse probability weighting with SVMs to address confounding. The basic setup is that we have feature, class label pairs of the form (Equation not rendered) for each subject, i=1,...,n, where Equation not rendered and Equation not rendered for all Equation not rendered. We wish to train a SVM to predict y given x. As an example, y might be an indicator of disease/control group and x might be a vectorized image containing voxel values or volumes of regions across the brain. However, the additional feature vector Equation not rendered observed for all subjects confounds the relationship between x and y. For example, a might contain covariates such as age and sex. In the presence of confounding by a, inverse probability weighting is used to recover an estimate of the target classifier, which is the SVM classifier that would have been estimated had there been no confounding by a.

set.seed (1)

We use the package 'rPython' to access libSVM (https://www.csie.ntu.edu.tw/~cjlin/libsvm/) through scikit learn (http://scikit-learn.org/stable/). The file fit_svm.py contains a python function that implements a linear kernel SVM with subject-level weights and a grid search to tune the cost parameter, C.

print("I am R Code")
## [1] "I am R Code"

Generate data for example

We generate data such that the confounders, a1 and a2, affect both the features, x1 and x2, as well as the class labels, d.

N.B. If some of the estimated weights are extremely large, one may consider truncating the predicted probabilities (e.g., at the 5th percentile) or using stabilized weights. Define Equation not rendered and Equation not rendered as well as corresponding estimates \hat{S}_i = \hat{pr}(d_i=1 | a1_i, a2_i) and \hat{M}_i = \hat{pr}(d_i=1), Then, stabilized weights and their corresponding estimates are defined, respectively, as: Equation not rendered and fun Equation not rendered

Train the inverse probability weighted SVM (IPW-SVM)