Correcting Covariance Batch Effects (CovBat): Harmonization of mean and covariance for multi-site data
Maintainer: Andrew Chen, andrewac@pennmedicine.upenn.edu
License: Artistic License 2.0
The R package can be installed via devtools by running the following code:
# install.packages("devtools")
devtools::install_github("andy1764/CovBat_Harmonization/R")
Then, you can load the package via:
library(CovBat)
The R package provides the covbat function for harmonization of covariance and the combat function, which is adapted from an older version of the original ComBat package. To apply ComBat itself, please visit the main ComBat repository.
For Python, please visit the Python subdirectory.
Current harmonization methods often focus on addressing scanner differences in the mean and variance of features. However, machine learning methods employed in multivariate pattern analysis (MVPA) are known to leverage additional properties of the data, including covariance. In our recent paper, we show that ComBat, a state-of-the-art method designed to harmonize mean and variance, is unable to fully prevent detection of scanner manufacturer through MVPA in the Alzheimer's Disease Neuroimaging Initiative data. We design CovBat to harmonize the covariance of multivariate features and show that it can almost fully prevent detection of scanner properties.
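To make the motivation concrete, the following base R sketch simulates two hypothetical sites whose features differ in mean, variance, and correlation. It then applies per-site centering and scaling, which is only a simplified stand-in for mean and variance harmonization (it is not ComBat and not CovBat). The helper `simulate_site` and all parameter values are assumptions chosen for illustration. After this step the per-feature means and variances agree across sites, yet the between-feature covariance still differs, which is the kind of residual signal a multivariate classifier could pick up.

```r
set.seed(1)

# Hypothetical helper: n observations of two features with correlation rho,
# then shifted and rescaled to mimic a site-specific mean and variance.
simulate_site <- function(n, rho, shift = 0, spread = 1) {
  z1 <- rnorm(n)
  z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)
  cbind(f1 = z1, f2 = z2) * spread + shift
}

site_a <- simulate_site(500, rho = 0.8)
site_b <- simulate_site(500, rho = -0.2, shift = 2, spread = 3)

# Simplified stand-in for mean/variance harmonization:
# center and scale every feature within each site.
harm_a <- scale(site_a)
harm_b <- scale(site_b)

# Per-feature means and variances now match across sites.
mean_gap <- max(abs(colMeans(harm_a) - colMeans(harm_b)))
var_gap  <- max(abs(apply(harm_a, 2, var) - apply(harm_b, 2, var)))

# The off-diagonal covariance between the two features still differs.
cov_gap <- abs(cov(harm_a)[1, 2] - cov(harm_b)[1, 2])

print(c(mean_gap = mean_gap, var_gap = var_gap, cov_gap = cov_gap))
```

In this toy setting `mean_gap` and `var_gap` are zero up to floating-point error, while `cov_gap` remains large. Harmonizing the covariance structure itself is the gap that CovBat is designed to address.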
CovBat is meant to be applied after initial preprocessing of the images to obtain a set of features and before statistical analyses. The application of CovBat is not limited to neuroimaging data; however, it has yet to be tested on other types of data.
The R implementation of CovBat is based on the ComBat package maintained by Jean-Philippe Fortin. The Python implementation of CovBat is a modification of a ComBat package for Python.
If you are using CovBat for harmonization of mean and covariance, please cite the following article:
Chen, A. A., Beer, J. C., Tustison, N. J., Cook, P. A., Shinohara, R. T., Shou, H., & the Alzheimer's Disease Neuroimaging Initiative. (2022). Mitigating site effects in covariance for machine learning in neuroimaging data. Human Brain Mapping, 43(4), 1179–1195. https://doi.org/10.1002/hbm.25688