The ability to identify novel risk factors for health outcomes is a key strength of research based on electronic health records (EHR). However, the validity of such studies is limited by error in EHR-derived phenotypes. The objective of this study was to develop a novel procedure for reducing bias in estimated associations between risk factors and phenotypes in EHR data.