We present the DeepProfile framework, which learns a variational autoencoder (VAE) network from thousands of publicly available gene expression samples and uses this network to encode a low-dimensional representation (LDR) to predict complex disease phenotypes. To our knowledge, DeepProfile is the first attempt to use deep learning to extract a feature representation from a vast quantity of unlabeled (i.e., lacking phenotype information) expression samples that are not incorporated into the prediction problem. We use DeepProfile to predict acute myeloid leukemia patients’ in vitro responses to 160 chemotherapy drugs. We show that, when compared to the original features (i.e., expression levels) and LDRs from two commonly used dimensionality reduction methods, DeepProfile: (1) better predicts complex phenotypes, (2) better captures known functional gene groups, and (3) better reconstructs the input data. We show that DeepProfile is generalizable to other diseases and phenotypes by using it to predict ovarian cancer patients’ tumor invasion patterns and breast cancer patients’ disease subtypes.
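To make the encoding step concrete, the following is a minimal sketch of a single VAE forward pass and its loss, written in plain Python. It is not the DeepProfile implementation: the single linear layers, the dimensions (20 genes, 3 latent units), the random initialisation, the squared-error reconstruction term, and all function names are assumptions chosen for illustration. No training is performed here. In the workflow described above, the encoder would first be fit on unlabeled expression samples, and the mean vector returned by `encode` would then serve as the LDR passed to a separate phenotype predictor.

```python
import math
import random


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, weights, bias):
    """Affine map producing one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def encode(x, params):
    """Map an expression vector to the mean and log-variance of q(z | x)."""
    mu = linear(x, *params["enc_mu"])
    log_var = linear(x, *params["enc_log_var"])
    return mu, log_var


def sample_latent(mu, log_var, rng):
    """Reparameterisation trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def decode(z, params):
    """Map a latent vector back to expression space."""
    return linear(z, *params["dec"])


def negative_elbo(x, params, rng):
    """Return (total loss, reconstruction term, KL term) for one sample."""
    mu, log_var = encode(x, params)
    z = sample_latent(mu, log_var, rng)
    x_hat = decode(z, params)
    # Squared error stands in for a Gaussian log-likelihood (an assumption).
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, 1).
    kl = -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                    for m, lv in zip(mu, log_var))
    return reconstruction + kl, reconstruction, kl


rng = random.Random(0)
n_genes, n_latent = 20, 3  # illustrative sizes, not taken from the paper
params = {
    "enc_mu": init_layer(n_latent, n_genes, rng),
    "enc_log_var": init_layer(n_latent, n_genes, rng),
    "dec": init_layer(n_genes, n_latent, rng),
}

# A synthetic stand-in for one normalised expression profile.
sample = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]

ldr, _ = encode(sample, params)  # the mean vector plays the role of the LDR
loss, reconstruction, kl = negative_elbo(sample, params, rng)
```

Using the encoder mean rather than a sampled `z` as the representation keeps the LDR deterministic for a given input, which is a common choice when the encoding feeds a downstream predictor.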
A short paper is available at https://www.biorxiv.org/content/early/2018/03/08/278739