Fertig et al. (2016). CoGAPS matrix factorization algorithm identifies transcriptional changes in AP-2alpha target genes in feedback from therapeutic inhibition of the EGFR network. Oncotarget.

Patients with oncogene-driven tumors are treated with targeted therapeutics including EGFR inhibitors. Genomic data from The Cancer Genome Atlas (TCGA) demonstrate molecular alterations to EGFR, MAPK, and PI3K pathways in previously untreated tumors. Therefore, this study uses bioinformatics algorithms to delineate interactions resulting from EGFR inhibitor use in cancer cells with these genetic alterations. We modify the HaCaT keratinocyte cell line model to simulate cancer cells with constitutive activation of EGFR, HRAS, and PI3K in a controlled genetic background. We then measure gene expression after treating modified HaCaT cells with gefitinib, afatinib, and cetuximab. The CoGAPS algorithm distinguishes a gene expression signature associated with the anticipated silencing of the EGFR network. It also infers a feedback signature with EGFR gene expression itself increasing in cells that are responsive to EGFR inhibitors. This feedback signature has increased expression of several growth factor receptors regulated by the AP-2 family of transcription factors. The gene expression signatures for AP-2alpha are further correlated with sensitivity to cetuximab treatment in HNSCC cell lines and changes in EGFR expression in HNSCC tumors with low CDKN2A gene expression. In addition, the AP-2alpha gene expression signatures are also associated with inhibition of MEK, PI3K, and mTOR pathways in the Library of Integrated Network-Based Cellular Signatures (LINCS) data. These results suggest that AP-2 transcription factors are activated as feedback from EGFR network inhibition and may mediate EGFR inhibitor resistance.

Details about the algorithm and these results are described in Fertig et al. (2016). The code supporting these results is written in R and organized using ProjectTemplate.

Briefly, the file structure is as follows:

  • cache: R objects stored from intermediate analysis, and used to obtain results in the paper.
  • data: cell survival data, and RData objects containing raw gene expression data from the CEL files in GEO
  • graphs: plots generated from R code, used as figures in the manuscript
  • munge: code to preprocess gene expression data into a usable format for analysis, subsequently stored in the cache.
  • src: code to obtain published results from data preprocessed according to munge and stored in the cache
  • reports: tables generated from the analysis included in the manuscript and analyses of TCGA/LINCS data
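
Since the code is organized with ProjectTemplate, the layout above can be checked and loaded from an R session. The following is a minimal sketch, not part of the repository: the helper names `missing_project_dirs` and `load_if_complete` are hypothetical, and it assumes the working copy contains the six directories listed above and whatever configuration ProjectTemplate expects. `load.project()` is the standard ProjectTemplate entry point; what it loads and runs depends on the project configuration.

```r
# Directory names taken from the list above.
expected_dirs <- c("cache", "data", "graphs", "munge", "src", "reports")

# Hypothetical helper: return the expected directories that are absent under `root`.
missing_project_dirs <- function(root = ".") {
  present <- dir.exists(file.path(root, expected_dirs))
  expected_dirs[!present]
}

# Hypothetical helper: load the project only if the layout looks complete
# and the ProjectTemplate package is installed. Returns TRUE/FALSE invisibly.
load_if_complete <- function(root = ".") {
  missing <- missing_project_dirs(root)
  if (length(missing) > 0) {
    message("Missing directories: ", paste(missing, collapse = ", "))
    return(invisible(FALSE))
  }
  if (!requireNamespace("ProjectTemplate", quietly = TRUE)) {
    message("ProjectTemplate is not installed; try install.packages('ProjectTemplate').")
    return(invisible(FALSE))
  }
  old_wd <- setwd(root)          # load.project() works relative to the working directory
  on.exit(setwd(old_wd))         # restore the previous working directory on exit
  ProjectTemplate::load.project()
  invisible(TRUE)
}
```

Calling `load_if_complete(".")` from the repository root would then make the cached objects available, after which the scripts in src can be sourced individually. Checking the layout first is only a convenience; calling `ProjectTemplate::load.project()` directly from the repository root is the usual route.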