breastcancersurvsign

This repository contains all the data and resources to reproduce the experiments from the paper https://doi.org/10.1093/bioadv/vbad037

License: GNU Affero General Public License v3.0 (AGPL-3.0)

Identification of a gene expression signature associated with Breast Cancer survival and risk that improves clinical genomic platforms

Santiago Bueno-Fortes, Alberto Berral-Gonzalez, José Manuel Sánchez-Santos, Manuel Martin-Merino, Javier De Las Rivas

Bioinformatics Advances, vbad037, https://doi.org/10.1093/bioadv/vbad037

Published: 22 March 2023

Motivation

Modern genomic technologies allow us to perform genome-wide analysis to find gene markers associated with risk and survival in cancer patients. Accurate risk prediction and patient stratification based on robust gene signatures is a key path forward in personalized treatment and precision medicine. Several authors have proposed the identification of gene signatures to assign risk in patients with breast cancer (BRCA), and some of these signatures have been implemented within commercial platforms in the clinic, such as Oncotype and Prosigna. However, these platforms are black boxes in which the influence of the selected genes as survival markers is unclear and where the risk scores provided cannot be clearly related to the standard clinico-pathological tumor markers obtained by immunohistochemistry (IHC), which guide clinical and therapeutic decisions in breast cancer.

Results

Here we present a framework to discover a robust list of gene expression markers associated with survival that can be biologically interpreted in terms of the 3 main biomolecular factors (IHC clinical markers: ER, PR and HER2) that define clinical outcome in BRCA. To test and ensure the reproducibility of the results, we compiled and analyzed two independent datasets with a large number of tumor samples (1,024 and 879) that include full genome-wide expression profiles and survival data. Using these two cohorts, we obtained a robust subset of gene survival markers that correlate well with the major IHC clinical markers used in breast cancer. The geneset of survival markers that we identify (which includes 34 genes) significantly improves the risk prediction provided by the genesets included in the commercial platforms: Oncotype (16 genes) and Prosigna (50 genes, i.e. PAM50). Furthermore, some of the genes identified have recently been proposed in the literature as new prognostic markers and may deserve more attention in current clinical trials to improve breast cancer risk prediction.
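As a rough illustration of what "improves the risk prediction" can mean in practice, the sketch below computes a concordance index (Harrell's C) for two competing risk scores on the same patients. This is an assumption made for illustration only: the text above does not state which metric was used, the function name `concordance_index`, the toy follow-up data and the two score vectors are all hypothetical, and the code is plain Python rather than the authors' R scripts. Tied follow-up times are skipped for simplicity.

```python
from itertools import combinations
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C: share of comparable patient pairs ranked correctly by risk.

    times       -- follow-up time per patient
    events      -- 1 if the event (e.g. death) was observed, 0 if censored
    risk_scores -- higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the two times differ and the
        # shorter one ends in an observed event (simplification: ties skipped).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event had the higher risk score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy cohort: five patients, two of them censored.
times = [5, 10, 15, 20, 25]
events = [1, 1, 0, 1, 0]

# Two hypothetical signatures scoring the same patients.
scores_signature_a = [0.9, 0.7, 0.5, 0.3, 0.1]
scores_signature_b = [0.5, 0.5, 0.5, 0.5, 0.5]

c_a = concordance_index(times, events, scores_signature_a)
c_b = concordance_index(times, events, scores_signature_b)
print(f"signature A: C = {c_a:.2f}, signature B: C = {c_b:.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a signature with a higher C on an independent cohort ranks patients by risk more accurately. For real analyses, a tested implementation such as the survcomp package listed under References is a better choice than this sketch.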

Availability

All data integrated and analyzed in this work are available in this GitHub repository. The data matrices used in the article are available in the releases section. The tables containing the results of the analysis are in the results folder. All R scripts with the functions and protocols used for the analysis are part of an R library that is under development and will be made available as soon as possible.

References

Schwender H (2022). siggenes: Multiple Testing using SAM and Efron's Empirical Bayes Approaches. R package version 1.72.0.

Schroeder MS, Culhane AC, Quackenbush J, Haibe-Kains B. survcomp: an R/Bioconductor package for performance assessment and comparison of survival models. Bioinformatics 27(22): 3206-3208. (2011).

Haibe-Kains B, Desmedt C, Sotiriou C, Bontempi G. A comparative study of survival models for breast cancer prognostication on microarray data: does a single gene beat them all? Bioinformatics 24(19): 2200-2208. (2008).

Supplementary information

Supplementary material is available at Bioinformatics Advances online.