/survival-geneset-ranking

Accompanying code to paper: Large Scale Gene Set Ranking for Survival-Related Gene Sets

Large Scale Gene Set Ranking for Survival-Related Gene Sets

This repository contains the code to reproduce the key results and figures from the article: Large Scale Gene Set Ranking for Survival-Related Gene Sets.

[Figure: logrank-rmst-comparison]

Installation

The code was tested on Ubuntu 20.04.4 LTS and macOS 13.1 with Python versions 3.9.15 and 3.10.8.

Follow these steps to prepare the environment:

  • Clone the repository
git clone https://github.com/MartinSpendl/survival-geneset-ranking
cd survival-geneset-ranking
  • Install the required packages
# using pip in a virtual environment
pip install -r requirements.txt

# using Conda
conda create --name <env_name> --file requirements.txt
conda activate <env_name>

Data

The data used for the analysis is publicly accessible.

TCGA RNAseq expressions

Clone the repository at https://github.com/JakaKokosar/tcga-data.git and run all four of its notebooks.

Genesets

Download the gene sets from MSigDB (sign-in required) and save them to the data/genesets directory.

We used:

  • Hallmark genesets: h.all.v2023.1.Hs.symbols.gmt
  • GO-Biological Process: c5.go.bp.v2023.1.Hs.symbols.gmt
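
MSigDB distributes gene sets as .gmt files, a tab-separated text format in which each line holds a gene set name, a description field, and then the member gene symbols. As a minimal sketch, assuming that layout, the downloaded files could be loaded into a dictionary as shown here. `parse_gmt_lines` and `read_gmt` are hypothetical helper names, not functions from this repository, and the demo lines are made up for illustration.

```python
from typing import Dict, Iterable, List


def parse_gmt_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse GMT-formatted lines into {gene_set_name: [gene symbols]}."""
    gene_sets: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            # Skip blank or malformed lines (name, description, >= 1 gene expected).
            continue
        name, _description, *genes = fields
        # Drop empty strings left by trailing tabs.
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


def read_gmt(path: str) -> Dict[str, List[str]]:
    """Read a .gmt file from disk (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return parse_gmt_lines(handle)


# Made-up example lines; real files contain MSigDB gene set names and symbols.
demo_lines = [
    "EXAMPLE_SET_A\thttp://example.org/a\tGENE1\tGENE2\tGENE3\n",
    "EXAMPLE_SET_B\thttp://example.org/b\tGENE2\tGENE4\n",
]
demo_sets = parse_gmt_lines(demo_lines)
print({name: len(genes) for name, genes in demo_sets.items()})
```

With the files in place, calling `read_gmt("data/genesets/h.all.v2023.1.Hs.symbols.gmt")` would return one dictionary entry per Hallmark gene set.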

Reproduce results

First, run both scripts in the /scripts directory:

python calculate_ssGSEA.py

python calculate_statistical_test.py

Second, run the notebooks in the /notebooks directory.

Figures from the notebooks are stored in the /figures directory.
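
The statistical-test script itself is not reproduced here. As an illustration of the kind of test a survival-based gene set ranking can rely on, the following is a minimal sketch of a two-group log-rank test in plain Python. It assumes the samples have already been split into two groups (for example, high versus low enrichment score for one gene set). The function name `logrank_test` and the toy data are hypothetical, and the repository's actual implementation may differ.

```python
import math
from typing import Sequence, Tuple


def logrank_test(
    times_a: Sequence[float], events_a: Sequence[int],
    times_b: Sequence[float], events_b: Sequence[int],
) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    events_* are 1 for an observed event and 0 for a censored sample.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # sum over event times of (observed - expected) in group A
    variance = 0.0       # sum of hypergeometric variances
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, group A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk, group B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        if n < 2:
            continue  # variance term is undefined with a single sample at risk
        obs_minus_exp += d_a - d * n_a / n
        variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))

    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data: group A fails early, group B fails late, no censoring.
chi2, p_value = logrank_test([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
print(f"chi2={chi2:.2f}, p={p_value:.4f}")
```

Ranking gene sets then amounts to computing such a statistic once per gene set and sorting by the resulting p-value, typically with a multiple-testing correction applied afterwards.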

Highlights

Clusters of survival-related gene sets from the GO gene sets on the TCGA-KIRC dataset.

[Figure: GO-clustering]