
Primary language: Jupyter Notebook
License: Apache License 2.0 (Apache-2.0)

TCGA GTEx Spark

Environment
python: version
spark: version

Dataset
dataset: gene expression RNAseq - RSEM norm_count
dataset: gene expression RNAseq - RSEM tpm
dataset: phenotype - TCGA TARGET GTEX selected phenotypes
dataset: phenotype - TCGA survival data
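
As a rough illustration of how the phenotype table might be used to pick out GTEx breast samples, here is a minimal sketch in plain Python. The column names `sample`, `_study` and `_primary_site`, the values `GTEX` and `Breast`, and the helper `select_samples` are assumptions made for this example, not something taken from the notebooks in this repo. The rows are made-up toy data, not real samples.

```python
import csv
import io

# Toy stand-in for the phenotype table. Sample IDs and rows are made up;
# the column names are an assumption about the downloaded file's layout.
TOY_PHENOTYPE_TSV = (
    "sample\t_primary_site\t_study\n"
    "toy-0001\tBreast\tGTEX\n"
    "toy-0002\tBreast\tTCGA\n"
    "toy-0003\tLung\tGTEX\n"
    "toy-0004\tBreast\tGTEX\n"
)


def select_samples(tsv_text, study, primary_site):
    """Return sample IDs whose study and primary site match (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row["sample"]
        for row in reader
        if row["_study"].strip().lower() == study.lower()
        and row["_primary_site"].strip().lower() == primary_site.lower()
    ]


# Hypothetical usage: keep only GTEx samples whose primary site is breast.
gtex_breast = select_samples(TOY_PHENOTYPE_TSV, "GTEX", "Breast")
print(gtex_breast)
```

The resulting ID list could then be used to keep only the matching columns of an expression matrix, assuming that matrix has one column per sample.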

ToDo1: Spark setup -> docker-compose up -d
ToDo2: select GTEx breast -> code/01 select GTEx breast.ipynb
ToDo3: compute the Pearson correlation coefficient (PCC) in GTEx breast (Gene1, Gene2)
ToDo4: compute the p-value of the PCC in GTEx breast (Gene1, Gene2)
ToDo5: to cancer and beyond....
ToDo6: ....
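
The PCC and p-value steps in the list above could be sketched roughly as follows. This is only an illustration in plain Python, not the implementation used in this repo. The PCC is computed from running sums (n, Σx, Σy, Σx², Σy², Σxy) so that partial results from separate chunks can be merged; `seq_op` and `comb_op` have the shape that PySpark's `RDD.aggregate(zeroValue, seqOp, combOp)` expects, but here they are only driven by `functools.reduce`. The p-value uses the Fisher z-transform normal approximation, which is swapped in for the exact t-distribution test to keep the example dependency-free. All function names and the expression values are hypothetical.

```python
import math
from functools import reduce

# Accumulator layout: (n, sum_x, sum_y, sum_xx, sum_yy, sum_xy)
ZERO = (0, 0.0, 0.0, 0.0, 0.0, 0.0)


def seq_op(acc, pair):
    """Fold one (gene1, gene2) expression pair into the running sums."""
    n, sx, sy, sxx, syy, sxy = acc
    x, y = pair
    return (n + 1, sx + x, sy + y, sxx + x * x, syy + y * y, sxy + x * y)


def comb_op(a, b):
    """Merge the running sums of two chunks (e.g. two partitions)."""
    return tuple(u + v for u, v in zip(a, b))


def pcc_from_stats(stats):
    """Pearson correlation coefficient from the merged running sums."""
    n, sx, sy, sxx, syy, sxy = stats
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if var_x <= 0 or var_y <= 0:
        raise ValueError("PCC is undefined when a gene has zero variance")
    r = cov / math.sqrt(var_x * var_y)
    return max(-1.0, min(1.0, r))  # guard against floating-point overshoot


def fisher_z_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 via the Fisher z-transform."""
    if n <= 3:
        raise ValueError("the Fisher z approximation needs more than 3 samples")
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up expression values for two genes, split into two chunks.
gene1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene2 = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]
pairs = list(zip(gene1, gene2))
chunks = [pairs[:3], pairs[3:]]

partials = [reduce(seq_op, chunk, ZERO) for chunk in chunks]
stats = reduce(comb_op, partials, ZERO)

r = pcc_from_stats(stats)
p = fisher_z_p_value(r, stats[0])
print(f"PCC={r:.4f}, approx p-value={p:.3g}")
```

Because the running sums merge by simple addition, each chunk can be summarised independently before the final PCC is computed. The raw-sum formula can lose precision when expression values are large relative to their spread, so centring or log-transforming the values first may be worth considering. For small sample counts the Fisher z p-value is only approximate.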