/cancer-clust

Classifying RNA-Seq gene expression data by tumor type using unsupervised machine learning techniques.

Tumor Classification: RNA-Seq Data Set

This dataset comes from the UCI Machine Learning Repository. It is part of the RNA-Seq (HiSeq) PANCAN data set: a random extraction of gene expression measurements from patients with different tumor types (BRCA, KIRC, COAD, LUAD, and PRAD). It contains 801 instances with 20531 attributes. The data can be downloaded here: https://archive.ics.uci.edu/ml/datasets/gene+expression+cancer+RNA-Seq

The goal of this project is to use unsupervised machine learning techniques to stratify RNA-Seq gene expression data by tumor type.
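
As a rough illustration of that goal, the sketch below standardizes the features, reduces them with PCA, clusters with k-means, and compares the clusters to known labels using the adjusted Rand index. It is a minimal example using scikit-learn, not necessarily the method used in the notebooks. The function name `cluster_expression`, the choice of PCA plus k-means, and all parameter values are assumptions, and the input is synthetic data from `make_blobs` standing in for the real expression matrix.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler


def cluster_expression(X, n_clusters=5, n_components=10, seed=0):
    """Hypothetical helper: standardize, reduce with PCA, cluster with k-means."""
    # Put every feature (gene) on a comparable scale.
    X_scaled = StandardScaler().fit_transform(X)
    # Project onto a few principal components before clustering.
    pca = PCA(n_components=n_components, random_state=seed)
    X_reduced = pca.fit_transform(X_scaled)
    # One cluster per expected tumor type (an assumption: 5 types).
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return kmeans.fit_predict(X_reduced)


# Synthetic stand-in for the expression matrix (NOT the real data):
# 801 samples in 5 groups. The real set has 20531 attributes;
# 200 features keep this example fast.
X, y_true = make_blobs(
    n_samples=801, n_features=200, centers=5, cluster_std=1.0, random_state=0
)

labels = cluster_expression(X)

# The adjusted Rand index compares cluster assignments with the known
# groups; 1.0 means perfect agreement, values near 0 mean chance level.
ari = adjusted_rand_score(y_true, labels)
print(f"Adjusted Rand index: {ari:.3f}")
```

To try this on the real data, `X` would be replaced by the expression matrix and `y_true` by the tumor type labels, which are used here only to evaluate the clustering and never to fit it.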