Detecting metastatic tumors and their sites of origin using metastatic tumor data.

Primary language: Jupyter Notebook

ml4sp_metastasis

Downloading the data

Get the GSM samples from the Human Cancer Metastasis Database (HCMDB) at http://hcmdb.i-sanger.com/.

Use GEOparse to download the data into Python. This also saves a text file for each sample in your root directory.
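A minimal sketch of the download step, assuming the GSM accessions have already been collected from HCMDB. The helper names `download_gsm` and `download_many` and the `geo_cache` directory are hypothetical, not taken from the notebooks. `GEOparse.get_GEO` is the library's download entry point, and its `destdir` argument controls where the sample's text file is written.

```python
from pathlib import Path


def download_gsm(gsm_id, destdir="geo_cache"):
    """Download one GSM record and return its data table (a pandas DataFrame)."""
    # Imported lazily so the module loads even where GEOparse is not installed.
    import GEOparse  # third-party: pip install GEOparse

    # Create the target directory up front; the sample's text file lands here.
    Path(destdir).mkdir(parents=True, exist_ok=True)
    gsm = GEOparse.get_GEO(geo=gsm_id, destdir=destdir)
    return gsm.table


def download_many(gsm_ids, destdir="geo_cache"):
    """Return a dict mapping each GSM accession to its data table."""
    return {gsm_id: download_gsm(gsm_id, destdir) for gsm_id in gsm_ids}


# Example call (needs network access; the accession is a placeholder):
# tables = download_many(["GSM0000000"])
```

Passing an explicit `destdir` keeps the downloaded text files out of the project root, which may be preferable when many samples are fetched.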

Getting mRNA data

We look at all samples that share the same ID_REF row names and add their values to a single data frame.
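One way this step could look, as a sketch rather than the notebook's actual code: each sample table is assumed to have `ID_REF` and `VALUE` columns (the usual layout of a GEO sample table) with unique `ID_REF` entries, and the combined frame keeps only the `ID_REF` rows present in every sample. The function name `combine_samples` is hypothetical.

```python
import pandas as pd


def combine_samples(tables):
    """Combine per-sample tables into one frame: rows are ID_REF, columns are samples.

    `tables` maps a sample id to a DataFrame with ID_REF and VALUE columns.
    """
    columns = []
    for sample_id, table in tables.items():
        # Index by ID_REF so values line up across samples by row name.
        series = table.set_index("ID_REF")["VALUE"].rename(sample_id)
        columns.append(series)
    # join="inner" keeps only the ID_REF rows shared by all samples.
    return pd.concat(columns, axis=1, join="inner")


# Tiny synthetic example (values are made up for illustration).
sample_a = pd.DataFrame({"ID_REF": ["p1", "p2", "p3"], "VALUE": [1.0, 2.0, 3.0]})
sample_b = pd.DataFrame({"ID_REF": ["p2", "p3", "p4"], "VALUE": [5.0, 6.0, 7.0]})
expression = combine_samples({"sample_a": sample_a, "sample_b": sample_b})
```

The inner join is one interpretation of "the same ID_REF row names"; an alternative would be to drop any sample whose row names differ from a reference sample.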

Dimensionality reduction using PCA
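A short sketch of the PCA step using scikit-learn, assuming samples are rows and expression features are columns (a frame with ID_REF rows would need transposing first). The number of components, the standardization step, and the synthetic input are illustrative assumptions, not settings from the notebooks.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def reduce_with_pca(X, n_components=2):
    """Standardize the features, then project the samples onto the top components."""
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_components)
    return pca.fit_transform(X_scaled), pca


# Synthetic stand-in for an expression matrix: 20 samples, 50 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 50))
Z, pca = reduce_with_pca(X_demo, n_components=5)

# Fraction of variance captured by each retained component.
print(pca.explained_variance_ratio_)
```

Inspecting `explained_variance_ratio_` is a common way to decide how many components to keep before training a classifier.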

Task 1: Identifying normal, metastatic tumor, and primary tumors

  • [] random forest
  • [] logistic regression
  • [] Gaussian
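The three candidate models could be compared along these lines. This is a sketch on synthetic three-class data standing in for the normal / metastatic / primary labels. The list does not say which Gaussian model is meant, so `GaussianNB` (Gaussian naive Bayes) is an assumption here, as are all hyperparameters and the helper name `compare_models`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB


def compare_models(X, y, cv=5):
    """Return the mean cross-validated accuracy for each candidate classifier."""
    models = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "logistic_regression": LogisticRegression(max_iter=1000),
        # Assumption: "gaussian" is read as Gaussian naive Bayes.
        "gaussian_nb": GaussianNB(),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv).mean()
        for name, model in models.items()
    }


# Synthetic three-class data; real inputs would be the reduced expression features.
X_demo, y_demo = make_classification(
    n_samples=150, n_features=20, n_informative=5, n_classes=3, random_state=0
)
scores = compare_models(X_demo, y_demo)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Cross-validation is used instead of a single train/test split because metastasis datasets are often small, so a single split can give an unstable accuracy estimate.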

Task 2: Identify primary site from metastatic tumor