
Data Mining #2


PD_02

This project was developed in Python 3 using Jupyter Notebook and scikit-learn.

The dataset analysed is medulloblastoma_genes.csv. It contains 76 medulloblastoma (MB) samples, each with expression levels for 54,675 genes, collected from children aged 3 to 16 years. Medulloblastoma is a malignant childhood brain tumour comprising four discrete subgroups.

The sample labels are provided in labels.csv, where each sample is labelled either MB-CL or Other. There are 51 classic medulloblastoma (MB-CL) samples and 25 samples of other types: 6 desmoplastic nodular, 17 anaplastic and 2 medullomyoblastoma.
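To compare clusters against these labels later, the two label strings can be encoded as integers. This is a minimal sketch with pandas; the mapping (MB-CL to 1, Other to 0) and the function name `encode_labels` are illustrative choices, and the labels below are synthetic, built only to reproduce the stated class counts (51 + 25 = 76).

```python
import pandas as pd


def encode_labels(labels: pd.Series) -> pd.Series:
    """Map the two label strings to integers (hypothetical helper): MB-CL -> 1, Other -> 0."""
    mapping = {"MB-CL": 1, "Other": 0}
    # Fail loudly on any label outside the two expected values.
    unknown = set(labels.unique()) - set(mapping)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return labels.map(mapping).astype(int)


# Synthetic labels reproducing the stated class counts; the real values come from labels.csv.
labels = pd.Series(["MB-CL"] * 51 + ["Other"] * 25, name="label")
y = encode_labels(labels)
print("MB-CL:", int((y == 1).sum()), "Other:", int((y == 0).sum()))  # MB-CL: 51 Other: 25
```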

In medulloblastoma_genes.csv, each row is a sample and each column is a gene.
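A minimal loading sketch for this samples-by-genes layout, assuming the CSV has a header row of gene identifiers and a first column of sample IDs (neither is confirmed here; pass `index_col=None` if there is no ID column). The helper name `load_expression` is hypothetical, and a tiny synthetic file stands in for the real data so the snippet runs on its own.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def load_expression(path: str, index_col=0) -> pd.DataFrame:
    """Read a samples x genes matrix: one row per sample, one column per gene.

    index_col=0 assumes the first column holds sample IDs (an assumption).
    """
    df = pd.read_csv(path, index_col=index_col)
    # Every remaining column should hold numeric expression values.
    non_numeric = [col for col, dtype in df.dtypes.items()
                   if not pd.api.types.is_numeric_dtype(dtype)]
    if non_numeric:
        raise ValueError(f"non-numeric gene columns: {non_numeric[:5]}")
    return df


# Synthetic stand-in (4 samples x 3 genes) so the sketch runs without the real file.
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.normal(size=(4, 3)),
    index=[f"sample_{i}" for i in range(4)],
    columns=["gene_a", "gene_b", "gene_c"],
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_genes.csv")
    demo.to_csv(path)
    X = load_expression(path)

print(X.shape)  # (4, 3) for this demo
```

On the real file, the same call would be expected to return a 76 × 54,675 frame if the assumed layout holds.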

The goal was to cluster the samples and, ideally, recover groups that correspond to the MB-CL and Other labels.
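The README does not state which clustering algorithm the notebook uses, so the following is only one possible approach, sketched under assumptions: standardize each gene, reduce dimensionality with PCA (helpful because there are far more genes than samples), run k-means with k = 2, and measure agreement with the labels using the adjusted Rand index (ARI). The function `cluster_samples`, the choice of 10 components and the synthetic data are illustrative, not the project's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cluster_samples(X: np.ndarray, n_clusters: int = 2, n_components: int = 10,
                    random_state: int = 0) -> np.ndarray:
    """Hypothetical helper: scale genes, project onto top principal components, run k-means."""
    model = make_pipeline(
        StandardScaler(),  # put every gene on a common scale
        PCA(n_components=n_components, random_state=random_state),  # compress many genes into few axes
        KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state),
    )
    return model.fit_predict(X)


# Synthetic stand-in: 76 samples x 100 genes; the first 51 samples (mimicking the
# 51 MB-CL cases) are shifted on 30 genes so that two groups exist.
rng = np.random.default_rng(42)
X = rng.normal(size=(76, 100))
X[:51, :30] += 5.0
y_true = np.array([1] * 51 + [0] * 25)

y_pred = cluster_samples(X)
# ARI ignores which cluster id went to which group; 1.0 means a perfect match.
ari = adjusted_rand_score(y_true, y_pred)
print(f"ARI = {ari:.2f}")
```

On the real data, `n_components` cannot exceed the number of samples (76), and there is no guarantee that unsupervised clusters will line up with the MB-CL/Other split; the ARI only quantifies how closely they do.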