In this project we used Python 3, Jupyter Notebook and scikit-learn.
The dataset analysed is medulloblastoma_genes.csv. It contains 76 medulloblastoma (MB) samples from children aged 3 to 16 years, each with the expression levels of 54,675 genes. Medulloblastoma is a malignant childhood brain tumour comprising four discrete subgroups.
The sample labels are provided in labels.csv, where each sample is labelled either MB-CL or Other. There are 51 samples of classic medulloblastoma (MB-CL) and 25 samples of other types (6 desmoplastic nodular, 17 anaplastic and 2 medullomyoblastoma).
In medulloblastoma_genes.csv, each row represents a sample and each column represents a gene.
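A minimal loading sketch with pandas follows. It assumes, without confirmation from the project files, that both CSVs have a header row, that neither has a sample-ID index column, and that row i of labels.csv describes row i of medulloblastoma_genes.csv. The function name load_mb_data is hypothetical.

```python
import pandas as pd


def load_mb_data(genes_path="medulloblastoma_genes.csv", labels_path="labels.csv"):
    """Load the expression matrix and the sample labels (hypothetical helper).

    Assumptions: both files have a header row, no index column, and the
    labels are in the same row order as the samples.
    """
    # One row per sample, one column per gene.
    X = pd.read_csv(genes_path)
    # Assumed: a single label column with values "MB-CL" or "Other".
    y = pd.read_csv(labels_path).iloc[:, 0]
    # Guard against misaligned files before any clustering is attempted.
    if len(X) != len(y):
        raise ValueError(f"{len(X)} samples but {len(y)} labels")
    return X, y
```

If the assumptions hold, the description above implies that `X.shape` should be (76, 54675) and that `y.value_counts()` should report 51 MB-CL and 25 Other samples; a mismatch is a quick sign that the file layout differs from what the sketch assumes (for example, an extra ID column).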
The goal was to cluster the samples and, ideally, recover groups corresponding to the "MB-CL" and "Other" labels.
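One way to approach this goal with scikit-learn is sketched below. The pipeline (standardisation, PCA, then k-means with two clusters) is an illustrative choice and not necessarily the method used in this project; the function names and the default of 10 principal components are hypothetical. Because cluster IDs are arbitrary, agreement with the MB-CL/Other labels is measured with the adjusted Rand index, which does not depend on which cluster gets which number.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cluster_samples(X, n_clusters=2, n_components=10, random_state=0):
    """Standardise genes, project onto principal components, run k-means.

    n_clusters=2 mirrors the MB-CL / Other split; n_components=10 is an
    arbitrary illustrative value and must not exceed the number of samples.
    """
    reducer = make_pipeline(
        StandardScaler(),  # put every gene on a comparable scale
        PCA(n_components=n_components, random_state=random_state),
    )
    Z = reducer.fit_transform(np.asarray(X, dtype=float))
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    return km.fit_predict(Z)


def agreement_with_labels(cluster_ids, labels):
    # Adjusted Rand index: 1.0 means identical partitions, about 0 means chance.
    return adjusted_rand_score(labels, cluster_ids)
```

For example, `ids = cluster_samples(X)` followed by `agreement_with_labels(ids, y)`, where X is the samples × genes matrix and y the MB-CL/Other labels, gives a single score for how closely the unsupervised clusters match the histological labels. Reducing the 54,675 genes to a few principal components first keeps k-means distances from being dominated by noise across thousands of uninformative genes, though the number of components is a tuning choice.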