/PolyARiboDepletion_classifier

write a classifier that can distinguish samples prepared with a ribo-depeletion library preparation method from those prepared using a polyA library preparation method based on expression data

Primary LanguageJupyter Notebook

PolyARiboDClassifier

Undergraduate research by Liam McKay mentored by Holly Beale. Thanks to the whole Treehouse team:

What This Does


Creates data sets for training and testing a Logistic Regression model to classify Ribosomal RNA depletion and Polyadenylation Selection methods from gene expression values in log2(TPM+1)
DESeq analysis on the compendium within subtypes of disease that (future) is to be used to create differentially expressed features

Dependencies

Install Anacaonda for python 3: https://anaconda.org/anaconda/python
(To check what version of python you have type python --version)
Run dependenciesPython and dependenciesR notebooks to see if all dependencies are installed

Quickstart:

  • load dependenciesPython and dependenciesR hopefully without error
  • run datasiftingfinal notebook to gather data in data/ folder in this directory
  • run trainingGliomaAllGenes and trainingGliomaHighVar to see Logistic Regression model testing
  • run DESeqGenes to get genes that are differentially expressed between polyA and riboD within a disease