
Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).


A cell-to-patient machine learning transfer approach uncovers novel basal-like breast cancer prognostic markers amongst alternative splice variants


All data needed to reproduce the figures can be accessed here: DOI

How to use

Two Python 3 scripts are provided, one for splicing and one for expression.

The input files are in the data directory.

The scripts were developed against scikit-learn 0.21.2.
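Because the scripts target a specific scikit-learn release, it can help to confirm which version is installed before running them. The following is a minimal sketch; the helper names (installed_sklearn_version, matches_expected) are hypothetical and not part of this repository.

```python
# Hypothetical helper: check the installed scikit-learn version against
# the release the cell2patient scripts were developed with.
EXPECTED_SKLEARN = "0.21.2"


def installed_sklearn_version():
    """Return the installed scikit-learn version string, or None if it is missing."""
    try:
        import sklearn
    except ImportError:
        return None
    return sklearn.__version__


def matches_expected(version, expected=EXPECTED_SKLEARN):
    """Return True only for an exact version-string match."""
    return version == expected


if __name__ == "__main__":
    found = installed_sklearn_version()
    if found is None:
        print("scikit-learn is not installed")
    elif not matches_expected(found):
        print("Found scikit-learn %s; the scripts were developed with %s"
              % (found, EXPECTED_SKLEARN))
    else:
        print("scikit-learn %s matches the expected version" % found)
```

If the versions differ, pinning the release with pip (pip install scikit-learn==0.21.2) in a dedicated virtual environment is one way to match the original setup.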

NB: Imputer warnings printed when a script starts are not errors.

Both scripts call an R script that plots survival over the successive rounds of classification.

```bash
python classification_cell2patient_splicing.py \
    -c {absolutepath}/MatriceExonPSI_CellLines.csv \
    -p {absolutepath}/MatriceExonPSI_Patients.csv \
    -t 0.6 \
    -n 1000

python classification_cell2patient_expression.py \
    -c {absolutepath}/MatriceGeneTPM_CellLines.csv \
    -p {absolutepath}/MatriceGeneTPM_Patients.csv \
    -t 0.6 \
    -n 1000
```
  • -c : Path to the matrix of expression/splicing values for cell lines.
  • -p : Path to the matrix of expression/splicing values for patients.
  • -t : Threshold applied to class probabilities.
  • -n : Number of trees in the forest.
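One way to picture how -n and -t could interact is sketched below. It assumes, without confirmation from this README, that a random forest with -n trees is trained on cell lines, that class probabilities are then predicted for patients, and that a patient is only assigned a class when the top probability reaches -t. The function assign_classes and the class names are hypothetical, and the probabilities are made-up illustration values, not outputs of the scripts.

```python
from typing import List, Optional, Sequence

# In a scikit-learn workflow (assumption), the inputs could come from:
#   clf = RandomForestClassifier(n_estimators=1000).fit(X_cells, y_cells)
#   probas = clf.predict_proba(X_patients); classes = clf.classes_


def assign_classes(probas: Sequence[Sequence[float]],
                   classes: Sequence[str],
                   threshold: float = 0.6) -> List[Optional[str]]:
    """Return the most probable class per sample, or None if its
    probability is below the threshold (sample left unassigned)."""
    labels = []
    for row in probas:
        # Index of the highest class probability (first one on ties).
        best = max(range(len(row)), key=lambda i: row[i])
        labels.append(classes[best] if row[best] >= threshold else None)
    return labels


if __name__ == "__main__":
    # Made-up probabilities for three hypothetical patients.
    example = [[0.70, 0.30], [0.55, 0.45], [0.20, 0.80]]
    print(assign_classes(example, ["Basal", "NonBasal"], threshold=0.6))
```

Under this reading, raising -t leaves more patients unassigned but keeps only confident calls.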

The final annotated file is splicing_TCGA_BASAL_HEADER_ADDED.tsv.
You can visualize it with Morpheus: https://software.broadinstitute.org/morpheus.

The best features of interest are listed in outputBorutaPy.txt and outputBorutaPy.bed.
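To load the .bed output in Python, a minimal reader is sketched below. It assumes outputBorutaPy.bed follows the standard tab-separated BED layout (chromosome, 0-based start, exclusive end, then optional columns such as a name); what the optional columns contain in this particular file is not documented here, so they are kept as raw strings. BedRecord and parse_bed are hypothetical names.

```python
from collections import namedtuple

# Three mandatory BED columns plus any optional columns kept as strings.
BedRecord = namedtuple("BedRecord", ["chrom", "start", "end", "extra"])


def parse_bed(lines):
    """Parse BED lines into records, skipping blank, comment,
    'track' and 'browser' lines."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        # BED coordinates are 0-based and half-open: [start, end).
        records.append(BedRecord(fields[0], int(fields[1]),
                                 int(fields[2]), fields[3:]))
    return records


if __name__ == "__main__":
    with open("outputBorutaPy.bed") as handle:
        for rec in parse_bed(handle):
            print(rec.chrom, rec.start, rec.end, rec.end - rec.start)
```

Keeping the optional columns as a list avoids guessing their meaning while still giving access to them.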