A cell-to-patient machine learning transfer approach uncovers novel basal-like breast cancer prognostic markers amongst alternative splice variants
All data needed to reproduce the figures can be accessed here:
How to use
Two Python 3 scripts are provided, one for splicing and one for expression.
The input files are in the data directory.
The scripts were written against scikit-learn 0.21.2.
NB: Imputer warnings printed when a script starts are not errors.
Each script calls one R script to plot survival over the rounds of classification.
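The note about Imputer warnings can be made concrete with a small version check. The sketch below assumes the warnings come from sklearn.preprocessing.Imputer, which scikit-learn deprecated in 0.20 and removed in 0.22, so a 0.21.x install warns but still runs. The helper names parse_version and imputer_status are hypothetical and are not part of the provided scripts.

```python
import re

# Release in which sklearn.preprocessing.Imputer started emitting deprecation warnings.
DEPRECATED_IN = (0, 20)
# Release in which sklearn.preprocessing.Imputer was removed.
REMOVED_IN = (0, 22)


def parse_version(text):
    """Return (major, minor, patch) parsed from the start of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % text)
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def imputer_status(version_text):
    """Classify a scikit-learn version by how it treats the old Imputer class."""
    major_minor = parse_version(version_text)[:2]
    if major_minor >= REMOVED_IN:
        return "removed"
    if major_minor >= DEPRECATED_IN:
        return "deprecated"
    return "available"
```

With scikit-learn installed, imputer_status(sklearn.__version__) returning "deprecated" corresponds to the harmless warnings described above, while "removed" suggests the scripts would need the pinned 0.21.2 release to import Imputer at all.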
python classification_cell2patient_splicing.py \
-c {absolutepath}/MatriceExonPSI_CellLines.csv \
-p {absolutepath}/MatriceExonPSI_Patients.csv \
-t 0.6 \
-n 1000
python classification_cell2patient_expression.py \
-c {absolutepath}/MatriceGeneTPM_CellLines.csv \
-p {absolutepath}/MatriceGeneTPM_Patients.csv \
-t 0.6 \
-n 1000
- t : Threshold applied to the class probabilities.
- c : Path to a matrix of expression/splicing values for the cell lines.
- p : Path to a matrix of expression/splicing values for the patients.
- n : Number of trees in the forest.
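To illustrate how the four options fit together, here is a minimal sketch of a command line that accepts them, plus one plausible reading of the probability threshold. It is not taken from the provided scripts: build_parser, confident_calls and the dest names are hypothetical, and the rule "keep a sample when its highest class probability reaches the threshold" is an assumption about what -t controls.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the -c, -p, -t and -n options listed above."""
    parser = argparse.ArgumentParser(description="Illustrative stand-in, not the real script.")
    parser.add_argument("-c", dest="cell_lines", required=True,
                        help="path to the cell line matrix")
    parser.add_argument("-p", dest="patients", required=True,
                        help="path to the patient matrix")
    parser.add_argument("-t", dest="threshold", type=float, required=True,
                        help="threshold for class probabilities")
    parser.add_argument("-n", dest="n_trees", type=int, required=True,
                        help="number of trees in the forest")
    return parser


def confident_calls(probabilities, threshold):
    """Return (sample_index, class_index) for samples whose top probability >= threshold.

    probabilities is a list with one list of class probabilities per sample.
    This thresholding rule is an assumption made for illustration.
    """
    calls = []
    for index, row in enumerate(probabilities):
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] >= threshold:
            calls.append((index, best))
    return calls
```

Under this reading, a threshold of 0.6 on a two-class problem keeps only samples the forest assigns to one class with at least 60% probability and leaves ambiguous samples unclassified.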
The final annotated file is splicing_TCGA_BASAL_HEADER_ADDED.tsv.
You can visualize it with Morpheus: https://software.broadinstitute.org/morpheus.
The best features of interest are written to outputBorutaPy.txt and outputBorutaPy.bed.
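For downstream use of the .bed output, a minimal reader might look like the sketch below. It assumes the file follows the standard tab-separated BED layout (chromosome, 0-based start, exclusive end, then optional columns); the actual columns written by the scripts are not documented here, and read_bed_intervals is a hypothetical helper.

```python
def read_bed_intervals(lines):
    """Parse BED-style lines into (chrom, start, end, extra_fields) tuples.

    Assumes tab-separated fields with at least chrom, start and end.
    Blank lines, comments and track/browser header lines are skipped.
    """
    intervals = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        intervals.append((chrom, start, end, fields[3:]))
    return intervals
```

Passing an open file handle, for example read_bed_intervals(open("outputBorutaPy.bed")), would return one tuple per selected feature if the layout assumption holds.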