- AML_datasets.RData - initial dataset as provided by the paper
- gene.RData - created when running the genesignature.ipynb notebook
- data.RData - created when running the splitdata.ipynb notebook
- Ensure AML_datasets.RData is in the project directory. The datasets created are too big to be committed to the repository, so the feature selection and data split are left for the user to generate.
- Run the genesignature.ipynb notebook to generate gene.RData. This creates the list of gene signatures used to train the models, using the feature selection method proposed in the paper.
- Run the splitdata.ipynb notebook to generate data.RData. This creates the train/test splits that the models use for training.
- Run models.ipynb. This trains the models first with the gene signatures selected via the method proposed in the paper, then with the alternative gene signatures proposed by group Genome Seeker, and finally performs cross-validation using the first set of gene signatures.
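
The run order above can be checked with a small helper. This is only a sketch and not part of the repository: the function name `next_notebook` and the `STEPS` table are hypothetical, and it assumes that a step counts as done whenever its output file (gene.RData or data.RData, as named in this README) is present in the project directory.

```python
from pathlib import Path

# Run order as listed in this README: (notebook, output file it generates).
# No output file is named for models.ipynb, hence None.
STEPS = [
    ("genesignature.ipynb", "gene.RData"),
    ("splitdata.ipynb", "data.RData"),
    ("models.ipynb", None),
]


def next_notebook(project_dir="."):
    """Return the name of the next notebook to run.

    Assumption: a step is treated as complete if its output file exists.
    """
    root = Path(project_dir)
    # The initial dataset must be present before any notebook is run.
    if not (root / "AML_datasets.RData").is_file():
        raise FileNotFoundError(
            "AML_datasets.RData must be in the project directory"
        )
    for notebook, output in STEPS:
        # Stop at the first step whose output is missing (or has none).
        if output is None or not (root / output).is_file():
            return notebook
```

Calling `next_notebook()` from a project directory that contains only AML_datasets.RData returns `genesignature.ipynb`. Once gene.RData and data.RData both exist, it returns `models.ipynb`.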