- Download all code and data files into the same directory. The data are in http://acgt.cs.tau.ac.il/genepark_data/.
- Open reproduce_geneparks_results.R
- Set the working directory to the directory that contains all the files.
- Run the analyses to get the models, signature, and figures. Note that some analyses are slow (especially the fSVA-based ones).
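
The steps above can also be run non-interactively. The following is a minimal sketch, not part of the original code: `run_reproduction` is a hypothetical helper, and the path in the example call is a placeholder. It assumes the code and the extracted, merged data files are already in one directory.

```r
# Hypothetical helper (not part of the original repository): run the
# reproduction script from the directory that holds the code and data files.
run_reproduction <- function(project_dir,
                             script = "reproduce_geneparks_results.R") {
  script_path <- file.path(project_dir, script)
  if (!file.exists(script_path)) {
    stop("Script not found: ", script_path)
  }
  # The script expects all files to be in the working directory.
  old_wd <- setwd(project_dir)
  # Restore the previous working directory when the function exits.
  on.exit(setwd(old_wd), add = TRUE)
  # echo = TRUE prints each expression, which helps track the slow analyses.
  source(script, echo = TRUE)
  invisible(script_path)
}

# Example call (placeholder path, adjust to your machine):
# run_reproduction("path/to/genepark_files")
```

Because `source()` evaluates the script in the global environment by default, the objects the script creates remain available in the session afterwards.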
The data files are all compressed .7z archives. Some important notes on these files:
- The training set files should be merged; they were split because of their size (>25 MB). These are the training set samples after preprocessing.
- The combined training and validation data files should be merged; they were split because of their size (>25 MB). These are the combined training and validation sets after preprocessing. These data were used for the training set and for obtaining the biomarker.
- Validation set - a set of samples that were used for initial tests and for tuning.
- The final test set
The original code is distributed under the BSD 3-Clause license.