This project proposes a classifier that uses somatic single-nucleotide polymorphisms (SNPs) to identify patients at significant risk of developing high-grade prostate cancer (HGPCa).
For details on the study, see the write-up included in this repository.
data_directory='/data'
results_directory='/results'
sh process_data.sh "$data_directory" "$results_directory"
sh train_model.sh "$data_directory" "$results_directory"
sh test_model.sh "$data_directory" "$results_directory"
- Set up data pipeline
- Parse Gleason labels
- Tie Gleason labels to entity IDs in .maf file
- Parse SNP data
- Tie SNP data to Gleason labels
- Implement cross validation
- Train 3 models with scikit-learn and evaluate them on the cross-validation data
- Evaluate performance of different methods
- Write final paper
- Write PowerPoint presentation
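
The data steps above (tie labels to sample IDs, tie SNP data to labels, split for cross-validation) can be sketched roughly as follows. This is only an illustration under assumptions, not the code in this repo: the function names `build_dataset` and `stratified_folds`, the sample IDs, the variant strings, and the 0/1 labels are all hypothetical, and the real scripts may parse and join the .maf file differently.

```python
import random
from collections import defaultdict


def build_dataset(labels, mutations):
    """Join per-sample labels with per-sample variants.

    labels:    dict of sample_id -> 0/1 (hypothetical binary label
               derived from the Gleason score).
    mutations: iterable of (sample_id, variant_id) pairs, e.g. as they
               might be read from a .maf file (assumed layout).
    Returns (sample_ids, feature_names, X, y), where X is a binary
    presence/absence matrix with one row per labeled sample.
    """
    by_sample = defaultdict(set)
    for sample_id, variant_id in mutations:
        if sample_id in labels:  # skip samples that have no label
            by_sample[sample_id].add(variant_id)

    sample_ids = sorted(by_sample)
    feature_names = sorted(set().union(*by_sample.values())) if by_sample else []
    X = [[1 if v in by_sample[s] else 0 for v in feature_names] for s in sample_ids]
    y = [labels[s] for s in sample_ids]
    return sample_ids, feature_names, X, y


def stratified_folds(y, k, seed=0):
    """Split row indices into k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(y):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return [sorted(fold) for fold in folds]


# Toy, made-up data purely for illustration.
labels = {"S1": 1, "S2": 0, "S3": 1, "S4": 0}
mutations = [
    ("S1", "chr1:100:A>G"),
    ("S2", "chr1:100:A>G"),
    ("S2", "chr2:5:C>T"),
    ("S3", "chr2:5:C>T"),
    ("S4", "chr3:9:G>A"),
    ("S5", "chr3:9:G>A"),  # unlabeled sample, dropped by build_dataset
]

sample_ids, feature_names, X, y = build_dataset(labels, mutations)
folds = stratified_folds(y, k=2)
```

Each fold can then serve once as the held-out set while a model is trained on the remaining rows. In practice, scikit-learn's `StratifiedKFold` offers the same kind of split and is probably the more convenient choice once the feature matrix exists.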