seer-lab/MutationScorePredictor

Strip and split libsvm files for training and testing

kevinjalbert opened this issue · 1 comment

Each data item in the libsvm files has a set of features. We want to see how prediction accuracy changes when only a subset of the features is used, so this issue adds the ability to remove specified features from a libsvm file.
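As a minimal sketch of the stripping step (not the code from the commit), assuming the standard libsvm line layout `label index:value index:value ...`: the function name `strip_features` and the `drop` parameter are hypothetical. The remaining indices are left unchanged rather than renumbered, which relies on libsvm's sparse format treating absent indices as zero.

```python
def strip_features(line, drop):
    """Return a libsvm-format line without the feature indices listed in `drop`.

    `strip_features` and `drop` are illustrative names, not part of the project.
    """
    parts = line.split()
    if not parts:
        return ""  # blank line: nothing to strip
    label, features = parts[0], parts[1:]
    # Each feature is "index:value"; keep it unless its index is in `drop`.
    kept = [f for f in features if int(f.split(":", 1)[0]) not in drop]
    return " ".join([label] + kept)


def strip_file(lines, drop):
    """Apply strip_features to every line of an iterable of libsvm lines."""
    return [strip_features(line, drop) for line in lines]
```

For example, `strip_features("1 1:0.5 2:0.3 4:1", {2})` drops only the entry for feature 2 and keeps the label and the other features in their original order.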

In addition, we need a way to split out a random x% of the data from the libsvm file for training and testing purposes.
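One possible way to do the random split is sketched below. It assumes that x% of the lines are held out for testing and the rest are used for training, which is one reading of the requirement. The name `split_lines` and its parameters are hypothetical, and the optional `seed` is added only to make a split reproducible.

```python
import random


def split_lines(lines, test_percent, seed=None):
    """Randomly split libsvm lines into (train, test) lists.

    `test_percent` is the percentage of lines held out for testing
    (an assumption; the split direction is not fixed by the issue).
    """
    rng = random.Random(seed)  # private generator, global state untouched
    shuffled = list(lines)     # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_percent / 100)
    return shuffled[n_test:], shuffled[:n_test]
```

Every input line ends up in exactly one of the two lists, so writing each list back out with one line per item yields a training file and a testing file.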

A rough implementation of this functionality is done in commit 8dfb39a. Eventually, more automation through the Rakefile is needed.