Machine learning on sequences

This is a small genome-gazing project: taking someone else's data and trying to come up with lab-testable hypotheses.

The project uses the attached dataset and code (we can discuss whether it is appropriate to put these on GitHub). It relies primarily on the R packages seqinr and randomForest, and also uses Boruta, plus GGally, ggplot2 and reshape2 for visualisation.

Credits go to: Chris Knight