Kaggle challenge on DNA sequence classification for the MVA MSc Kernel Methods for Machine Learning course.
Classes extending from kernels.Kernel provide two methods:
- apply(self, embed1, embed2): given two embedding vectors, returns K(embed1, embed2). By default, uses a linear kernel (i.e. the inner product).
- embed(self, sequences): takes a list of string sequences and returns a list of embedded vectors.
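The interface above can be sketched as follows. This is a minimal illustration, not the repository's code: the `Kernel` class here is a hypothetical stand-in for kernels.Kernel, `OneHotKernel` and its `ALPHABET` attribute are names made up for the example, and embeddings are plain Python lists to keep the block dependency-free.

```python
class Kernel:
    """Hypothetical stand-in for kernels.Kernel; the real base class may differ."""

    def apply(self, embed1, embed2):
        # Default behaviour: linear kernel, i.e. inner product of the embeddings.
        return sum(a * b for a, b in zip(embed1, embed2))

    def embed(self, sequences):
        raise NotImplementedError("subclasses provide the embedding")


class OneHotKernel(Kernel):
    """Illustrative subclass: each letter becomes a 4-dim one-hot vector."""

    ALPHABET = "ACGT"  # assumed DNA alphabet

    def embed(self, sequences):
        embedded = []
        for seq in sequences:
            vec = []
            for letter in seq:
                # Append the one-hot code of this letter to the flat vector.
                vec.extend(1.0 if letter == ref else 0.0 for ref in self.ALPHABET)
            embedded.append(vec)
        return embedded


kernel = OneHotKernel()
e1, e2 = kernel.embed(["ACGT", "ACGA"])
# With one-hot embeddings, the linear kernel counts the positions that agree.
similarity = kernel.apply(e1, e2)
```

Since the subclass only overrides `embed`, it inherits the default linear `apply`.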
Classes extending from classifiers.Classifier provide:
- fit(self, X, Y): computes and sets self.alpha, which represents the parameters of the classifier so that $f(x) = \sum_i \alpha_i K(x_i, x)$. Must also set self.training_data = X at the beginning, since the predict method relies on it.
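To make the roles of alpha and training_data concrete, here is a minimal sketch using a kernel perceptron. The perceptron is chosen only because it is short; it is not necessarily one of the classifiers in this repository. The class name, the `epochs` parameter, the `decision` helper, the `linear` function, and the assumption that labels are in {-1, +1} are all hypothetical.

```python
class KernelPerceptron:
    """Illustrative classifier of the form f(x) = sum_i alpha_i K(x_i, x)."""

    def __init__(self, kernel, epochs=10):
        self.kernel = kernel  # assumed: a callable K(x1, x2) on embedded vectors
        self.epochs = epochs

    def fit(self, X, Y):
        # Stored first, because decision() and predict() need the training points.
        self.training_data = X
        self.alpha = [0.0] * len(X)
        for _ in range(self.epochs):
            mistakes = 0
            for i, (x, y) in enumerate(zip(X, Y)):
                # Perceptron rule: on a misclassified point, move alpha_i by its label.
                if y * self.decision(x) <= 0:
                    self.alpha[i] += y
                    mistakes += 1
            if mistakes == 0:
                break
        return self

    def decision(self, x):
        # f(x) = sum_i alpha_i K(x_i, x)
        return sum(a * self.kernel(xi, x)
                   for a, xi in zip(self.alpha, self.training_data))

    def predict(self, X):
        return [1 if self.decision(x) > 0 else -1 for x in X]


def linear(u, v):
    return sum(a * b for a, b in zip(u, v))


X = [[1.0, 0.0], [2.0, 0.5], [-1.0, 0.0], [-2.0, -0.5]]
Y = [1, 1, -1, -1]
clf = KernelPerceptron(linear).fit(X, Y)
predictions = clf.predict(X)
```

Whatever the training rule, prediction only needs alpha, the stored training points and the kernel, which is why fit has to set both attributes.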
The file config/default.yaml is loaded first, then all other files in the config folder are loaded in alphabetical order, overriding existing values or adding new ones.
default.yaml is an example file and is the only versioned file.
To start, create a new file with personal values.
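The loading order described above could look roughly like the sketch below. The function names are hypothetical, and two points are assumptions rather than facts about the repository: only files ending in `.yaml` are considered, and overriding is a shallow, top-level merge. Parsing relies on PyYAML's `yaml.safe_load`, imported lazily so the ordering and merging helpers work without it.

```python
import os


def ordered_config_files(folder="config"):
    """Return default.yaml first, then the other .yaml files alphabetically."""
    others = sorted(
        name for name in os.listdir(folder)
        if name.endswith(".yaml") and name != "default.yaml"
    )
    return [os.path.join(folder, name) for name in ["default.yaml"] + others]


def merge_configs(configs):
    """Later configs override (or add to) earlier ones; shallow merge assumed."""
    merged = {}
    for cfg in configs:
        merged.update(cfg or {})  # an empty YAML file parses to None
    return merged


def load_config(folder="config"):
    import yaml  # PyYAML, third-party

    configs = []
    for path in ordered_config_files(folder):
        with open(path) as handle:
            configs.append(yaml.safe_load(handle))
    return merge_configs(configs)
```

With this ordering, a personal file only needs to contain the values that differ from default.yaml.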
Config of the different kernels.
kernel can be one of:
- Simple onehot, where each letter is represented as a 4-dim one-hot vector. No args.
- Spectrum of the sequence, inspired by slide 55 of http://members.cbio.mines-paristech.fr/~jvert/talks/060727mlss/mlss.pdf. Args:
  - length: length of the words the spectrum is computed on.
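As an illustration of what the length argument controls, the sketch below computes a spectrum embedding: each sequence is mapped to the vector of counts of every possible word of the given length. The function name `spectrum_embed` and the `alphabet` parameter are hypothetical, and the repository's implementation may differ (for instance by using a sparse representation).

```python
from collections import Counter
from itertools import product


def spectrum_embed(sequences, length, alphabet="ACGT"):
    """Map each sequence to the counts of all words of size `length`."""
    # Fixed ordering of the len(alphabet) ** length possible words.
    words = ["".join(letters) for letters in product(alphabet, repeat=length)]
    embedded = []
    for seq in sequences:
        # Count every contiguous window of size `length` in the sequence.
        counts = Counter(seq[i:i + length] for i in range(len(seq) - length + 1))
        embedded.append([counts[word] for word in words])
    return embedded


first, second = spectrum_embed(["ACGA", "ACGT"], length=2)
# Linear kernel on the spectrum embeddings: here the shared words are AC and CG.
shared = sum(a * b for a, b in zip(first, second))
```

The embedding dimension grows as 4 to the power of length, so a dense vector like this one is only practical for short words.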
Config of the classifiers. Among:
Args:
- C: regularization parameter C of the C-SVM.
Args:
- lambda: regularization constant.