A Python package and string kernel algorithm for training SVM classifiers for sequence analysis. Built with the iGakco gapped k-mer algorithm, pybind11, and LIBSVM.
On Unix (Linux, OS X)
- A compiler with C++11 support
- CMake >= 2.8.12
On Windows
- Visual Studio 2015 (required for all Python versions, see notes below)
- CMake >= 3.1
With pip
pip install -i https://test.pypi.org/simple/ igakco-test
With Conda Currently, we only support PyPi builds. If you use Anaconda, we recommend installing pip:
conda install pip
And then installing igakco with the above pip
command.
From source
Clone this repository:
git clone --recursive https://github.com/dblakely/pygakco.git
The --recursive
flag is to make sure the Pybind11 library is cloned as well. Then run:
pip install ./igakco
or
pip3 install ./igakco
Example usage:
from igakco import SVM
svm = SVM(g=7, m=2, C=0.7)
svm.fit(train_file="1.1.train.fasta", test_file="1.1.test.fasta", quiet=False, kernel_file="output.txt")
svm.predict("predictions.txt")
This will use the provided parameters to build train and test kernel matrices and train an SVM classifier. The predict
call will write the predicted labels of the provided test_file
to predictions.txt
.
Alternatively, we can train by feeding in arrays of sequences and labels:
from igakco import SVM
xtrain = ["ACACA", "AAACA"]
ytrain = [1, 0]
xtest = ["AAAAA", "ACACA"]
ytest = [1, 0]
svm = SVM(g=3, m=2, C=0.7)
svm.fit_from_arrays(xtrain, ytrain, xtest, ytest, "kernel.txt")
svm.predict("preds.txt")
Constructor:
g
(required)m
(required)C
(optional, default=1.0)nu
(optional, default=0.5)eps
(optional, default = 0.001) - LIBSVM epsilon parameterkernel
(optional, default = 'linear'). Options: linear, gakco, rbf
Fit:
train_file
(required)test_file
(required)dict
(optional). A dictionary file for the sequences in the train and test files. Default behavior is to infer the dictionary from the files.quiet
(optional, default=false). Whether to be verbose.kernel_file
(optional). If provided, the kernel matrix will be printed to the provided file. Otherwise, kernel matrix will not be saved.
Predict:
predictions_file
(required). File where predictions will be written. Format is one prediction, a single number, per line.
Compiler requirements
Pybind11 requires a C++11 compliant compiler, i.e Visual Studio 2015 on Windows. This applies to all Python versions, including 2.7. Unlike regular C extension modules, it's perfectly fine to compile a pybind11 module with a VS version newer than the target Python's VS version.
See the LICENSE file.