
An Erlang library for Support Vector Machine (SVM) classification and regression

Primary language: C++. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

esvm


A simple, easy-to-use, and efficient Erlang library for Support Vector Machine (SVM) classification and regression, based on libsvm. It supports C-SVM classification, nu-SVM classification, one-class SVM, epsilon-SVM regression, and nu-SVM regression.

The Support Vector Machine (SVM) is a widely used technique for reliably classifying data in high-dimensional feature spaces. It is a supervised machine learning model that can capture complex relationships between variables. Its core principle is to separate the classes with an optimal hyperplane: the one with the maximum margin, that is, the largest distance to the nearest training points of any class, which tends to generalize well.
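
As background, the maximum-margin idea can be written as the standard soft-margin C-SVC problem. This is the textbook formulation, shown here for orientation and not as a description of this library's internals. It assumes labels y_i in {-1, +1}, a kernel feature map \phi, and slack variables \xi_i that allow some points to violate the margin:

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\, w^{\top} w + C \sum_{i=1}^{n} \xi_i \\
\text{subject to}\quad & y_i \left( w^{\top} \phi(x_i) + b \right) \ge 1 - \xi_i, \\
& \xi_i \ge 0, \qquad i = 1, \dots, n
\end{aligned}
```

The margin width is 2 / ||w||, so minimizing w^T w maximizes the margin, while the cost C sets how heavily margin violations are penalized.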

This method has several advantages. It performs well in high-dimensional spaces and is memory efficient, because its decision function uses only a subset of the training points (the support vectors). However, it may perform less well when the number of features greatly exceeds the number of samples, in which case the kernel and regularization parameters need careful tuning to avoid overfitting.

Quick start

Compile:

rebar3 compile

Create a model

% Features must be a list of {Class, FeatureVector} tuples: the first element
% is the item's class and the second is its feature vector.
% The ?SVM_TYPE_* and ?KERNEL_TYPE_* macros used below come from esvm.hrl.

Features = [
    {1, [1, 3, 4, 5]},
    {0, [0, 2, 4, 6]},
    {1, [0, 2, 4, 6]}
],

FeaturesCount = length(Features),

{ok, Model} = esvm:model_create(Features, FeaturesCount, [
    {<<"svm_type">>, ?SVM_TYPE_C_SVC},
    {<<"kernel_type">>, ?KERNEL_TYPE_RBF}
]).

Parameters you can tune when creating a model:

  • svm_type: One of SVM_TYPE_* from esvm.hrl (default SVM_TYPE_C_SVC).
  • kernel_type: One of KERNEL_TYPE_* from esvm.hrl (default KERNEL_TYPE_RBF).
  • degree: Degree of the kernel function (default 3).
  • gamma: Gamma of the kernel function (default 1/max feature length).
  • coef0: coef0 of the kernel function (default 0).
  • cache_size: Cache memory size in MB (default 100).
  • eps: Tolerance of the termination criterion (default 0.001).
  • C: The cost parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1).
  • nu: The parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5).
  • p: The epsilon in the loss function of epsilon-SVR (default 0.1).
  • shrinking: Whether to use the shrinking heuristics, 0 or 1 (default 1).
  • probability: Whether to train a model for probability estimates, 0 or 1 (default 0).
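
To make the gamma default in the list above concrete, here is a minimal sketch that computes 1/max feature length for a training set and builds an options list from it. The module name `esvm_opts_sketch` and both functions are hypothetical helpers, not part of esvm. It is also an assumption that every option key is a binary, as `svm_type` and `kernel_type` are in the example above, and that numeric values such as `10` are accepted for `C`.

```erlang
-module(esvm_opts_sketch).
-export([default_gamma/1, example_options/1]).

%% Hypothetical helper: length of the longest feature vector among the
%% {Class, FeatureVector} training tuples. Crashes on an empty list.
max_feature_length(Features) ->
    lists:max([length(Vector) || {_Class, Vector} <- Features]).

%% 1 / max feature length, the default the parameter list gives for gamma.
default_gamma(Features) ->
    1 / max_feature_length(Features).

%% Example options list. Binary keys are an assumption based on the
%% svm_type and kernel_type entries in the README; the value 10 for C
%% is arbitrary and only for illustration.
example_options(Features) ->
    [
        {<<"gamma">>, default_gamma(Features)},
        {<<"C">>, 10}
    ].
```

For the three sample feature vectors used earlier, each of length 4, `default_gamma/1` yields 0.25. The resulting list could be passed as the third argument of `esvm:model_create/3`, together with the `svm_type` and `kernel_type` entries.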

Save a model

true = esvm:model_save(Model, <<"path/file.model">>).

Load an existing model

{ok, Model} = esvm:model_load(<<"path/file.model">>).

Prediction

{ok, PredictedClass} = esvm:model_predict(Model, Feature).
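
Once prediction works, a natural next step is to check how often the model agrees with labelled data. The sketch below is one way to do that, under these assumptions: the module `esvm_eval_sketch` and the function `accuracy/2` are hypothetical, the samples have the same `{Class, FeatureVector}` shape used for training, and the predictor is passed in as a fun that returns `{ok, PredictedClass}` as in the call above. The comparison uses `==` so that a class returned as a float (for example `1.0`) still matches an integer label; whether esvm returns floats is not confirmed here.

```erlang
-module(esvm_eval_sketch).
-export([accuracy/2]).

%% Fraction of labelled samples whose predicted class equals the label.
%% Predict is a fun taking a feature vector and returning {ok, Class},
%% for example: fun(V) -> esvm:model_predict(Model, V) end
accuracy(Predict, Labelled) when is_function(Predict, 1), Labelled =/= [] ->
    Hits = lists:sum([hit(Predict(Vector), Class) || {Class, Vector} <- Labelled]),
    Hits / length(Labelled).

%% 1 when the prediction matches the label numerically, otherwise 0.
%% Any result that is not {ok, _} counts as a miss.
hit({ok, Predicted}, Class) when Predicted == Class -> 1;
hit(_Other, _Class) -> 0.
```

Taking the predictor as a fun keeps the helper independent of how the model was created or loaded, so it can also be exercised with a stub.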

Tests

classification_test.erl in the test folder contains an example of how to create a model that uses SVM to classify whether an SMS message is spam.

The data source used to train the model can be downloaded from here.

To run the tests, execute rebar3 eunit from the project root.