Efficient training of Support Vector Machines in Java
- Heavily refactored Java port of the venerable LIBSVM (version 2.88).
- Provides idiomatic Java class structure and APIs (unlike the Java version provided by LIBSVM, which is transliterated C code).
- Easy to add new kernels, in addition to the five standard ones provided by LIBSVM.
- On the mathematical side, jlibsvm performs exactly the same computations as LIBSVM, including shrinking and all the fancy stuff described in the LIBSVM implementation docs.
- Optimized kernel implementations run faster, particularly when input vectors are sparse. For instance, on the mushrooms dataset, jlibsvm trained ~25% faster than the Java version of LIBSVM with an RBF kernel and ~40% faster with a linear kernel. (The C version of LIBSVM is still faster, though.)
- Multithreaded training to take advantage of modern multi-core machines (using Conja).
- Integrated scaling and normalization so you don't have to explicitly preprocess your data.
- Integrated grid search for optimal kernel parameters.
- Drop-in replacement if you use the command-line tools (svm-train and friends), but not if you use LIBSVM programmatically.
- Uses Java generics throughout, including for classification labels, so you can specify that the "label" of a class be of whatever Java type you like. In an email-filtering application, for example, you could use objects of type Mailbox as the labels. That would allow you to write something like mySvmSolutionModel.predict(incomingEmail).addMessage(incomingEmail). The predict() method returns a classification label, which in this case is an object of class Mailbox, which has an addMessage() method.
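To make the generic-label idea concrete, here is a minimal self-contained sketch. Note that Mailbox, LabelModel, and the keyword rule are hypothetical stand-ins invented for this illustration; they are not jlibsvm classes, and a real model would come out of training rather than a hand-written rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Demo entry point. Every type in this file is a hypothetical stand-in, not jlibsvm code.
class GenericLabelDemo {
    public static void main(String[] args) {
        Mailbox inbox = new Mailbox("inbox");
        Mailbox spam = new Mailbox("spam");

        // Toy rule standing in for a trained model: flag messages that mention "lottery".
        LabelModel<Mailbox, String> model =
                email -> email.toLowerCase(Locale.ROOT).contains("lottery") ? spam : inbox;

        String incomingEmail = "You have won the LOTTERY!";
        // predict() returns a Mailbox, so the call chains without a cast.
        model.predict(incomingEmail).addMessage(incomingEmail);

        System.out.println(spam.getName() + ": " + spam.getMessages().size());
        System.out.println(inbox.getName() + ": " + inbox.getMessages().size());
    }
}

// Hypothetical model interface: L is the label type, P is the type of the points being classified.
interface LabelModel<L, P> {
    L predict(P point);
}

// An ordinary domain object used directly as a classification label.
class Mailbox {
    private final String name;
    private final List<String> messages = new ArrayList<>();

    Mailbox(String name) {
        this.name = name;
    }

    void addMessage(String message) {
        messages.add(message);
    }

    String getName() {
        return name;
    }

    List<String> getMessages() {
        return messages;
    }
}
```

Because the label type is a type parameter, the compiler checks that whatever predict() returns really has an addMessage() method; with integer class labels you would need a lookup table and a cast instead.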
This is beta code. While LIBSVM is stable, it's possible that I broke something in the process of refactoring it. I've done ad-hoc testing primarily with the C_SVC machine and an RBF kernel, and got results that were identical to LIBSVM as far as I could tell. There are not (yet?) any unit tests. I'm running some automated verifications that jlibsvm behaves identically to LIBSVM for a number of input datasets and parameter choices; results will be available here soon. Please let me know if you find a situation in which the two packages give different results.
Sorry, I haven't really had a chance to write any docs. Have a look at the sources for the command-line programs in the legacyexec package to see how jlibsvm gets called. Very briefly, you'll need to:
- instantiate the KernelFunction that you want
- set up some parameters in a new SvmParameter object
- instantiate a concrete subclass of SvmProblem (binary, multiclass, or regression), and populate it with training data
- instantiate a concrete subclass of SVM, choosing a type appropriate for your problem
- call SVM.train(problem) to yield a SolutionModel, which can be used to make predictions
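The shape of that workflow can be sketched with stand-in types. To be clear, none of the Toy* classes below exist in jlibsvm, their constructors and method names are assumptions made up for this example, and the trainer is a simple kernel perceptron rather than the LIBSVM solver. The sketch only shows how a kernel, parameters, a problem, and a trainer fit together to produce a model; for the real class names and signatures, consult the legacyexec sources.

```java
import java.util.ArrayList;
import java.util.List;

// Walks through the five steps with stand-in types. None of these classes are jlibsvm's own.
class WorkflowSketch {
    public static void main(String[] args) {
        // 1. Pick a kernel (here a Gaussian/RBF kernel with gamma = 1.0).
        ToyKernel kernel = new ToyRbfKernel(1.0);
        // 2. Set up parameters (here just a cap on training epochs).
        ToyParameters params = new ToyParameters(20);
        // 3. Build a binary problem and fill it with training data (XOR, not linearly separable).
        ToyBinaryProblem problem = new ToyBinaryProblem();
        problem.add(new double[] {0, 0}, -1);
        problem.add(new double[] {1, 1}, -1);
        problem.add(new double[] {0, 1}, +1);
        problem.add(new double[] {1, 0}, +1);
        // 4. Instantiate a trainer, then 5. train it to obtain a model.
        ToyModel model = new ToyKernelPerceptron(kernel, params).train(problem);
        System.out.println(model.predict(new double[] {0.9, 0.1}));
    }
}

interface ToyKernel {
    double evaluate(double[] a, double[] b);
}

class ToyRbfKernel implements ToyKernel {
    private final double gamma;

    ToyRbfKernel(double gamma) {
        this.gamma = gamma;
    }

    @Override
    public double evaluate(double[] a, double[] b) {
        double squaredDistance = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            squaredDistance += d * d;
        }
        return Math.exp(-gamma * squaredDistance);
    }
}

class ToyParameters {
    final int maxEpochs;

    ToyParameters(int maxEpochs) {
        this.maxEpochs = maxEpochs;
    }
}

class ToyBinaryProblem {
    final List<double[]> points = new ArrayList<>();
    final List<Integer> labels = new ArrayList<>(); // each label is +1 or -1

    void add(double[] point, int label) {
        points.add(point);
        labels.add(label);
    }
}

class ToyModel {
    private final ToyKernel kernel;
    private final ToyBinaryProblem problem;
    private final int[] alpha; // per-point mistake counts, shared with the trainer

    ToyModel(ToyKernel kernel, ToyBinaryProblem problem, int[] alpha) {
        this.kernel = kernel;
        this.problem = problem;
        this.alpha = alpha;
    }

    // f(x) = sum_i alpha_i * y_i * K(x_i, x)
    double decisionValue(double[] x) {
        double sum = 0.0;
        for (int i = 0; i < alpha.length; i++) {
            sum += alpha[i] * problem.labels.get(i) * kernel.evaluate(problem.points.get(i), x);
        }
        return sum;
    }

    int predict(double[] x) {
        return decisionValue(x) > 0 ? 1 : -1;
    }
}

class ToyKernelPerceptron {
    private final ToyKernel kernel;
    private final ToyParameters params;

    ToyKernelPerceptron(ToyKernel kernel, ToyParameters params) {
        this.kernel = kernel;
        this.params = params;
    }

    ToyModel train(ToyBinaryProblem problem) {
        int[] alpha = new int[problem.points.size()];
        ToyModel model = new ToyModel(kernel, problem, alpha);
        for (int epoch = 0; epoch < params.maxEpochs; epoch++) {
            boolean mistake = false;
            for (int i = 0; i < alpha.length; i++) {
                // Bump alpha_i whenever point i is misclassified (or sits on the boundary).
                if (problem.labels.get(i) * model.decisionValue(problem.points.get(i)) <= 0) {
                    alpha[i]++;
                    mistake = true;
                }
            }
            if (!mistake) {
                break; // every training point is classified correctly
            }
        }
        return model;
    }
}
```

Keeping the kernel behind a small interface is what makes swapping in a custom kernel cheap: the trainer and the model only ever call evaluate().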
Maven is by far the easiest way to make use of jlibsvm. Just add these to your pom.xml:
    <repositories>
      <repository>
        <id>dev.davidsoergel.com releases</id>
        <url>http://dev.davidsoergel.com/nexus/content/repositories/releases</url>
        <snapshots>
          <enabled>false</enabled>
        </snapshots>
      </repository>
      <repository>
        <id>dev.davidsoergel.com snapshots</id>
        <url>http://dev.davidsoergel.com/nexus/content/repositories/snapshots</url>
        <releases>
          <enabled>false</enabled>
        </releases>
      </repository>
    </repositories>
    <dependencies>
      <dependency>
        <groupId>edu.berkeley.compbio</groupId>
        <artifactId>jlibsvm</artifactId>
        <version>0.911</version>
      </dependency>
    </dependencies>
If you really want just the jar, you can get the latest release from the Maven repo, or the latest stable build from the build server.