
resource files for Scala for Machine Learning


Scala for Machine Learning Version 0.96a Copyright Patrick Nicolas All rights reserved 2013-2015
=================================================================================================
Source code, data files and utilities related to "Scala for Machine Learning"

Overview

The source code provides software developers with a broad overview of the differences between machine learning algorithms. The reader is expected to have a good grasp of the Scala programming language along with some knowledge of basic statistics. Experience in data mining and machine learning is not a prerequisite.

The examples are related to investment portfolio management and trading strategies. For readers interested in either the mathematics or the techniques implemented in this library, I strongly recommend the following readings:

  • "Machine Learning: A Probabilistic Perspective" K. Murphy
  • "The Elements of Statistical Learning" T. Hastie, R. Tibshirani, J. Friedman
The real-world examples, related to financial and market analysis, are used for the sole purpose of illustrating the machine learning techniques. They do not constitute a recommendation or endorsement of any specific investment management or trading techniques.
The Appendix contains an introduction to the basic concepts of investment and trading strategies as well as technical analysis of financial markets.

Minimum Requirements

Hardware: 2 CPU cores with 4 GB of RAM to build and run the examples on small datasets.
4 CPU cores and 8+ GB of RAM for datasets of 75,000 entries or more and/or with 50 or more features.
Operating system: no specific requirement
Software: JDK 1.7.0_45 or 1.8.0_25, Scala 2.10.3/2.10.4 or 2.11.1 and SBT 0.13+ (see the Installation section for deployment).
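The hardware and software minimums above can be checked quickly from a small Scala program. The sketch below is hypothetical: the object `RequirementsCheck` and its methods are not part of the library, "dataset size" is read as the number of observations (an assumption), and only the JDK and Scala release lines are matched, not the exact patch versions listed.

```scala
// Hypothetical sketch, not part of the Scala for Machine Learning library:
// it restates the minimum requirements listed above as simple checks.
object RequirementsCheck {
  // Threshold for a "large" dataset (assumption: size = number of observations)
  val LargeObservations = 75000
  val LargeFeatures = 50

  /** Recommended (CPU cores, RAM in GB) for a dataset of the given shape. */
  def recommended(numObservations: Int, numFeatures: Int): (Int, Int) =
    if (numObservations >= LargeObservations || numFeatures >= LargeFeatures) (4, 8)
    else (2, 4)

  /** True for the JDK 1.7 or 1.8 release lines. */
  def supportedJdk(javaVersion: String): Boolean =
    javaVersion.startsWith("1.7") || javaVersion.startsWith("1.8")

  /** True for the Scala 2.10 or 2.11 release lines. */
  def supportedScala(scalaVersion: String): Boolean =
    scalaVersion.startsWith("2.10") || scalaVersion.startsWith("2.11")

  def main(args: Array[String]): Unit = {
    val cores = Runtime.getRuntime.availableProcessors
    val javaVersion = System.getProperty("java.version")
    val scalaVersion = scala.util.Properties.versionNumberString
    // Hypothetical dataset shape used only for illustration
    val (minCores, minRamGb) = recommended(100000, 20)
    println(s"Cores available: $cores (recommended: $minCores cores, $minRamGb GB RAM)")
    println(s"JDK $javaVersion supported: ${supportedJdk(javaVersion)}")
    println(s"Scala $scalaVersion supported: ${supportedScala(scalaVersion)}")
  }
}
```

The JVM does not expose total physical RAM portably, so the sketch only reports the recommended amount rather than checking it.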

Project Components

Directory structure of the source code library for Scala for Machine Learning:

[Diagram: Source code]

Directory structure of the source code of the examples for Scala for Machine Learning:

[Diagram: Examples]

Library components for Scala for Machine Learning:

[Diagram: Libraries]

Installation and Build

Installation

The installation and build workflow is described in the following diagram:

[Diagram: Installation and build]

Eclipse

The Scala for Machine Learning library is compatible with Eclipse Scala IDE 3.0.
Specify the links to the source in Project/properties/Java Build Path/Source. The two links should be project_name/src/main/scala and project_name/src/test/scala.
Add the jars required to build and execute the code within Eclipse through Project/properties/Java Build Path/Add External Jars, as declared in project_name/.classpath.
Update the JVM heap parameters in the eclipse.ini file to -Xms512m -Xmx8192m, or to the maximum allowed on your machine.

Build

The Simple Build Tool (SBT) is used to build the library from the source code, using the build.sbt file in the root directory.
Executing the examples/tests in Scala for Machine Learning requires sufficient JVM heap memory (~2 GB):
in sbt/conf/sbtconfig.txt, set -Xmx to 2048m or higher and -XX:MaxPermSize to 512m or higher, e.g. -Xmx4096m -Xms512m -XX:MaxPermSize=512m
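To confirm that the heap settings actually reached the JVM, a small check such as the following may help. It is a sketch under assumptions: `HeapCheck` and `heapMb` are hypothetical names, only the `-Xms`/`-Xmx` flag syntax (with an optional k/m/g suffix) is parsed, and `Runtime.maxMemory` usually reports somewhat less than the `-Xmx` value, so the comparison is approximate.

```scala
// Hypothetical helper, not part of the library or of SBT: parses -Xms/-Xmx
// flags and compares the running JVM's maximum heap with the ~2 GB guideline.
object HeapCheck {
  // -Xms or -Xmx, a size, then an optional k/m/g unit suffix
  private val HeapFlag = """-Xm[sx](\d+)([kKmMgG]?)""".r

  /** Size in megabytes of a -Xms/-Xmx flag, or None for any other flag. */
  def heapMb(flag: String): Option[Long] = flag match {
    case HeapFlag(size, unit) =>
      val n = size.toLong
      Some(unit.toLowerCase match {
        case "k" => n / 1024
        case "m" => n
        case "g" => n * 1024
        case _   => n / (1024 * 1024) // no suffix: the value is in bytes
      })
    case _ => None
  }

  val RequiredMb = 2048L

  def main(args: Array[String]): Unit = {
    // The example flags recommended above
    val flags = "-Xmx4096m -Xms512m -XX:MaxPermSize=512m".split(" ")
    flags.foreach(f => println(s"$f -> ${heapMb(f)}"))
    // maxMemory may be slightly below -Xmx, so treat this as a rough check
    val maxMb = Runtime.getRuntime.maxMemory / (1024 * 1024)
    if (maxMb < RequiredMb) println(s"Max heap $maxMb MB is below ~$RequiredMb MB")
    else println(s"Max heap $maxMb MB")
  }
}
```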

Build script for Scala for Machine Learning:
To build the Scala for Machine Learning library package
$(ROOT)/sbt clean publish-local
To build the package including test and resource files
$(ROOT)/sbt clean package
To generate the Scaladoc for the library
$(ROOT)/sbt doc
To generate the Scaladoc for the examples
$(ROOT)/sbt test:doc
To compile all examples:
$(ROOT)/sbt test:compile
To run one test suite (e.g. Chapter 3):
$(ROOT)/sbt
> test-only *Chap3
To run all tests:
$(ROOT)/sbt test:run

Appendix

List of Jar files for Eclipse/Scala IDE setup

CRF-Trove_3.0.2.jar
LBFGS.jar
colt.jar
CRF.jar
commons-math3-3.3.jar
libsvm.jar
jfreechart-1.0.17/lib/jcommon-1.0.21.jar
junit-4.11.jar
jfreechart-1.0.17/lib/jfreechart-1.0.17.jar
com.typesafe/config/1.2.1/bundles/config.jar
jfreechart-1.0.17/lib/servlets.jar
akka-actor_2.11-2.3.6.jar
scalatest_2.11.jar
spark-assembly-1.1.0-hadoop2.4.0-no_scala.jar
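
When setting up Eclipse, it can be useful to confirm that every jar above is on the runtime classpath. The following is a minimal sketch, not part of the library: `JarCheck` and `missingJars` are hypothetical names, and jars are matched by file name only, ignoring the directory prefixes shown in the list.

```scala
// Hypothetical helper: reports which of the jars listed above are missing
// from a classpath string, matching entries by file name only.
object JarCheck {
  // File names from the list above, with directory prefixes dropped
  val RequiredJars = Seq(
    "CRF-Trove_3.0.2.jar", "LBFGS.jar", "colt.jar", "CRF.jar",
    "commons-math3-3.3.jar", "libsvm.jar", "jcommon-1.0.21.jar",
    "junit-4.11.jar", "jfreechart-1.0.17.jar", "config.jar",
    "servlets.jar", "akka-actor_2.11-2.3.6.jar", "scalatest_2.11.jar",
    "spark-assembly-1.1.0-hadoop2.4.0-no_scala.jar")

  /** Names in `required` that match no classpath entry's file name. */
  def missingJars(classpath: String, required: Seq[String]): Seq[String] = {
    val names = classpath.split(java.io.File.pathSeparator)
      .map(entry => new java.io.File(entry).getName)
      .toSet
    required.filterNot(names.contains)
  }

  def main(args: Array[String]): Unit = {
    val missing = missingJars(System.getProperty("java.class.path"), RequiredJars)
    if (missing.isEmpty) println("All listed jars found")
    else missing.foreach(jar => println(s"Missing: $jar"))
  }
}
```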