Shark is a fast, modular, general open-source C++ machine
learning library. 

Shark is licensed under the GNU Lesser General Public License; please
see the files COPYING and COPYING.LESSER, or visit
http://www.gnu.org/licenses .

Any application of the SHARK code toward military research and use is
expressly against the wishes of the SHARK development team.


INSTALLATION / DOCUMENTATION
----------------------------

The entry point to the Shark library documentation is located at
doc/index.html . For installation instructions, please click on
"Getting started" on that page. The short version of the installation
guide: run "ccmake ." in the main directory to select your build
options, then run "make" in the same directory. That should be all,
assuming Boost and CMake are installed. See the documentation for
detailed instructions.

BUILDING THE DOCUMENTATION: To build the documentation yourself (e.g.,
if you need to read it locally in order to install Shark because you
have no internet access), see doc/README.txt


FILE STRUCTURE
--------------


README.txt          This file (residing in the root directory of
                    the Shark library).

CMakeLists.txt      Definitions for the CMake build system.

include/            This directory and its sub-directories hold
                    all include files of the library. Note that
                    some functionality is implemented in lower-
                    level Impl/ folders and inline .inl files.

lib/                The Shark library is placed in this directory.
                    In the source code distribution this directory
                    is initially empty, and the library is placed
                    into the directory as the result of
                    compilation. Binary distributions already
                    contain the library, pre-built in release mode.

doc/                All documentation files are found in this
                    sub-directory. In packaged versions of Shark
                    the html documentation is pre-built; the
                    repository provides the corresponding sources.
                    The documentation contains technical reference
                    documents for all classes and functions as well
                    as a collection of introductory and advanced
                    tutorials.

doc/index.html      Entry point to the Shark documentation.

examples/           The examples directory contains example
                    use-cases of the most important algorithms
                    implemented in Shark. Besides exemplifying
                    powerful learning algorithms, these programs
                    are intended as starting points for
                    experimentation with the library. The
                    executables corresponding to the C++ example
                    programs are found in examples/bin/.

Test/               Shark comes with a large collection of unit
                    tests, all of which reside inside the Test
                    directory.

bin/                The binaries of the Shark unit tests are placed
                    here. Once the CMake build system is set up
                    (with the "ccmake" command or equivalent) the
                    whole test suite can be executed with the
                    command "make test", issued in the Shark root
                    directory.

src/                Source files of the Shark library. Note that
                    from Shark version 3 onwards large parts of the
                    library are templated and therefore header-only.

contrib/            The contrib directory contains (non-standard)
                    tools by third parties. Typically, there is no
                    need for users of Shark to deal with these
                    tools directly.

gpl-3.0.txt         GNU General Public License, version 3.


Note:
Depending on the type of Shark distribution (binary or source
package, or current repository snapshot), not all of these files
and directories are present.



PACKAGE STRUCTURE
-----------------

  >> Note for users of Shark 2: <<
  The internal structure of the Shark library has changed in the
  transition to version 3. The old infrastructure packages Array, Rng,
  and FileUtil, as well as parts of LinAlg, have been replaced with
  more modern solutions provided by Boost. The machine learning
  related components EALib, MOO-EALib, Mixture, ReClaM, and TimeSeries
  have been unified and organized into completely new interfaces.
  Therefore there is no one-to-one correspondence between files or
  even concepts in version 3 and in older versions of Shark. In fact,
  the lion's share of the library has been rewritten from scratch,
  and this is also reflected in a completely new structure. In
  particular, many of the rather independent sub-modules (such as
  Mixture and MOO-EALib) have been unified. They now share the same
  top-level interfaces and thus form a coherent learning architecture.

The organization of the include/ directory reflects the structure of
the Shark library. It consists of the following modules:


GENERAL INFRASTRUCTURE:

Rng                 Random number generator. The interface of this
                    component has remained nearly unchanged since
                    early versions of Shark, but under the hood
                    Shark 3 has switched to Boost as the back-end.

LinAlg              Data structures and algorithms for typical
                    linear algebra computations. For (dense and
                    sparse) vector and matrix classes Shark relies
                    on Boost uBLAS. Many higher level algorithms
                    (such as singular value decomposition) are
                    still implemented by the library itself.

Statistics          This component is new in Shark 3. It wraps the
                    capabilities of Boost accumulators, and it
                    provides tools that appear regularly in machine
                    learning, such as the Mann-Whitney U-test (also
                    known as the Wilcoxon rank-sum test).
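
As an illustration of the kind of tool the Statistics entry mentions,
the following is a minimal sketch of the Mann-Whitney U statistic
written against the C++ standard library only. It does not use or
reproduce the Shark API; the function name mannWhitneyU is a
hypothetical choice made for this example. The statistic counts the
pairs (x, y), with x from the first sample and y from the second, for
which x is larger than y, and counts ties as one half.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper for illustration; NOT part of the Shark API.
// Returns the Mann-Whitney U statistic of sample a against sample b:
// the number of pairs (x in a, y in b) with x > y, ties counted as 0.5.
double mannWhitneyU(const std::vector<double>& a,
                    const std::vector<double>& b) {
    // The statistic is only meaningful for non-empty samples.
    assert(!a.empty() && !b.empty());
    double u = 0.0;
    for (double x : a) {
        for (double y : b) {
            if (x > y) {
                u += 1.0;   // x wins the pairwise comparison
            } else if (x == y) {
                u += 0.5;   // a tie contributes one half
            }
        }
    }
    return u;
}
```

This direct pairwise count takes time proportional to the product of
the two sample sizes; a rank-based formulation is the usual choice for
large samples. The statistics of the two orderings always add up to
the product of the sample sizes, which makes a convenient sanity
check.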


LEARNING INFRASTRUCTURE:

Core                The core module is the central place for all
                    top-level interfaces. In addition it holds a
                    few infrastructure classes, such as exceptions.

Data                The data module hosts data containers that have
                    been specifically designed for the needs of
                    machine learning code. Also, data can be
                    imported and exported from and to different
                    standard machine learning data file formats.


MACHINE LEARNING:

Models              Models are adaptive systems, the architectures
                    on top of which (machine) learning happens.
                    Shark features a rich set of models, from simple
                    linear maps to (feed-forward and recurrent)
                    neural networks, support vector machines, and
                    different types of trees. Models can also be
                    concatenated with data format converters and
                    other models.

ObjectiveFunctions  This module collects different types of cost,
                    fitness, or objective functions for learning.
                    The range includes data-dependent error
                    functions based on simple loss functions,
                    cross-validation, area under the ROC curve, and
                    different objectives used for model selection.

Algorithms          All actual learning algorithms reside in this
                    module. There are two main groups of learning
                    algorithms, namely iterative optimizers and
                    more specialized model trainers. General
                    optimizers are organized into direct search
                    and gradient-based optimization. Specialized
                    algorithms for linear programming (a part of
                    GLPK, the GNU linear programming kit) and
                    quadratic programming for training of non-linear
                    support vector machines are included. Shark
                    also ships with algorithms for efficient
                    nearest neighbor search.

Fuzzy               The fuzzy module provides classes for the
                    representation of linguistic terms, variables,
                    operators and rules, as well as fuzzy logic
                    inference engines and controllers.

Unsupervised        This module contains the Shark implementation
                    of restricted Boltzmann machines (RBMs),
                    a recent experimental feature of Shark.
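
The division of labor between Models, ObjectiveFunctions, and
Algorithms described above can be sketched with the C++ standard
library alone. The sketch below is an assumption-laden illustration,
not Shark code: LinearModel, meanSquaredError, and gradientStep are
hypothetical names chosen for this example. It shows a model with
adaptive parameters, a data-dependent error function built on the
squared loss, and one step of a gradient-based iterative optimizer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only; none of these are Shark classes.

// "Model": a linear map y = w * x + b with two adaptive parameters.
struct LinearModel {
    double w = 0.0;
    double b = 0.0;
    double operator()(double x) const { return w * x + b; }
};

// "Objective function": mean squared error of the model on a data set.
double meanSquaredError(const LinearModel& m,
                        const std::vector<double>& xs,
                        const std::vector<double>& ys) {
    assert(xs.size() == ys.size() && !xs.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        const double r = m(xs[i]) - ys[i];  // residual on sample i
        sum += r * r;
    }
    return sum / static_cast<double>(xs.size());
}

// "Algorithm": one step of plain gradient descent on the error above.
void gradientStep(LinearModel& m,
                  const std::vector<double>& xs,
                  const std::vector<double>& ys,
                  double learningRate) {
    assert(xs.size() == ys.size() && !xs.empty());
    double gradW = 0.0;
    double gradB = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        const double r = m(xs[i]) - ys[i];
        gradW += 2.0 * r * xs[i];  // d/dw of the squared residual
        gradB += 2.0 * r;          // d/db of the squared residual
    }
    const double n = static_cast<double>(xs.size());
    m.w -= learningRate * gradW / n;
    m.b -= learningRate * gradB / n;
}
```

Calling gradientStep repeatedly on data generated by a linear rule
drives the parameters toward that rule, provided the learning rate is
small enough for the data at hand. Keeping the model, the objective,
and the optimizer as separate pieces is what allows each of them to be
exchanged independently.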