Home | Documentation | Community | Help | IRC Chat
Download: current stable version (4.2.0)
mlpack is an intuitive, fast, and flexible header-only C++ machine learning library with bindings to other languages. It is meant to be a machine learning analog to LAPACK, and aims to implement a wide array of machine learning methods and functions as a "swiss army knife" for machine learning researchers.
mlpack's lightweight C++ implementation makes it ideal for deployment, and it can also be used for interactive prototyping via C++ notebooks (these can be seen in action on mlpack's homepage).
In addition to its powerful C++ interface, mlpack also provides command-line programs, Python bindings, Julia bindings, Go bindings and R bindings.
Quick links:
- Quickstart guides: C++, CLI, Python, R, Julia, Go
- mlpack homepage
- mlpack documentation
- Examples repository
- Tutorials
- Development Site (Github)
mlpack uses an open governance model and is fiscally sponsored by NumFOCUS. Consider making a tax-deductible donation to help the project pay for developer time, professional services, travel, workshops, and a variety of other needs.
- Citation details
- Dependencies
- Installing and using mlpack in C++
- Building mlpack bindings to other languages
- Building mlpack's test suite
- Further resources
If you use mlpack in your research or software, please cite mlpack using the citation below (given in BibTeX format):
@article{mlpack2023,
title = {mlpack 4: a fast, header-only C++ machine learning library},
author = {Ryan R. Curtin and Marcus Edel and Omar Shrit and
Shubham Agrawal and Suryoday Basak and James J. Balamuta and
Ryan Birmingham and Kartik Dutt and Dirk Eddelbuettel and
Rishabh Garg and Shikhar Jaiswal and Aakash Kaushik and
Sangyeon Kim and Anjishnu Mukherjee and Nanubala Gnana Sai and
Nippun Sharma and Yashwant Singh Parihar and Roshan Swain and
Conrad Sanderson},
journal = {Journal of Open Source Software},
volume = {8},
number = {82},
pages = {5026},
year = {2023},
doi = {10.21105/joss.05026},
url = {https://doi.org/10.21105/joss.05026}
}
Citations are beneficial for the growth and improvement of mlpack.
mlpack requires the following additional dependencies:
If the STB library headers are available, image loading support will be available.
If you are compiling Armadillo by hand, ensure that LAPACK and BLAS are enabled.
See also the C++ quickstart.
Since mlpack is a header-only library, installing just the headers for use in a C++ application is trivial.
From the root of the sources, configure and install in the standard CMake way:
mkdir build && cd build/
cmake ..
sudo make install
If the cmake ..
command fails due to unavailable dependencies, consider either using the
-DDOWNLOAD_DEPENDENCIES=ON
option as detailed in
the following subsection, or ensure that mlpack's dependencies
are installed, e.g. using the system package manager. For example, on Debian and Ubuntu,
all relevant dependencies can be installed with sudo apt-get install libarmadillo-dev libensmallen-dev libcereal-dev g++ cmake
.
Alternatively, since CMake v3.14.0 the cmake
command can create the build folder itself,
the above commands can be rewritten as follows:
cmake -S . -B build
sudo cmake --build build --target install
You can add a few arguments to the cmake
command to control the behavior of
the configuration and build process. Simply add these to the cmake
command.
Some options are given below:
-DDOWNLOAD_DEPENDENCIES=ON
will automatically download mlpack's dependencies (ensmallen, Armadillo, and cereal). Installing Armadillo this way is not recommended and it is better to use your system package manager when possible (see below).-DCMAKE_INSTALL_PREFIX=/install/root/
will set the root of the install directory to/install/root
whenmake install
is run.-DDEBUG=ON
will enable debugging symbols in any compiled bindings or tests.
There are also options to enable building bindings to each language that mlpack supports; those are detailed in the following sections.
Once headers are installed with make install
, using mlpack in an application
consists only of including it. So, your program should include mlpack:
#include <mlpack.hpp>
and when you link, be sure to link against Armadillo. If your example program
is my_program.cpp
, your compiler is GCC, and you would like to compile with
OpenMP support (recommended) and optimizations, compile like this:
g++ -O3 -std=c++14 -o my_program my_program.cpp -larmadillo -fopenmp
Note that if you want to serialize (save or load) neural networks, you should
add #define MLPACK_ENABLE_ANN_SERIALIZATION
before including <mlpack.hpp>
.
If you don't define MLPACK_ENABLE_ANN_SERIALIZATION
and your code serializes a
neural network, a compilation error will occur.
See the C++ quickstart and the
examples repository for some examples of
mlpack applications in C++, with corresponding Makefile
s.
When the autodownloader is used to download Armadillo
(-DDOWNLOAD_DEPENDENCIES=ON
), the Armadillo runtime library is not built and
Armadillo must be used in header-only mode. The autodownloader also does not
download dependencies of Armadillo such as OpenBLAS. For this reason, it is
recommended to instead install Armadillo using your system package manager,
which will also install the dependencies of Armadillo. For example, on Ubuntu
and Debian systems, Armadillo can be installed with
sudo apt-get install libarmadillo-dev
and other package managers such as dnf
and brew
and pacman
also have
Armadillo packages available.
If the autodownloader is used to provide Armadillo, mlpack programs cannot be
linked with -larmadillo
. Instead, you must link directly with the
dependencies of Armadillo. For example, on a system that has OpenBLAS
available, compilation can be done like this:
g++ -O3 -std=c++14 -o my_program my_program.cpp -lopenblas -fopenmp
See the Armadillo documentation for more information on linking Armadillo programs.
mlpack is a template-heavy library, and if care is not used, compilation time of a project can be increased greatly. Fortunately, there are a number of ways to reduce compilation time:
-
Include individual headers, like
<mlpack/methods/decision_tree.hpp>
, if you are only using one component, instead of<mlpack.hpp>
. This reduces the amount of work the compiler has to do. -
Only use the
MLPACK_ENABLE_ANN_SERIALIZATION
definition if you are serializing neural networks in your code. When this define is enabled, compilation time will increase significantly, as the compiler must generate code for every possible type of layer. (The large amount of extra compilation overhead is why this is not enabled by default.) -
If you are using mlpack in multiple .cpp files, consider using
extern templates
so that the compiler only instantiates each template once; add an explicit template instantiation for each mlpack template type you want to use in a .cpp file, and then useextern
definitions elsewhere to let the compiler know it exists in a different file.
Other strategies exist too, such as precompiled headers, compiler options,
ccache
, and others.
mlpack is not just a header-only library: it also comes with bindings to a number of other languages, this allows flexible use of mlpack's efficient implementations from languages that aren't C++.
In general, you should not need to build these by hand---they should be provided by either your system package manager or your language's package manager.
Building the bindings for a particular language is done by calling cmake
with
different options; each example below shows how to configure an individual set
of bindings, but it is of course possible to combine the options and build
bindings for many languages at once.
See also the command-line quickstart.
The command-line programs have no extra dependencies. The set of programs that will be compiled is detailed and documented on the command-line program documentation page.
From the root of the mlpack sources, run the following commands to build and install the command-line bindings:
mkdir build && cd build/
cmake -DBUILD_CLI_PROGRAMS=ON ../
make
sudo make install
You can use make -j<N>
, where N
is the number of cores on your machine, to
build in parallel; e.g., make -j4
will use 4 cores to build.
See also the Python quickstart.
mlpack's Python bindings are available on
PyPI and
conda-forge, and can be installed
with either pip install mlpack
or conda install -c conda-forge mlpack
.
These sources are recommended, as building the Python bindings by hand can be
complex.
With that in mind, if you would still like to manually build the mlpack Python bindings, first make sure that the following Python packages are installed:
- setuptools
- wheel
- cython >= 0.24
- numpy
- pandas >= 0.15.0
Now, from the root of the mlpack sources, run the following commands to build and install the Python bindings:
mkdir build && cd build/
cmake -DBUILD_PYTHON_BINDINGS=ON ../
make
sudo make install
You can use make -j<N>
, where N
is the number of cores on your machine, to
build in parallel; e.g., make -j4
will use 4 cores to build. You can also
specify a custom Python interpreter with the CMake option
-DPYTHON_EXECUTABLE=/path/to/python
.
See also the R quickstart.
mlpack's R bindings are available as the R package
mlpack on CRAN.
You can install the package by running install.packages('mlpack')
, and this is
the recommended way of getting mlpack in R.
If you still wish to build the R bindings by hand, first make sure the following dependencies are installed:
- R >= 4.0
- Rcpp >= 0.12.12
- RcppArmadillo >= 0.9.800.0
- RcppEnsmallen >= 0.2.10.0
- roxygen2
- testthat
- pkgbuild
These can be installed with install.packages()
inside of your R environment.
Once the dependencies are available, you can configure mlpack and build the R
bindings by running the following commands from the root of the mlpack sources:
mkdir build && cd build/
cmake -DBUILD_R_BINDINGS=ON ../
make
sudo make install
You may need to specify the location of the R program in the cmake
command
with the option -DR_EXECUTABLE=/path/to/R
.
Once the build is complete, a tarball can be found under the build directory in
src/mlpack/bindings/R/
, and then that can be installed into your R environment
with a command like install.packages(mlpack_3.4.3.tar.gz, repos=NULL, type='source')
.
See also the Julia quickstart.
mlpack's Julia bindings are available by installing the
mlpack.jl package using
Pkg.add("mlpack.jl")
. The process of building, packaging, and distributing
mlpack's Julia bindings is very nontrivial, so it is recommended to simply use
the version available in Pkg
, but if you want to build the bindings by hand
anyway, you can configure and build them by running the following commands from
the root of the mlpack sources:
mkdir build && cd build/
cmake -DBUILD_JULIA_BINDINGS=ON ../
make
If CMake cannot find your Julia installation, you can add
-DJULIA_EXECUTABLE=/path/to/julia
to the CMake configuration step.
Note that the make install
step is not done above, since the Julia binding
build system was not meant to be installed directly. Instead, to use handbuilt
bindings (for instance, to test them), one option is to start Julia with
JULIA_PROJECT
set as an environment variable:
cd build/src/mlpack/bindings/julia/mlpack/
JULIA_PROJECT=$PWD julia
and then using mlpack
should work.
See also the Go quickstart.
To build mlpack's Go bindings, ensure that Go >= 1.11.0 is installed, and that
the Gonum package is available. You can use go get
to install mlpack for Go:
go get -u -d mlpack.org/v1/mlpack
cd ${GOPATH}/src/mlpack.org/v1/mlpack
make install
The process of building the Go bindings by hand is a little tedious, so following the steps above is recommended. However, if you wish to build the Go bindings by hand anyway, you can do this by running the following commands from the root of the mlpack sources:
mkdir build && cd build/
cmake -DBUILD_GO_BINDINGS=ON ../
make
sudo make install
mlpack contains an extensive test suite that exercises every part of the codebase. It is easy to build and run the tests with CMake and CTest, as below:
mkdir build && cd build/
cmake -DBUILD_TESTS=ON ../
make
ctest .
If you want to test the bindings, too, you will have to adapt the CMake configuration command to turn on the language bindings that you want to test---see the previous sections for details.
More documentation is available for both users and developers.
User documentation:
- File formats and loading data in mlpack
- Matrices in mlpack
- Cross-Validation
- Hyper-parameter Tuning
- Building mlpack from source on Windows
- Sample C++ ML App for Windows
- Examples repository
Tutorials:
- Alternating Matrix Factorization (AMF)
- Artificial Neural Networks (ANN)
- Approximate k-Furthest Neighbor Search (
approx_kfn
) - Collaborative Filtering (CF)
- DatasetMapper
- Density Estimation Trees (DET)
- Euclidean Minimum Spanning Trees (EMST)
- Fast Max-Kernel Search (FastMKS)
- Image Utilities
- k-Means Clustering
- Linear Regression
- Neighbor Search (k-Nearest-Neighbors)
- Range Search
- Reinforcement Learning
Developer documentation:
- mlpack versions in code
- Writing an mlpack binding
- mlpack Timers
- mlpack automatic bindings to other languages
- The ElemType policy in mlpack
- The KernelType policy in mlpack
- The MetricType policy in mlpack
- The TreeType policy in mlpack
To learn about the development goals of mlpack in the short- and medium-term future, see the vision document.
If you have problems, find a bug, or need help, you can try visiting
the mlpack help page, or mlpack on
Github. Alternately, mlpack help can be
found on Matrix at #mlpack
; see also the
community page.