Stand-alone programs for numerical analysis in OCaml

Sometimes you need a small program for scientific computing that has no dependency on a huge or unfamiliar library or tool. This repository distributes such OCaml programs under the MIT license (the copyright of each data set belongs to the maker of the data).

Linear algebra

LU decomposition: LU decomposition factorizes a matrix A such that PA = LU (or A = PLU), where L is a lower trapezoidal matrix, U is an upper trapezoidal matrix, and P is a permutation matrix. It is used for solving linear equations, computing determinants, etc. This code implements Crout's method.
- Compilation: ocamlopt lu.ml
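
The factorization PA = LU and its use for solving linear equations can be sketched as follows. This is a minimal illustration, not the code in lu.ml: it uses in-place elimination with partial pivoting (Doolittle-style, with a unit diagonal in L) rather than Crout's method, and the function names `lu_decompose` and `lu_solve` are hypothetical. It assumes a square, non-singular matrix.

```ocaml
(* Hypothetical helper, not taken from lu.ml: in-place LU decomposition with
   partial pivoting. Returns (perm, lu) where row i of P*A is row perm.(i) of A,
   L is the strict lower triangle of lu plus a unit diagonal, and U is the
   upper triangle of lu. Assumes a is square and non-singular. *)
let lu_decompose a =
  let n = Array.length a in
  let lu = Array.map Array.copy a in
  let perm = Array.init n (fun i -> i) in
  for k = 0 to n - 1 do
    (* Pick the row with the largest absolute value in column k. *)
    let p = ref k in
    for i = k + 1 to n - 1 do
      if abs_float lu.(i).(k) > abs_float lu.(!p).(k) then p := i
    done;
    if !p <> k then begin
      let row = lu.(k) in
      lu.(k) <- lu.(!p);
      lu.(!p) <- row;
      let idx = perm.(k) in
      perm.(k) <- perm.(!p);
      perm.(!p) <- idx
    end;
    (* Eliminate below the pivot, storing the multipliers where zeros would go. *)
    for i = k + 1 to n - 1 do
      lu.(i).(k) <- lu.(i).(k) /. lu.(k).(k);
      for j = k + 1 to n - 1 do
        lu.(i).(j) <- lu.(i).(j) -. lu.(i).(k) *. lu.(k).(j)
      done
    done
  done;
  (perm, lu)

(* Solve A x = b: permute b, then forward (L) and backward (U) substitution. *)
let lu_solve (perm, lu) b =
  let n = Array.length b in
  let x = Array.init n (fun i -> b.(perm.(i))) in
  for i = 0 to n - 1 do
    for j = 0 to i - 1 do
      x.(i) <- x.(i) -. lu.(i).(j) *. x.(j)
    done
  done;
  for i = n - 1 downto 0 do
    for j = i + 1 to n - 1 do
      x.(i) <- x.(i) -. lu.(i).(j) *. x.(j)
    done;
    x.(i) <- x.(i) /. lu.(i).(i)
  done;
  x
```

For example, `lu_solve (lu_decompose a) b` solves A x = b; the determinant is the product of the diagonal of `lu`, with its sign flipped once per row swap.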
QR decomposition: QR decomposition factorizes a matrix A into QR, where Q is an orthogonal matrix and R is a right trapezoidal (a.k.a. upper trapezoidal) matrix. It is used for solving linear equations, eigenproblems, etc. This program performs QR decomposition via Householder transformations.
- Compilation: ocamlopt qr.ml
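
The Householder approach can be sketched as follows. This is an illustrative version written for this README, not the code in qr.ml; the name `qr_householder` and the dense `float array array` representation are assumptions. Each step builds a reflection H = I - 2 v v^T / (v^T v) that zeroes one column below the diagonal.

```ocaml
(* Hypothetical sketch: returns (q, r) with a = q * r, where q is an m x m
   orthogonal matrix and r is an m x n upper trapezoidal matrix. *)
let qr_householder a =
  let m = Array.length a in
  let n = Array.length a.(0) in
  let r = Array.map Array.copy a in
  let q =
    Array.init m (fun i -> Array.init m (fun j -> if i = j then 1.0 else 0.0)) in
  for k = 0 to min (m - 1) n - 1 do
    (* Euclidean norm of column k, rows k..m-1. *)
    let norm = ref 0.0 in
    for i = k to m - 1 do norm := !norm +. r.(i).(k) *. r.(i).(k) done;
    let norm = sqrt !norm in
    if norm > 0.0 then begin
      (* Choose the sign of alpha opposite to r.(k).(k) to avoid cancellation. *)
      let alpha = if r.(k).(k) > 0.0 then -. norm else norm in
      let v = Array.init m (fun i -> if i < k then 0.0 else r.(i).(k)) in
      v.(k) <- v.(k) -. alpha;
      let vv = Array.fold_left (fun s x -> s +. x *. x) 0.0 v in
      (* R <- H R, applied column by column. *)
      for j = 0 to n - 1 do
        let dot = ref 0.0 in
        for i = k to m - 1 do dot := !dot +. v.(i) *. r.(i).(j) done;
        let f = 2.0 *. !dot /. vv in
        for i = k to m - 1 do r.(i).(j) <- r.(i).(j) -. f *. v.(i) done
      done;
      (* Q <- Q H, applied row by row. *)
      for i = 0 to m - 1 do
        let dot = ref 0.0 in
        for j = k to m - 1 do dot := !dot +. q.(i).(j) *. v.(j) done;
        let f = 2.0 *. !dot /. vv in
        for j = k to m - 1 do q.(i).(j) <- q.(i).(j) -. f *. v.(j) done
      done
    end
  done;
  (q, r)
```

Entries of `r` below the diagonal end up at rounding-error level rather than exactly zero, so compare them against a small tolerance.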

Signal processing

Fast Fourier transform: This is an implementation of the radix-2 Cooley-Tukey fast Fourier transform (FFT) algorithm, the best-known FFT algorithm. Naive computation following the definition of the discrete Fourier transform (DFT) takes O(n^2) time, where n is the number of input points, whereas this FFT algorithm takes only O(n log n) time. The input length n must be a power of two (n = 2^m, where m is a natural number). The Fourier transform is frequently used for signal analysis, data compression, etc.
- Compilation: ocamlopt fft.ml
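
The divide-and-conquer idea behind the radix-2 algorithm can be sketched as follows. This is a minimal recursive (decimation-in-time) version built on the standard `Complex` module; it is not the code in fft.ml, which may well be iterative or in-place, and the name `fft` is only illustrative.

```ocaml
(* Hypothetical sketch, not taken from fft.ml: recursive radix-2
   decimation-in-time FFT. The input length must be a power of two. *)
let rec fft (x : Complex.t array) : Complex.t array =
  let n = Array.length x in
  if n <= 1 then Array.copy x
  else begin
    if n land 1 <> 0 then invalid_arg "fft: length must be a power of two";
    let half = n / 2 in
    (* Transform the even-indexed and odd-indexed samples separately. *)
    let even = fft (Array.init half (fun k -> x.(2 * k))) in
    let odd = fft (Array.init half (fun k -> x.(2 * k + 1))) in
    let pi = 4.0 *. atan 1.0 in
    let y = Array.make n Complex.zero in
    for k = 0 to half - 1 do
      (* Twiddle factor exp(-2 pi i k / n). *)
      let angle = -2.0 *. pi *. float_of_int k /. float_of_int n in
      let t = Complex.mul (Complex.polar 1.0 angle) odd.(k) in
      y.(k) <- Complex.add even.(k) t;
      y.(k + half) <- Complex.sub even.(k) t
    done;
    y
  end
```

Each level of recursion does O(n) work and there are log2(n) levels, which is where the O(n log n) bound comes from.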
Autocorrelation & Levinson-Durbin recursion: Levinson-Durbin recursion is an algorithm that computes the coefficients of an autoregressive (AR) model. The best-known application of AR models is linear predictive coding (LPC), a classic approach to voice analysis, coding, and compression. Using the Levinson-Durbin algorithm, input voice is decomposed into a glottal source (a buzz-like sound) and vocal tract filter characteristics (filter coefficients), and the two components are analyzed or encoded in different ways. LPC is used in FS-1015 (secure telephony speech encoding), Shorten, MPEG-4 ALS, the FLAC audio codec, etc. This program computes AR coefficients from a time-domain sound signal and outputs them.
- Compilation: ocamlopt dataset.ml levinson.ml
- Data set: Japanese vowel sounds /a/ (1957 points), /i/ (3439 points), /u/ (2644 points), /e/ (3316 points), /o/ (2793 points); sampling rate = 16000 Hz; the original data is at http://www.gavo.t.u-tokyo.ac.jp/~mine/B3enshu2001/samples.html
- AR order: 20
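
The two steps named in the title, autocorrelation followed by the Levinson-Durbin recursion, can be sketched as follows. This is an illustration rather than the code in levinson.ml. The function names, the biased (unnormalized) autocorrelation, and the sign convention x(t) + a(1) x(t-1) + ... + a(p) x(t-p) = e(t) are all assumptions of this sketch, and it assumes a non-silent input so that r.(0) > 0.

```ocaml
(* Hypothetical helper: autocorrelation r.(k) = sum over t of x.(t) * x.(t+k),
   for lags k = 0..order. *)
let autocorr x order =
  let n = Array.length x in
  Array.init (order + 1) (fun k ->
    let s = ref 0.0 in
    for t = 0 to n - 1 - k do
      s := !s +. x.(t) *. x.(t + k)
    done;
    !s)

(* Hypothetical helper: Levinson-Durbin recursion on autocorrelations r.
   Returns (a, err) where a.(0) = 1, a.(1..order) are the AR coefficients in
   the convention stated above, and err is the final prediction error power. *)
let levinson_durbin r order =
  let a = Array.make (order + 1) 0.0 in
  a.(0) <- 1.0;
  let err = ref r.(0) in
  for m = 1 to order do
    (* Reflection coefficient for order m. *)
    let acc = ref r.(m) in
    for j = 1 to m - 1 do
      acc := !acc +. a.(j) *. r.(m - j)
    done;
    let k = -. !acc /. !err in
    (* Update a.(1..m-1) from a copy of the order-(m-1) coefficients. *)
    let prev = Array.copy a in
    for j = 1 to m - 1 do
      a.(j) <- prev.(j) +. k *. prev.(m - j)
    done;
    a.(m) <- k;
    err := !err *. (1.0 -. k *. k)
  done;
  (a, !err)
```

With the AR order of 20 listed above, a call would look like `levinson_durbin (autocorr samples 20) 20`, where `samples` is a hypothetical `float array` holding the time-domain signal.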

Machine learning

Classification

Naive multilayer neural network: A neural network that has two or more layers can be used for nonlinear classification, regression, etc. in machine learning. This code is a very simple implementation of a multilayer neural network, and it tends to overfit. (In the past, multilayer neural networks were rarely applied to practical tasks because of problems such as overfitting and vanishing gradients. From 2006 onward, Hinton et al. proposed epoch-making approaches to these problems and achieved surprisingly high performance. These newer techniques are known as deep learning.) The following default settings are for classification. To use this code for regression, change the activation function of the output layer to a linear function and the error function to the sum of squared errors.
- Compilation: ocamlopt dataset.ml neuralNetwork.ml
- Data set: Ionosphere (UCI Machine Learning Repository) (#features = 34, #classes = 2)
- Training: error backpropagation [Rumelhart et al., 1986] + stochastic gradient descent (with a constant learning rate)
- Regularization: none
- Error function: cross-entropy
- Layers: 3 layers + the input layer (all neurons in each layer are connected to all neurons in the layer below)
- The 1st hidden layer: 10 units, activation function = tanh
- The 2nd hidden layer: 5 units, activation function = tanh
- The output layer: 2 units (binary classification, 1-of-K coding), activation function = softmax
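
The forward pass and error function for the configuration above can be sketched as follows. This is an illustration only, not the code in neuralNetwork.ml: the names `dense`, `softmax`, `cross_entropy`, `forward`, and `output_delta`, and the weight layout (one `float array` of weights per unit), are assumptions of this sketch.

```ocaml
(* Hypothetical helper: fully connected layer. w.(j) holds the weights of
   unit j, b.(j) its bias, and act is applied to each weighted sum. *)
let dense act w b x =
  Array.mapi (fun j wj ->
    let s = ref b.(j) in
    Array.iteri (fun i wji -> s := !s +. wji *. x.(i)) wj;
    act !s) w

(* Numerically stable softmax: subtract the maximum before exponentiating. *)
let softmax z =
  let m = Array.fold_left max neg_infinity z in
  let e = Array.map (fun v -> exp (v -. m)) z in
  let s = Array.fold_left (+.) 0.0 e in
  Array.map (fun v -> v /. s) e

(* Cross-entropy between a 1-of-K target t and predicted probabilities y. *)
let cross_entropy t y =
  let loss = ref 0.0 in
  Array.iteri (fun i ti -> if ti > 0.0 then loss := !loss -. ti *. log y.(i)) t;
  !loss

(* Forward pass: two tanh hidden layers, then a linear map followed by softmax. *)
let forward (w1, b1) (w2, b2) (w3, b3) x =
  x |> dense tanh w1 b1 |> dense tanh w2 b2
    |> dense (fun v -> v) w3 b3 |> softmax

(* With softmax outputs and cross-entropy error, the error signal at the
   output layer that backpropagation starts from is simply y - t. *)
let output_delta t y = Array.mapi (fun i yi -> yi -. t.(i)) y
```

For regression, as described above, the `softmax` step would be dropped (a linear output layer) and `cross_entropy` replaced by a sum of squared errors.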

Clustering

K-means: This program implements K-means, a classic clustering approach.
- Compilation: ocamlopt dataset.ml kmeans.ml
- Data set: artificially generated from three Gaussian distributions (dimension = 2, #classes = 3, #points per class = 1000)
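
The K-means iteration (Lloyd's algorithm) alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. A minimal sketch follows; it is not the code in kmeans.ml, the function names are hypothetical, and the initial centroids are assumed to be supplied by the caller.

```ocaml
(* Squared Euclidean distance between two points of equal dimension. *)
let sq_dist a b =
  let s = ref 0.0 in
  Array.iteri (fun i ai -> let d = ai -. b.(i) in s := !s +. d *. d) a;
  !s

(* Index of the centroid nearest to point p. *)
let nearest centroids p =
  let best = ref 0 in
  Array.iteri (fun k c ->
    if sq_dist p c < sq_dist p centroids.(!best) then best := k) centroids;
  !best

(* One iteration: assign every point, then recompute centroids as means.
   An empty cluster keeps its previous centroid. *)
let kmeans_step points centroids =
  let k = Array.length centroids in
  let dim = Array.length centroids.(0) in
  let sums = Array.make_matrix k dim 0.0 in
  let counts = Array.make k 0 in
  Array.iter (fun p ->
    let c = nearest centroids p in
    counts.(c) <- counts.(c) + 1;
    Array.iteri (fun d v -> sums.(c).(d) <- sums.(c).(d) +. v) p) points;
  Array.mapi (fun c s ->
    if counts.(c) = 0 then Array.copy centroids.(c)
    else Array.map (fun v -> v /. float_of_int counts.(c)) s) sums

(* Repeat until the centroids stop changing or max_iter steps have run. *)
let kmeans max_iter points init =
  let rec loop i centroids =
    let next = kmeans_step points centroids in
    if i >= max_iter || next = centroids then next else loop (i + 1) next
  in
  loop 1 init
```

The result depends on the initial centroids, so in practice the algorithm is often run from several starting points.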

Miscellaneous

Durand-Kerner-Aberth method: The Durand-Kerner method is an algorithm that finds all (complex) roots of a given polynomial simultaneously, and the Aberth method is an approach to computing the initial values for the Durand-Kerner method.
- Compilation: ocamlopt dka.ml
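
The simultaneous iteration can be sketched as follows. This is an illustration, not the code in dka.ml. It assumes a monic polynomial of degree at least 1, given as an array `c` in which `c.(i)` is the coefficient of z^i and the leading coefficient 1 is implicit. The initial values are placed on a circle centred at the centroid of the roots, in the spirit of Aberth's initial values; the radius (a crude Cauchy-style bound) and the angular offset are assumptions of this sketch, and the function names are hypothetical.

```ocaml
(* Evaluate z^n + c.(n-1) z^(n-1) + ... + c.(0) by Horner's rule. *)
let eval_monic c z =
  let acc = ref Complex.one in
  for i = Array.length c - 1 downto 0 do
    acc := Complex.add (Complex.mul !acc z) c.(i)
  done;
  !acc

(* Hypothetical sketch: run a fixed number of Durand-Kerner iterations and
   return the n root estimates. *)
let durand_kerner c iterations =
  let n = Array.length c in
  let nf = float_of_int n in
  let pi = 4.0 *. atan 1.0 in
  (* Centroid of the roots is -c.(n-1)/n for a monic polynomial. *)
  let centre =
    Complex.div (Complex.neg c.(n - 1)) { Complex.re = nf; im = 0.0 } in
  (* Assumed radius: 1 + largest coefficient magnitude. *)
  let radius =
    1.0 +. Array.fold_left (fun m ci -> max m (Complex.norm ci)) 0.0 c in
  let z = ref (Array.init n (fun k ->
    let angle = 2.0 *. pi *. float_of_int k /. nf +. 3.0 /. (2.0 *. nf) in
    Complex.add centre (Complex.polar radius angle))) in
  for _iter = 1 to iterations do
    let cur = !z in
    (* z_i <- z_i - p(z_i) / product over j <> i of (z_i - z_j),
       using only the previous estimates (simultaneous update). *)
    z := Array.mapi (fun i zi ->
      let denom = ref Complex.one in
      Array.iteri (fun j zj ->
        if j <> i then denom := Complex.mul !denom (Complex.sub zi zj)) cur;
      Complex.sub zi (Complex.div (eval_monic c zi) !denom)) cur
  done;
  !z
```

A production version would stop once the updates fall below a tolerance instead of running a fixed number of iterations.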

Utilities

WAV reader/writer: A lightweight reader/writer for WAV files. This code supports only the uncompressed linear PCM format.
- Compilation: ocamlopt -c wav.mli wav.ml
