/online-stats

Online statistics implementations, including average, variance and standard deviation; exponentially weighted versions as well.

Primary LanguageScalaMIT LicenseMIT

online-stats

Maven Central   GitHub   Travis (.org)   Codecov   Javadocs   Gitter   Twitter  

Scope

Naive implementation of a few online statistical algorithms.

The idea behind this implementation is to be used as a tool for stateful streaming computations.

Algos covered so far:

  • 4 statistical moments and the derived features:
    • average
    • variance and standard deviation
    • skewness
    • kurtosis
  • covariance
  • exponentially weighted moving averages and variance

Algos to be researched:

  • exponentially weighted moving skewness
  • exponentially weighted moving kurtosis

Using a more formal and mature library like Apache Commons Math is probably a better idea for production applications, but this is also tested against it.

Description

The main concepts introduced in this library are the Stats, EWeightedStats (exponentially weighted stats), VectorStats and Covariance. Each of them can be composed using either the append or the |+| functions.

For example, if we have a sequence of numbers, we can compute the statistics like this:

  val xs1 = Seq(1.0, 3.0)
  val stats1: Stats = xs1.foldLeft(Stats.Nil)((s, x) => s |+| x)
  val xs2 = Seq(5.0, 7.0)
  val stats2: Stats = xs2.foldLeft(Stats.Nil)((s, x) => s |+| x)
  val totalStats = stats1 |+| stats2
  val newStats = totalStats |+| 4.0

The Stats type with the |+| operation also form a monoid, since |+| has an identity (unit) element, Stats.Nil, and it is associative.

Also the |+| operation is also commutative, which makes appealing for distributed computing as well.

Same goes for VectorStats and Covariance.

EWeightedStats is an exception for now, as two EWeightedStats instances can not be composed. However, the |+| works between an EWeightedStats instance and a double.

Complexity

Feature Space Complexity (O) Time Complexity (O)
Count, Sum, Min, Max O(1) (1 * MU) O(1)
Average O(1) (2 * MU) O(1)
Variance, Standard deviation O(1) (3 * MU) O(1)
Skewness O(1) (4 * MU) O(1)
Kurtosis O(1) (5 * MU) O(1)
Exponentially weighted average O(1) (2 * MU) O(1)
Exponentially weighted variance O(1) (2 * MU) O(1)

MU: Memory Unit, e.g. Int: 4 bytes, Double 8: bytes

Demos and Examples

The streaming-anomalies-demos project was created to explore and demonstrate some basic use cases for the online-stats library.

References