
Primary language: C++. License: MIT.

stats_cpp


stats_cpp is a bare-bones, header-only C++ statistics library. It is designed to be fast and easy to use, and it can be added to existing projects with little overhead.


Install

Adding stats_cpp to your project is simple: add Statistics.hpp to your project and include it as a header file. All functions and structures are then available in the stats namespace.

How to Use

First, note that the data points must be stored in a std::vector in order to use the functions provided.

Most results are stored in dedicated structures, and some properties are currently available only through these structures rather than through standalone function calls. The most important of these structures is oneVarStats. How to use it and the other structures is described below.

Features

Some properties can be calculated in more than one way. Where a property is available through getOneVarStats(), that function is the recommended route: it computes all of its properties in a single call and stores them in the returned structure, so any other property you need later is already calculated and kept in memory.

Currently, the following properties can be calculated using the library:

Sum

std::vector<double> data { ... };

// using normal addition
double sum_1 = stats::simpleSum(&data);

// using Improved Kahan–Babuška algorithm
double sum_2 = stats::complexSum(&data);

// getOneVarStats uses the Improved Kahan–Babuška algorithm
stats::oneVarStats ovs = stats::getOneVarStats(&data);
double sum_3 = ovs.sum;
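To illustrate why the two sums can differ, here is a minimal sketch (not the library's code) of plain summation next to Neumaier's variant of Kahan summation, which is the algorithm usually called "improved Kahan–Babuška". The namespace sketch and the functions naiveSum and neumaierSum are hypothetical names used only for this example.

```cpp
#include <cmath>
#include <vector>

namespace sketch {

// Plain left-to-right summation: rounding error can accumulate.
double naiveSum(const std::vector<double>& data) {
    double sum = 0.0;
    for (double x : data) sum += x;
    return sum;
}

// Neumaier's improved Kahan-Babuska summation: a running compensation term
// collects the low-order bits lost in each floating-point addition.
double neumaierSum(const std::vector<double>& data) {
    double sum = 0.0;
    double compensation = 0.0;
    for (double x : data) {
        const double t = sum + x;
        if (std::fabs(sum) >= std::fabs(x)) {
            compensation += (sum - t) + x;  // low-order bits of x were lost
        } else {
            compensation += (x - t) + sum;  // low-order bits of sum were lost
        }
        sum = t;
    }
    return sum + compensation;
}

}  // namespace sketch
```

For the input {1.0, 1e100, 1.0, -1e100}, naiveSum returns 0.0 because both 1.0 terms are absorbed by 1e100, while neumaierSum recovers the exact answer 2.0.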

Arithmetic, Harmonic, and Geometric Mean

Arithmetic Mean

std::vector<double> data { ... };

double a_mean_1 = stats::arithmeticMean(data);

stats::oneVarStats ovs = stats::getOneVarStats(data);
double a_mean_2 = ovs.mean;

Harmonic Mean

std::vector<double> data { ... };

double h_mean = stats::harmonicMean(data);

Geometric Mean

std::vector<double> data { ... };

double g_mean = stats::geometricMean(data);
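The three means differ only in how the values are combined. The sketch below shows the standard definitions; it is not taken from Statistics.hpp, it assumes strictly positive data for the harmonic and geometric means, and all names in namespace sketch are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

namespace sketch {

// Arithmetic mean: sum / n.
double arithmeticMean(const std::vector<double>& data) {
    assert(!data.empty());
    double sum = 0.0;
    for (double x : data) sum += x;
    return sum / static_cast<double>(data.size());
}

// Harmonic mean: n / sum(1 / x). Assumes every value is positive.
double harmonicMean(const std::vector<double>& data) {
    assert(!data.empty());
    double reciprocalSum = 0.0;
    for (double x : data) {
        assert(x > 0.0);
        reciprocalSum += 1.0 / x;
    }
    return static_cast<double>(data.size()) / reciprocalSum;
}

// Geometric mean: (x1 * x2 * ... * xn)^(1/n), computed as exp(mean(log x))
// so the running product cannot overflow or underflow. Assumes x > 0.
double geometricMean(const std::vector<double>& data) {
    assert(!data.empty());
    double logSum = 0.0;
    for (double x : data) {
        assert(x > 0.0);
        logSum += std::log(x);
    }
    return std::exp(logSum / static_cast<double>(data.size()));
}

}  // namespace sketch
```

For {1, 2, 4}, the arithmetic mean is 7/3, the geometric mean is 2, and the harmonic mean is 12/7 (about 1.714).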

Median and Mode

std::vector<double> data { ... };

stats::oneVarStats ovs = stats::getOneVarStats(data);
double median = ovs.median;
double mode = ovs.mode;
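The README does not say how the library computes the median or breaks ties for the mode. A minimal sketch of the common approach, sorting a copy and then scanning runs of equal values, is shown below; the tie-breaking rule (smallest value wins) is an assumption, and the functions are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Median: the middle element of the sorted data, or the average of the two
// middle elements when the size is even. Takes a copy so the caller's
// vector stays unsorted.
double median(std::vector<double> data) {
    assert(!data.empty());
    std::sort(data.begin(), data.end());
    const std::size_t n = data.size();
    if (n % 2 == 1) return data[n / 2];
    return (data[n / 2 - 1] + data[n / 2]) / 2.0;
}

// Mode: the most frequent value, found by scanning runs of equal values in
// the sorted data. Ties go to the smallest value (an assumed convention).
double mode(std::vector<double> data) {
    assert(!data.empty());
    std::sort(data.begin(), data.end());
    double best = data[0];
    std::size_t bestCount = 0;
    for (std::size_t i = 0; i < data.size();) {
        std::size_t j = i;
        while (j < data.size() && data[j] == data[i]) ++j;  // find end of run
        if (j - i > bestCount) {
            bestCount = j - i;
            best = data[i];
        }
        i = j;
    }
    return best;
}

}  // namespace sketch
```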

Variance and Standard Deviation

std::vector<double> data { ... };

stats::oneVarStats ovs = stats::getOneVarStats(data);
double variance = ovs.variance;
double standard_deviation = ovs.std;
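The README does not state whether ovs.variance divides by n (population variance) or by n - 1 (sample variance). The sketch below computes both with Welford's one-pass algorithm, so either convention can be checked against the library's output; the Spread struct and welford function are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

namespace sketch {

struct Spread {
    double mean;
    double populationVariance;  // divides by n
    double sampleVariance;      // divides by n - 1 (Bessel's correction)
};

// Welford's one-pass algorithm: updates the mean and the sum of squared
// deviations (m2) incrementally, avoiding the cancellation that affects
// the naive sum(x^2) - n * mean^2 formula.
Spread welford(const std::vector<double>& data) {
    assert(data.size() >= 2);
    double mean = 0.0;
    double m2 = 0.0;
    std::size_t n = 0;
    for (double x : data) {
        ++n;
        const double delta = x - mean;
        mean += delta / static_cast<double>(n);
        m2 += delta * (x - mean);  // second factor uses the updated mean
    }
    return {mean, m2 / static_cast<double>(n), m2 / static_cast<double>(n - 1)};
}

}  // namespace sketch
```

For {2, 4, 4, 4, 5, 5, 7, 9}, the mean is 5, the population variance is 4 (standard deviation 2), and the sample variance is 32/7.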

Size

std::vector<double> data { ... };

std::size_t size_1 = data.size();

stats::oneVarStats ovs = stats::getOneVarStats(data);
double size_2 = ovs.size;

Minimum and Maximum

std::vector<double> data { ... };

stats::oneVarStats ovs = stats::getOneVarStats(data);
double min = ovs.min;
double max = ovs.max;

Q1, Q3, and IQR

std::vector<double> data { ... };

stats::oneVarStats ovs = stats::getOneVarStats(data);
double q1 = ovs.q1;
double q3 = ovs.q3;
double iqr = ovs.iqr;
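Quartiles have several competing definitions, and the README does not specify which one getOneVarStats uses. As an illustration under an assumed convention, the sketch below takes Q1 and Q3 as the medians of the lower and upper halves of the sorted data, excluding the overall median when the size is odd. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

struct Quartiles {
    double q1;
    double q3;
    double iqr;
};

// Median of the already-sorted range [first, last).
double medianOfSorted(std::vector<double>::const_iterator first,
                      std::vector<double>::const_iterator last) {
    const auto n = static_cast<std::size_t>(last - first);
    assert(n > 0);
    if (n % 2 == 1) return first[n / 2];
    return (first[n / 2 - 1] + first[n / 2]) / 2.0;
}

// Q1 and Q3 as medians of the lower and upper halves of the sorted data,
// leaving out the overall median when the size is odd. This is only one
// of several quartile conventions.
Quartiles quartiles(std::vector<double> data) {
    assert(data.size() >= 2);
    std::sort(data.begin(), data.end());
    const std::size_t half = data.size() / 2;
    const std::size_t upperStart = data.size() - half;  // skips median if n is odd
    const double q1 = medianOfSorted(data.cbegin(), data.cbegin() + half);
    const double q3 = medianOfSorted(data.cbegin() + upperStart, data.cend());
    return {q1, q3, q3 - q1};
}

}  // namespace sketch
```

With this convention, the integers 1 through 8 give Q1 = 2.5 and Q3 = 6.5, and the integers 1 through 7 give Q1 = 2 and Q3 = 6, so both have an IQR of 4. Other conventions, such as linear interpolation between order statistics, can give different values for the same data.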

Z-Score

std::vector<double> data { ... };
double value = X_VALUE;

double z1 = stats::calcZScore(value, data);

stats::oneVarStats ovs = stats::getOneVarStats(data);
double z2 = stats::calcZScore(value, ovs);

double mean = MEAN_VALUE;
double std_dev = STANDARD_DEVIATION_VALUE;
double z3 = stats::calcZScore(value, mean, std_dev);
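All three calcZScore overloads presumably reduce to the standard-score formula z = (x - mean) / standard deviation, differing only in where the mean and standard deviation come from. A minimal sketch of that formula, using a hypothetical zScore function:

```cpp
#include <cassert>

namespace sketch {

// Standard score: how many standard deviations x lies above (positive)
// or below (negative) the mean.
double zScore(double x, double mean, double stdDev) {
    assert(stdDev > 0.0);
    return (x - mean) / stdDev;
}

}  // namespace sketch
```

For example, a value of 130 with mean 100 and standard deviation 15 has z = 2, and a value of 85 has z = -1.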

Normal CDF and Inverse Normal CDF

double zScore = SOME_SCORE;
double cdf = stats::normalCDF(zScore);
double newZScore = stats::invNormalCDF(cdf);
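The standard normal CDF can be written with the complementary error function as Phi(z) = 0.5 * erfc(-z / sqrt(2)), and std::erfc is available in <cmath>. The sketch below uses that identity and inverts it by bisection; it is a simple stand-in, the library may well use a faster rational approximation, and both functions are hypothetical.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {

// Standard normal CDF: Phi(z) = 0.5 * erfc(-z / sqrt(2)).
double normalCDF(double z) {
    return 0.5 * std::erfc(-z / std::sqrt(2.0));
}

// Inverse CDF by bisection. Phi is strictly increasing, so halving the
// bracket [lo, hi] converges to the z with Phi(z) = p. Probabilities
// outside roughly [Phi(-10), Phi(10)] end up at the bracket edges.
double invNormalCDF(double p) {
    assert(p > 0.0 && p < 1.0);
    double lo = -10.0;
    double hi = 10.0;
    for (int i = 0; i < 100; ++i) {
        const double mid = 0.5 * (lo + hi);
        if (normalCDF(mid) < p) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return 0.5 * (lo + hi);
}

}  // namespace sketch
```

Bisection is slower than closed-form approximations but easy to verify: 100 halvings shrink the bracket far below double precision, and invNormalCDF(0.975) returns about 1.96.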

P-Value

std::vector<double> data { ... };
double value = SOME_VALUE;

double p1 = stats::calcPValue(value, data);

stats::oneVarStats ovs = stats::getOneVarStats(data);
double p2 = stats::calcPValue(value, ovs);

double mean = MEAN_VALUE;
double std_dev = STANDARD_DEVIATION_VALUE;
double p3 = stats::calcPValue(value, mean, std_dev);
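The README does not say whether calcPValue returns a one-tailed or a two-tailed probability, so which tail the library reports is an assumption worth checking. The sketch below shows both for a standard normal statistic, computing the tails directly with std::erfc to avoid the cancellation in 1 - Phi(z) for large z. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {

// Upper-tail p-value: P(Z >= z) = 0.5 * erfc(z / sqrt(2)).
double upperTailP(double z) {
    return 0.5 * std::erfc(z / std::sqrt(2.0));
}

// Two-tailed p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
double twoTailedP(double z) {
    return std::erfc(std::fabs(z) / std::sqrt(2.0));
}

// Two-tailed p-value for a value x against a known mean and standard deviation.
double twoTailedP(double x, double mean, double stdDev) {
    assert(stdDev > 0.0);
    return twoTailedP((x - mean) / stdDev);
}

}  // namespace sketch
```

A z-score of 1.96 gives a two-tailed p-value of about 0.05, and a z-score of about 1.645 gives an upper-tail p-value of about 0.05.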

Confidence Interval

double confidence = 0.05;
double size = 100;
double parameter = P;

stats::Interval ci = stats::calcInterval(confidence, size, parameter);
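The meaning of calcInterval's arguments is not documented here. Reading confidence = 0.05 as a significance level (giving a 95% interval), size as the sample size n, and parameter as a sample proportion p, one plausible computation is the Wald interval for a proportion. The sketch below is built entirely on those assumptions; Interval, criticalZ, and proportionInterval are hypothetical and may not match stats::Interval.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {

struct Interval {  // hypothetical stand-in, not stats::Interval
    double lower;
    double upper;
};

// Critical value z* with Phi(z*) = 1 - alpha / 2, found by bisection on
// the standard normal CDF Phi(z) = 0.5 * erfc(-z / sqrt(2)).
double criticalZ(double alpha) {
    const double target = 1.0 - alpha / 2.0;
    double lo = 0.0;
    double hi = 10.0;
    for (int i = 0; i < 100; ++i) {
        const double mid = 0.5 * (lo + hi);
        if (0.5 * std::erfc(-mid / std::sqrt(2.0)) < target) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return 0.5 * (lo + hi);
}

// Wald interval for a proportion: p +/- z* * sqrt(p * (1 - p) / n).
// alpha is a significance level, so alpha = 0.05 gives a 95% interval.
Interval proportionInterval(double alpha, double n, double p) {
    assert(alpha > 0.0 && alpha < 1.0);
    assert(n > 0.0 && p >= 0.0 && p <= 1.0);
    const double margin = criticalZ(alpha) * std::sqrt(p * (1.0 - p) / n);
    return {p - margin, p + margin};
}

}  // namespace sketch
```

With alpha = 0.05, n = 100, and p = 0.5, the standard error is 0.05 and the interval is roughly [0.402, 0.598].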

Linear Regression

const std::vector<std::pair<double, double>> pairs = { {x1, y1}, {x2, y2}, ... };

stats::LinearRegression lr = stats::calcLinearRegression(pairs);
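The fields of stats::LinearRegression are not listed in this README. As a sketch of what an ordinary least-squares fit computes, the function below returns the slope, the intercept, and the Pearson correlation coefficient r for y = intercept + slope * x. The Fit struct and leastSquares function are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

namespace sketch {

struct Fit {  // hypothetical, not stats::LinearRegression
    double slope;
    double intercept;
    double r;  // Pearson correlation; NaN if all y values are equal
};

// Ordinary least squares for y = intercept + slope * x, using centred sums
// in a second pass to reduce cancellation error.
Fit leastSquares(const std::vector<std::pair<double, double>>& pairs) {
    assert(pairs.size() >= 2);
    const double n = static_cast<double>(pairs.size());
    double meanX = 0.0;
    double meanY = 0.0;
    for (const auto& p : pairs) {
        meanX += p.first;
        meanY += p.second;
    }
    meanX /= n;
    meanY /= n;

    double sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (const auto& p : pairs) {
        const double dx = p.first - meanX;
        const double dy = p.second - meanY;
        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }
    assert(sxx > 0.0);  // the x values must not all be equal
    const double slope = sxy / sxx;
    return {slope, meanY - slope * meanX, sxy / std::sqrt(sxx * syy)};
}

}  // namespace sketch
```

For the points (1, 3), (2, 5), (3, 7), (4, 9), which lie exactly on y = 2x + 1, it returns slope 2, intercept 1, and r = 1.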

Performance

This repo contains a very simple profiler, Profiler.hpp, that can be used for benchmarking; an example is given in examples/Benchmark.cpp. Running that program produced the following timings:

Number of data points   Time (seconds)
1,000                   0.0002
10,000                  0.0114
100,000                 1.1793
500,000                 34.0256

Note that the benchmark was compiled with clang++ -O3.
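The interface of Profiler.hpp is not described here. A common way to build such a profiler is an RAII timer around std::chrono::steady_clock, which is monotonic and therefore suited to measuring intervals. The sketch below shows that pattern with a hypothetical ScopedTimer class; it is not the repository's implementation.

```cpp
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>

namespace sketch {

// RAII timer: records the start time on construction and prints the
// elapsed wall-clock time in seconds when it goes out of scope.
class ScopedTimer {
public:
    explicit ScopedTimer(std::string label)
        : label_(std::move(label)), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        std::printf("%s: %.4f s\n", label_.c_str(), elapsedSeconds());
    }

    // Seconds elapsed since construction.
    double elapsedSeconds() const {
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        return elapsed.count();
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};

}  // namespace sketch
```

Wrapping the code under test in a block that begins with a declaration such as sketch::ScopedTimer t("getOneVarStats"); prints the elapsed time when the block ends.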