/neural.benchmark

Comparisons of HTM to other ML algorithms on well known datasets and synthetic anomaly benchmarks

Primary LanguageMatlabGNU General Public License v2.0GPL-2.0

# Machine Learming benchmarks

Aiming to thouroughly benchmark and compare ML algorithms (with current focus on HTM), be designing specialized synthetic datasets that stress a single feature and can be well evaluated and understood.

For users being able to decide where each algorithm has its strong/weak-spots and decide in application for real-world problems.

This can also work as a benchmark to evaluate development impact in changes to the algorithms.

Zoom of anomaly results on data loss benchmark.

Goals of this project

This repository should be a collection of

  • datasets
  • real-world
  • synthetic
  • papers
  • algorithm implementations
  • (initially) focus on HTM from NuPIC, gladly include any other algorithms/results.
  • results
  • as CSV, image, ...
  • collection of ideas in the Issues

Current state

  • Anomalies classified in independent, significant categories -> anomaly categorization
  • create synthetic benchmars/datasets for the anomaly categories:
  • point anomalies
  • interval/section anomalies
  • "higher" trend (low freq. modulation) anomalies
  • additional
  • HTM benchmars on the above datasets
  • Discussion of the results & suggested improvements
  • Comparison with other well known ML approaches

Research interest

The topics of research interest are classified in Issue's labels, and are related to: NuPIC, dataset creation, novel research ideas, cognitive modeling, and so on...

How do I use/work with this repo?

  • You can browse CSV and images in data/datasets/*/results/ folders.
  • To modify or regenerate new datasets, go to datasets/generatingScripts/MAIN.m, you'll need Matlab/R for that.
  • To rerun (HTM/NuPIC) models, python opf/anomaly_benchmark.py NuPIC has to be installed.
  • For visualizing data, use plotResutls/plotDatasets.m or interactive tool nupic.visualizations, which can be run online

Results

  • Evaluation results on benchmarks are located in results/ subfolders under respective paths and presented in form of CSV and image data.
  • See Hypotheses for ongoing results.
  • Ideas & findings from our work:
  • Importance of noise/"non-integer f/fs sampling problem" for ease of learning and quality of abstraction in (artificial) benchmarks.
  • NuPIC: AdaptiveScalar encoder is NOT suitable for streaming data, use RDSE instead.
  • How to compute optimal resulution for RDSE
  • NuPIC: Boosting implementation is causing artificial disturbances in the predictions (and can be improved, or should be turned off)
  • NuPIC:
  • good on point anomaly
  • good on interval anomaly

Open-research & hypothesis

Warning: anything here may, or may not be true. It is under evaluation. We are raising the topics here to get your focus on the current issues and possible findings.

  • NuPIC: AnomalyLikelihood implementation is inferior to the "raw" Anomaly, esp. with combination of noisy data that distort the internal distribution model.
  • Impact of parameter optimization (swarming) on NuPIC's performance
  • NuPIC: failing to abstract on trend data?

## Sources

  1. Hierarchical Temporal Memory, Numenta. Available at: http://numenta.org/resources/HTM_CorticalLearningAlgorithms.pdf

  2. Hawkins, Jeff (2004). On Intelligence, Times Books. ISBN 0805074562.

  3. Uhl, Christian (1999). Analysis of Neurophysiological Brain Functioning, Springer. ISBN 978-3-642-64219-7

  4. The Sicence of Anomaly Detection, Numenta. Available at: http://numenta.com/assets/pdf/whitepapers/Numenta%20White%20Paper%20-%20Science%20of%20Anomaly%20Detection.pdf

  5. Schmidhuber, Jürgen (2014). Deep Learning in Neural Networks: An Overview, The Swiss AI Lab IDSIA. Available at: http://arxiv.org/pdf/1404.7828v4.pdf

  6. Twitter, Anomaly Detection. Available at: https://blog.twitter.com/2015/introducing-practical-and-robust-anomaly-detection-in-a-time-series

  7. Skyline, Anomaly Detection. Available at: https://github.com/etsy/skyline

  8. Yahoo, Time Series Anomaly Detection. Available at: http://yahoolabs.tumblr.com/post/114590420346/a-benchmark-dataset-for-time-series-anomaly

Related sources

Acknowledgement