
Outlier/anomaly detection for Ruby using Isolation Forest

Primary language: Ruby. License: BSD 2-Clause "Simplified" License (BSD-2-Clause)

IsoTree Ruby

🌲 IsoTree - outlier/anomaly detection using Isolation Forest - for Ruby

Learn how Isolation Forest works

🌳 Check out OutlierTree for human-readable explanations of outliers


Installation

Add this line to your application's Gemfile:

gem "isotree"

Windows is not supported at the moment

Getting Started

Prep your data

data = [
  {department: "Books",  sale: false, price: 2.50},
  {department: "Books",  sale: true,  price: 3.00},
  {department: "Movies", sale: false, price: 5.00},
  # ...
]

Train a model

model = IsoTree::IsolationForest.new
model.fit(data)

Get outlier scores

model.predict(data)

Scores are between 0 and 1, with higher scores indicating outliers
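A common next step is to look at the highest-scoring rows. The sketch below pairs each row with its score and keeps the top `k`. It assumes the scores come back in the same order as `data`. The helper name `top_outliers` is hypothetical and not part of the gem. The score values are placeholders for illustration, not real model output.

```ruby
# Hypothetical helper: return the k rows with the highest outlier scores.
# Assumes `scores` lines up index-by-index with `data` (e.g. from model.predict(data));
# .to_a lets it accept either a plain Array or an array-like object.
def top_outliers(data, scores, k)
  data.zip(scores.to_a)
      .sort_by { |_row, score| -score } # highest score first
      .first(k)
end

data = [
  {department: "Books",  sale: false, price: 2.50},
  {department: "Books",  sale: true,  price: 3.00},
  {department: "Movies", sale: false, price: 5.00}
]
scores = [0.41, 0.44, 0.62] # placeholder values, not produced by a real model

top_outliers(data, scores, 1).each do |row, score|
  puts "#{score.round(2)} #{row}"
end
```

Sorting by score avoids choosing a fixed cutoff up front. If you prefer a threshold instead, select the rows whose score exceeds it. Which cutoff is appropriate depends on your data.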

Export the model

model.export_model("model.bin")

Import a model

model = IsoTree::IsolationForest.import_model("model.bin")

Parameters

Pass parameters - default values below

IsoTree::IsolationForest.new(
  sample_size: "auto",
  ntrees: 500,
  ndim: 3,
  ntry: 1,
  max_depth: "auto",
  ncols_per_tree: nil,
  prob_pick_pooled_gain: 0.0,
  prob_pick_avg_gain: 0.0,
  prob_pick_full_gain: 0.0,
  prob_pick_dens: 0.0,
  prob_pick_col_by_range: 0.0,
  prob_pick_col_by_var: 0.0,
  prob_pick_col_by_kurt: 0.0,
  min_gain: 0.0,
  missing_action: "auto",
  new_categ_action: "auto",
  categ_split_type: "auto",
  all_perm: false,
  coef_by_prop: false,
  sample_with_replacement: false,
  penalize_range: false,
  standardize_data: true,
  scoring_metric: "depth",
  fast_bratio: true,
  weigh_by_kurtosis: false,
  coefs: "uniform",
  assume_full_distr: true,
  min_imp_obs: 3,
  depth_imp: "higher",
  weigh_imp_rows: "inverse",
  random_seed: 1,
  use_long_double: false,
  nthreads: -1
)

See a detailed explanation

Data

Data can be an array of hashes

[
  {department: "Books",  sale: false, price: 2.50},
  {department: "Books",  sale: true,  price: 3.00},
  {department: "Movies", sale: false, price: 5.00}
]

Or a Rover data frame

Rover.read_csv("data.csv")

Or a Numo array

Numo::NArray.cast([[1, 2, 3], [4, 5, 6]])
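If your rows start out as hashes with numeric values, you can flatten them into the nested-array shape that `Numo::NArray.cast` accepts. This sketch assumes one inner array per observation, as in the array-of-hashes form. The helper `rows_from_hashes` and the `quantity` column are hypothetical.

```ruby
# Hypothetical helper: convert hashes into nested arrays, one inner array per row,
# with columns ordered as given in `keys`.
def rows_from_hashes(records, keys)
  records.map do |record|
    # fetch raises KeyError if a row is missing one of the requested columns
    keys.map { |key| record.fetch(key) }
  end
end

records = [
  {price: 2.50, quantity: 1}, # `quantity` is a made-up column for illustration
  {price: 3.00, quantity: 4}
]
rows = rows_from_hashes(records, [:price, :quantity])
p rows # prints [[2.5, 1], [3.0, 4]]
# With the numo-narray gem loaded, Numo::NArray.cast(rows) would give a 2-by-2 array.
```

Passing the column order explicitly keeps every inner array consistent, even if individual hashes list their keys in different orders.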

Performance

IsoTree uses OpenMP when possible for best performance. To enable OpenMP on Mac, run:

brew install libomp

Then reinstall the gem.

gem uninstall isotree --force
bundle install

Deployment

Check out Trove for deploying models.

trove push model.bin

Reference

Get the average isolation depth

model.predict(data, output: "avg_depth")
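To relate the average depth to the 0 to 1 scores, consider the original Isolation Forest paper. It normalizes depth by c(n), the expected path length for a sample of size n, and scores a point as 2^(-depth / c(n)). Shallow points score near 1, and points at the expected depth score 0.5. The sketch below implements that classic formula. IsoTree's own normalization may differ in details, so treat this as an illustration rather than the gem's computation. `expected_depth` and `anomaly_score` are hypothetical helper names.

```ruby
# Euler-Mascheroni constant; H(i) is approximated as ln(i) + EULER_GAMMA
EULER_GAMMA = 0.5772156649015329

# c(n) from the original Isolation Forest paper: average path length of an
# unsuccessful search in a binary search tree built from n points.
def expected_depth(n)
  return 0.0 if n <= 1
  return 1.0 if n == 2
  2.0 * (Math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
end

# Classic score s = 2^(-avg_depth / c(n)).
# Depth 0 gives 1.0; depth equal to c(n) gives 0.5; deeper points approach 0.
def anomaly_score(avg_depth, sample_size)
  raise ArgumentError, "sample_size must be at least 2" if sample_size < 2
  2.0**(-avg_depth / expected_depth(sample_size))
end

c = expected_depth(256)
puts anomaly_score(c, 256)   # prints 0.5
puts anomaly_score(2.0, 256) # a shallow point scores well above 0.5
```

Because the score is a monotonic transform of depth, ranking rows by lowest average depth gives the same order as ranking by highest score.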

Upgrading

0.3.0

This version uses IsoTree's new serialization format. Exported models must be recreated.

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

git clone --recursive https://github.com/ankane/isotree-ruby.git
cd isotree-ruby
bundle install
bundle exec rake compile
bundle exec rake test