
Benchmarks of the H2O Ensemble R interface (H2O 2.0).


h2oEnsemble Benchmarks (H2O 2.0)

Note: These benchmarks are for H2O 2.0 (a.k.a. "H2O Classic"). The current version of H2O is H2O 3.0, and there are no plans to update this benchmark at this time.

This repository provides a framework for benchmarking the h2oEnsemble package, which provides an R API for training ensembles of H2O machine learning algorithms. You can read more about the "H2O Ensemble" project on its GitHub page. Note that as of H2O 3.10.3.1 (Jan 30, 2017), Stacked Ensembles are available as part of base H2O, so the h2oEnsemble package is no longer "needed" and hence no longer maintained.

  • An example binary classification benchmark is provided in the twoClass/higgs folder.
  • To use the software on an Amazon EC2 cluster, follow the instructions in the ec2/README.md file to set up the cluster.
  • After you set up the cluster, cd to a specific benchmark directory and run the benchmark. For example, the higgs benchmark can be run as follows:
    cd src/twoClass/higgs
    nohup ./run-bench.sh &
  • If you choose to use a single node instead of a multi-node cluster, you can skip the EC2 steps. If a multi-node cluster is not available, the code will create a single-node, multicore cluster using all the cores available on your machine.
  • If the benchmark is not run on the H2O Ubuntu AMI, you may have to change the value of the BENCH_ROOT variable in the scripts so that it points to the location of this repository on your machine.
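
When running outside the H2O Ubuntu AMI, it can help to confirm that the path you intend to use for BENCH_ROOT actually contains the benchmark before launching it. The following is a minimal sketch, not part of this repository: the function name check_bench_root is hypothetical, and the default subdirectory is taken from the cd command above. It does not modify BENCH_ROOT in the scripts, which you still need to edit by hand.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not shipped with the benchmarks).
# Usage: check_bench_root <repo_root> [benchmark_subdir]
# Prints the full benchmark directory on success; returns 1 otherwise.
check_bench_root() {
  local root="$1"
  local bench="${2:-src/twoClass/higgs}"   # default taken from the example above

  # The benchmark directory must exist under the candidate root.
  if [ ! -d "$root/$bench" ]; then
    echo "missing benchmark directory: $root/$bench" >&2
    return 1
  fi

  # The launcher script must be present and executable.
  if [ ! -x "$root/$bench/run-bench.sh" ]; then
    echo "run-bench.sh is missing or not executable in $root/$bench" >&2
    return 1
  fi

  echo "$root/$bench"
}
```

For example, assuming the repository was cloned to a location of your choosing, you might run bench_dir="$(check_bench_root /path/to/repo)" and, if that succeeds, cd "$bench_dir" followed by nohup ./run-bench.sh &. When nohup is started from a terminal without output redirection, it appends the output to nohup.out in the current directory, so tail -f nohup.out can be used to follow progress.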