The current implementation of the bot follows this scheme:
- Initialize the bot with an OpenML `task.id`
- Draw a learner with probability proportional to the dimension of its parameter set
- Draw a random hyperparameter configuration for that learner
- Resample the drawn learner/hyperparameter configuration on the OpenML task
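The drawing step above can be sketched in base R; the learner table below is illustrative, not the bot's actual learner registry:

```r
# Hypothetical learner registry with the dimension of each param set
# (names and dimension counts are made up for illustration):
learners = data.frame(
  id  = c("classif.rpart", "classif.ranger", "classif.xgboost"),
  dim = c(4, 6, 10),  # number of tunable hyperparameters per learner
  stringsAsFactors = FALSE
)
# Draw a learner with probability proportional to its param set dimension:
pick = sample(learners$id, size = 1, prob = learners$dim / sum(learners$dim))
```

Here `classif.xgboost` is drawn half of the time (10 of 20 total dimensions), on the reasoning that larger search spaces need more random evaluations.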
Learners carried over from the old bot:
- xgboost
- svm
- kernel knn
- random forest
- rpart
- glmnet
New learners:
- Multinomial logit (from mxnet?)
- Cubist
- Fully connected neural networks (mxnet?) up to depth 3 or 4
Worthy candidates (from Kaggle etc.):
- ExtraTrees (we can enable this in ranger)
- LightGBM / CatBoost (probably too similar to xgboost)
- [LibFM (factorization machines)](https://github.com/dselivanov/rsparse)
- [liquidSVM](https://cran.r-project.org/web/packages/liquidSVM/index.html)
- AdaBoost / [fastAdaboost](https://cran.r-project.org/web/packages/fastAdaboost/fastAdaboost.pdf)

Data set collections:
- [OpenML-CC18](https://www.openml.org/s/99)
- [AutoML benchmark data sets](https://github.com/openml/automlbenchmark)
See `learners.R`.
Open questions:
- Draw a random task inside the bot or obtain it from outside?
- Divide into big/small data sets and fast/slow learners?
- Sample according to algorithm param set dimensions?
- Should e.g. xgboost's `gbtree` and `gblinear` boosters be sampled with equal probability?
- How do we log failed jobs?
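On the `gbtree`/`gblinear` question, one option is to draw the booster uniformly first and then sample only the hyperparameters that booster uses. A hedged base-R sketch (parameter names and ranges are illustrative, not the bot's actual param set):

```r
# Draw xgboost's booster uniformly, then the booster-specific params.
# sample_xgb_config is a hypothetical helper, not part of the current bot.
sample_xgb_config = function() {
  booster = sample(c("gbtree", "gblinear"), 1)  # uniform over boosters
  cfg = list(booster = booster, nrounds = sample(1:500, 1))
  if (booster == "gbtree") {
    cfg$max_depth = sample(1:15, 1)      # tree-specific
    cfg$eta = runif(1, 0.01, 0.3)
  } else {
    cfg$lambda = runif(1, 0, 1)          # linear-booster-specific
  }
  cfg
}
cfg = sample_xgb_config()
```

The alternative would be to sample flat over the full param set, which implicitly weights each booster by the number of configurations it owns.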
We currently require an OpenML `task.id` for the bot to run:

```r
bot = OMLRandomBot$new(11)
bot$run()
```
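One possible answer to the open question about logging failed jobs is to wrap each run in `tryCatch` and append errors to a log file. A sketch; `run_logged` and the log file are hypothetical, not part of the current bot:

```r
# Wrap an expression so that errors are appended to a log file instead of
# aborting the whole bot loop; returns NULL on failure.
run_logged = function(expr, logfile = tempfile(fileext = ".log")) {
  tryCatch(expr, error = function(e) {
    cat(format(Sys.time()), conditionMessage(e), "\n",
        file = logfile, append = TRUE)
    NULL
  })
}

res = run_logged(stop("resampling failed"))  # error is logged, res is NULL
```

With batchtools in the mix, `getErrorMessages()` on the registry is another natural place to collect failures after the fact.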
Required packages:

```r
# Benchmark infrastructure
library(mlr)
library(batchtools)
library(R6)
library(callr)
library(data.table)
library(ParamHelpers)

# Learners
library(rpart)
library(glmnet)
library(e1071)
library(ranger)
library(xgboost)
```