The current implementation of the bot follows this scheme:
- Initialize the bot with an OpenML `task.id`
- Draw a learner with probability proportional to the dimension of its parameter set
- Draw a random hyperparameter configuration for that learner
- Resample the drawn learner/hyperparameter configuration on the OpenML task
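The drawing step above can be sketched in base R; the learner table below is illustrative, not the bot's actual learner registry:

```r
# Hypothetical learner registry with the dimension of each param set
# (names and dimension counts are made up for illustration):
learners = data.frame(
  id  = c("classif.rpart", "classif.ranger", "classif.xgboost"),
  dim = c(4, 6, 10),  # number of tunable hyperparameters per learner
  stringsAsFactors = FALSE
)
# Draw a learner with probability proportional to its param set dimension:
pick = sample(learners$id, size = 1, prob = learners$dim / sum(learners$dim))
```

Here `classif.xgboost` is drawn half of the time (10 of 20 total dimensions), on the reasoning that larger search spaces need more random evaluations.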
Learners carried over from the old bot:
- xgboost
- svm
- kernel knn
- random forest
- rpart
- glmnet
New learners:
- Multinomial logit (from mxnet?)
- Cubist
- Fully connected neural networks (mxnet?) up to depth 3 or 4
Worthy candidates (from Kaggle etc.):
- ExtraTrees (we can enable this in ranger)
- LightGBM / CatBoost (probably too similar to xgboost)
- [LibFM (factorization machines)](https://github.com/dselivanov/rsparse)
- [liquidSVM](https://cran.r-project.org/web/packages/liquidSVM/index.html)
- AdaBoost / [fastAdaboost](https://cran.r-project.org/web/packages/fastAdaboost/fastAdaboost.pdf)

Data set collections:
- [OpenML-CC18](https://www.openml.org/s/99)
- [AutoML benchmark data sets](https://github.com/openml/automlbenchmark)
See `learners.R`.
Open questions:
- Draw a random task inside the bot or obtain it from outside?
- Divide into big/small data sets and fast/slow learners?
- Sample according to algorithm param set dimensions?
- Should e.g. xgboost's `gbtree` and `gblinear` boosters be sampled with equal probability?
- How do we log failed jobs?
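On the `gbtree`/`gblinear` question, one option is to draw the booster uniformly first and then sample only the hyperparameters that booster uses. A hedged base-R sketch (parameter names and ranges are illustrative, not the bot's actual param set):

```r
# Draw xgboost's booster uniformly, then the booster-specific params.
# sample_xgb_config is a hypothetical helper, not part of the current bot.
sample_xgb_config = function() {
  booster = sample(c("gbtree", "gblinear"), 1)  # uniform over boosters
  cfg = list(booster = booster, nrounds = sample(1:500, 1))
  if (booster == "gbtree") {
    cfg$max_depth = sample(1:15, 1)      # tree-specific
    cfg$eta = runif(1, 0.01, 0.3)
  } else {
    cfg$lambda = runif(1, 0, 1)          # linear-booster-specific
  }
  cfg
}
cfg = sample_xgb_config()
```

The alternative would be to sample flat over the full param set, which implicitly weights each booster by the number of configurations it owns.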
We currently require an OpenML `task.id` for the bot to run:

```r
bot = OMLRandomBot$new(11)
bot$run()
```
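One possible answer to the open question about logging failed jobs is to wrap each run in `tryCatch` and append errors to a log file. A sketch; `run_logged` and the log file are hypothetical, not part of the current bot:

```r
# Wrap an expression so that errors are appended to a log file instead of
# aborting the whole bot loop; returns NULL on failure.
run_logged = function(expr, logfile = tempfile(fileext = ".log")) {
  tryCatch(expr, error = function(e) {
    cat(format(Sys.time()), conditionMessage(e), "\n",
        file = logfile, append = TRUE)
    NULL
  })
}

res = run_logged(stop("resampling failed"))  # error is logged, res is NULL
```

With batchtools in the mix, `getErrorMessages()` on the registry is another natural place to collect failures after the fact.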
Required packages:

```r
# Benchmark infrastructure
library(mlr)
library(batchtools)
library(R6)
library(callr)
library(data.table)
library(ParamHelpers)

# Learners
library(rpart)
library(glmnet)
library(e1071)
library(ranger)
library(xgboost)
```