active-bagging-learning


Primary language: Python. License: Apache-2.0.

Active Strategy for surface response estimation


This library proposes a plug-in approach to active learning utilizing bagging techniques. Bagging, or bootstrap aggregating, is an ensemble learning method designed to improve the stability and accuracy of machine learning algorithms. By leveraging bagging, we aim to enhance the efficiency of active learning strategies in approximating the target function $f$.

  • The objective is to approximate a function $f : \mathcal{X} \rightarrow \mathbb{R}^n$.
  • Goal: find an estimate $f^*$ of $f$ within a family of measurable functions $\mathcal{F}$, where $f^* = \underset{\hat{f} \in \mathcal{F}}{\text{argmin}} \|f - \hat{f} \|$
  • At time $t$, we have a set of $n$ evaluations $(x_i, f(x_i))_{i\leqslant n}$
  • Any feasible point in the domain $\mathcal{X}$ can be sampled
  • This tool lets users query new points based on an uncertainty measure.
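To make the idea of an uncertainty measure concrete, here is a minimal sketch that uses only scikit-learn and does not call this library. It assumes the uncertainty at a point is taken as the variance of the predictions of the individual bagged trees; the toy function `f`, the domain `[0, 2]` and all variable names are chosen for illustration only.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)


# Toy target, chosen only for illustration (not part of the library).
def f(x):
    return np.sin(3 * x[:, 0]) + 0.5 * x[:, 0]


# 30 known evaluations (x_i, f(x_i)) in the domain [0, 2].
X_known = rng.uniform(0.0, 2.0, size=(30, 1))
y_known = f(X_known)

# Bagged ensemble: each tree is fit on a bootstrap resample of the data.
model = ExtraTreesRegressor(n_estimators=100, bootstrap=True, random_state=0)
model.fit(X_known, y_known)

# Candidate points where f has not been evaluated yet.
X_candidates = np.linspace(0.0, 2.0, 200).reshape(-1, 1)

# Per-tree predictions, shape (n_estimators, n_candidates).
per_tree = np.stack([tree.predict(X_candidates) for tree in model.estimators_])

# Disagreement between the trees serves as the uncertainty measure.
uncertainty = per_tree.var(axis=0)

# The candidate with the largest variance is a natural next point to query.
x_next = X_candidates[np.argmax(uncertainty)]
```

Picking the single most uncertain candidate is the simplest possible rule; the library's own criteria and query strategies may weight or sample the variance differently.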

Installation

python -m pip install git+https://github.com/eurobios-mews-labs/active-bagging-learning.git

Basic usage

import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor

from active_learning import ActiveSurfaceLearner
from active_learning.components.active_criterion import VarianceEnsembleMethod
from active_learning.components.query_strategies import ServiceQueryVariancePDF
from active_learning.benchmark import functions

fun = functions.grammacy_lee_2009  # The function we want to learn
bounds = np.array(functions.bounds[fun])  # [x1 bounds, x2 bounds]
n = 50
X_train = pd.DataFrame(
    {'x1': (bounds[0, 0] - bounds[0, 1]) * np.random.rand(n) + bounds[0, 1],
     'x2': (bounds[1, 0] - bounds[1, 1]) * np.random.rand(n) + bounds[1, 1],
     })  # Initial sample, drawn uniformly within the bounds
y_train = -fun(X_train)

active_criterion = VarianceEnsembleMethod(  # Parameters to be used to estimate the surface response
    estimator=ExtraTreesRegressor(  # Base estimator for the surface
        max_features=0.8, bootstrap=True)
)
query_strategy = ServiceQueryVariancePDF(bounds, num_eval=20000)

# QUERY NEW POINTS
active_learner = ActiveSurfaceLearner(
    active_criterion,  # Active criterion yields a surface
    query_strategy,  # Given active criterion surface, execute query 
    bounds=bounds)

active_learner.fit(
    X_train,  # Input data X
    y_train)  # Input data y (target)

X_new = active_learner.query(3)  # Request 3 points
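A single call to query is usually one step of a loop: fit, query, evaluate the function at the new points, add them to the training set, and repeat. The sketch below shows one way to write that loop against any object exposing the fit and query methods used above. It assumes query returns rows with the same columns as X, which is not confirmed here. `run_active_loop`, `UniformStandIn`, the toy function and the toy bounds are hypothetical names introduced only so the example runs without the library installed.

```python
import numpy as np
import pandas as pd


def run_active_loop(learner, fun, X, y, n_rounds=5, batch_size=3):
    """Alternate between fitting, querying and evaluating new points."""
    for _ in range(n_rounds):
        learner.fit(X, y)
        X_new = learner.query(batch_size)
        # Assumption: the queried rows have the same columns as X.
        X_new = pd.DataFrame(np.asarray(X_new), columns=X.columns)
        y_new = fun(X_new)  # Evaluate the target at the new points
        X = pd.concat([X, X_new], ignore_index=True)
        y = np.concatenate([np.asarray(y), np.asarray(y_new)])
    learner.fit(X, y)  # Final fit on the enlarged data set
    return X, y


class UniformStandIn:
    """Hypothetical placeholder learner that queries uniformly random points."""

    def __init__(self, bounds, seed=0):
        self.bounds = np.asarray(bounds, dtype=float)
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        return self

    def query(self, n):
        low, high = self.bounds[:, 0], self.bounds[:, 1]
        return self.rng.uniform(low, high, size=(n, len(low)))


def toy(X):
    # Toy target for illustration only.
    return (X["x1"] * np.exp(-X["x1"] ** 2 - X["x2"] ** 2)).to_numpy()


toy_bounds = [[-2.0, 6.0], [-2.0, 6.0]]
X0 = pd.DataFrame(UniformStandIn(toy_bounds, seed=1).query(10), columns=["x1", "x2"])
X_all, y_all = run_active_loop(UniformStandIn(toy_bounds), toy, X0, toy(X0))
```

With the library installed, the same function could presumably be called as `run_active_loop(active_learner, lambda X: -fun(X), X_train, y_train)`, reusing the objects defined in the example above.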

To use the approach, you need:

  1. An estimator (a family of functions) to fit the surface (linear model, Gaussian vectors, etc.) that follows scikit-learn's API (the estimator parameter of the active criterion).
  2. An active learning criterion, expressed as a surface, that fits the estimator and estimates its variance in some way (the active_criterion component).
  3. A resampling strategy that takes a function (the active criterion surface) and turns it into a query (the query_strategy component).
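The way these three pieces fit together can be sketched with scikit-learn and NumPy alone. This is an illustration under stated assumptions, not the library's implementation: the criterion is assumed to be the variance across bagged trees, and the query strategy is assumed to treat that surface as an unnormalised density over random candidates. `variance_surface`, `query_from_surface` and the meaning given to `num_eval` (number of candidate points) are hypothetical.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor


def variance_surface(estimator, X):
    """Active criterion: variance of the bagged trees' predictions at X."""
    preds = np.stack([tree.predict(X) for tree in estimator.estimators_])
    return preds.var(axis=0)


def query_from_surface(surface, bounds, n_points, num_eval=2000, seed=0):
    """Query strategy: sample candidates with probability proportional to the surface."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    candidates = rng.uniform(
        bounds[:, 0], bounds[:, 1], size=(num_eval, len(bounds)))
    # Small offset keeps every probability positive, even on a flat surface.
    weights = surface(candidates) + 1e-12
    probs = weights / weights.sum()
    picked = rng.choice(num_eval, size=n_points, replace=False, p=probs)
    return candidates[picked]


# 1. Estimator: a bagged tree ensemble fitted on toy data.
rng = np.random.default_rng(1)
toy_bounds = [[0.0, 1.0], [0.0, 1.0]]
X = rng.uniform(size=(40, 2))
y = np.sin(6 * X[:, 0]) * X[:, 1]
estimator = ExtraTreesRegressor(
    n_estimators=50, max_features=0.8, bootstrap=True, random_state=0).fit(X, y)

# 2. Criterion surface and 3. query built from that surface.
X_new = query_from_surface(
    lambda Z: variance_surface(estimator, Z), toy_bounds, n_points=3)
```

Sampling in proportion to the variance, rather than always taking the maximum, spreads a batch of queries over several uncertain regions instead of clustering them at one point.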


Examples

  • 1D example:

[Figure: 1D example]

Benchmark

[Figure: benchmark]