openml/automlbenchmark

Add PerpetualBooster

deadsoul44 opened this issue · 10 comments

Add PerpetualBooster as an additional algorithm.

https://github.com/perpetual-ml/perpetual

It does not need hyperparameter tuning and supports multi-output and multi-class cases.

I can create a pull request if you are willing to review and accept.

I think it's interesting, but I am planning to add a feature soon that allows integration scripts to live in separate, independent repositories. I'll leave another message here when I have something experimental going. Perhaps it would be interesting to try out?

It would be really helpful to benchmark our algorithm. I look forward to it.

You can always do local integration for yourself if you just want to use the benchmark with your framework. There is no need to have it included in this codebase for that.

I compared PerpetualBooster against AutoGluon (BQ), which is the number one framework in the benchmark, and got some promising results in local tests on small and medium tasks. I have some questions.

  • All tasks in the small, medium, and large yml files are classification tasks. Where are the regression tasks?
  • I want to run the benchmark with only PerpetualBooster on AWS to compare the results against the rest of the frameworks. What is the default EC2 instance type? What is the correct command to run on AWS? I don't want to make a mistake due to costs.
  • Are you willing to review and merge a pull request to include PerpetualBooster in the repo and website if the results are good enough?
  • The default metrics for classification are AUC and LogLoss. But I think F1 score is a better metric because frameworks can overfit, especially to LogLoss. Is it possible to include F1 as a default metric or as an additional metric?

P.S. I checked the repo and website before asking these. Thanks in advance.

Answering my own first two questions after reading the paper :)
https://jmlr.org/papers/volume25/22-0493/22-0493.pdf

Correct me if I am wrong.

Hello,

I am trying to run PerpetualBooster on AWS, but I keep getting the following error:

[INFO] [amlb:19:19:50.735] Running benchmark `perpetualbooster` on `example` framework in `local` mode.
[INFO] [amlb.frameworks.definitions:19:19:50.791] Loading frameworks definitions from ['/s3bucket/user/frameworks.yaml'].
[INFO] [amlb.resources:19:19:50.794] Loading benchmark constraint definitions from ['/repo/resources/constraints.yaml'].
[INFO] [amlb.benchmarks.file:19:19:50.800] Loading benchmark definitions from /repo/resources/benchmarks/example.yaml.
[ERROR] [amlb:19:19:50.802] No module named 'frameworks.PerpetualBooster'
Traceback (most recent call last):
  File "/repo/runbenchmark.py", line 196, in <module>
    bench = bench_cls(**bench_kwargs)
  File "/repo/amlb/benchmark.py", line 115, in __init__
    self.framework_module = import_module(self.framework_def.module)
  File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'frameworks.PerpetualBooster'

I guess it was an installation error, and I tried everything to install the package in the EC2 environment.

frameworks.yaml file in user_dir:

# put this file in your ~/.config/automlbenchmark directory
# to override default configs
---
PerpetualBooster:
  version: 'stable'
  description: |
    A self-generalizing gradient boosting machine which doesn't need hyperparameter optimization.
  project: https://github.com/perpetual-ml/perpetual
  setup_cmd: 'pip install --no-cache-dir -U https://perpetual-whl.s3.eu-central-1.amazonaws.com/perpetual-0.6.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl'

But it didn't work.

I also tried requirements.txt in user_dir.

Any help appreciated @PGijsbers

It cannot find the integration in the normal folder (frameworks/perpetualbooster). This is most likely because you are using the original automlbenchmark repo instead of your own fork which has the integration. You can specify which repository is downloaded to the EC2 instance with the project_repository field in the configuration:

project_repository: https://github.com/openml/automlbenchmark#stable # this is also the url used to clone the repository on ec2 instances
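As a rough illustration of how a project_repository value of this shape can be read, here is a minimal sketch that splits the URL from the part after the `#`. It assumes the suffix names the branch or tag to check out. `split_project_repository` is a hypothetical helper, not a function from automlbenchmark, the `master` fallback is an arbitrary choice for this sketch, and the second URL is a made-up placeholder for a fork.

```python
def split_project_repository(value, default_ref="master"):
    """Split 'URL#REF' into (url, ref).

    Hypothetical helper for illustration only. It assumes the text after
    '#' is the branch or tag that gets checked out on the EC2 instance.
    """
    url, _, ref = value.partition("#")
    return url, (ref or default_ref)


# The upstream value quoted above.
upstream = split_project_repository("https://github.com/openml/automlbenchmark#stable")

# A placeholder fork URL and branch name (both hypothetical).
fork = split_project_repository("https://github.com/example-user/automlbenchmark#my-branch")

print(upstream)
print(fork)
```

Pointing the field at a fork and branch that actually contain the integration folder is the part that matters; the parsing itself is only shown to make the `URL#branch` shape explicit.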

I made some progress, but now I keep getting the following error in the results.csv file:

ModuleNotFoundError: No module named 'perpetual'

My fork is here:
https://github.com/deadsoul44/automlbenchmark

I updated requirements files. Let me know what I am missing. Thanks in advance.

From memory, so hopefully this is correct:
It looks like you have set up the installation script to use a virtual environment (that is what the true is for here),
but you are not calling the framework from that environment. The integration should use run_in_venv (see e.g. autogluon).

The setup_cmd in your configuration is probably also superfluous.
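One way to confirm this kind of mismatch is to print, from inside the code that fails, which interpreter is running and whether it can see the package. The sketch below is generic Python and not part of automlbenchmark; the function names are hypothetical, and `perpetual` is the module name taken from the error message above.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def is_importable(module_name):
    """True when `module_name` can be found by the running interpreter."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises if a parent package is missing or the name is malformed.
        return False


def describe_environment(module_name="perpetual"):
    """Summarise where this interpreter lives and whether it sees the package."""
    return {
        "executable": sys.executable,
        "in_virtualenv": in_virtualenv(),
        "importable": is_importable(module_name),
    }


print(describe_environment())
```

If the package was installed into the framework's virtual environment but this reports that the interpreter is outside a virtual environment and cannot import the module, the code is most likely being run by the benchmark's own interpreter rather than the framework's.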

Hello,

I was able to run the regression benchmark with 33 datasets. I get the following error when trying to upload the results to the website.

File "/home/adminuser/venv/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/exec_code.py", line 88, in exec_func_with_error_handling
    result = func()
             ^^^^^^
File "/home/adminuser/venv/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 579, in code_to_exec
    exec(code, module.__dict__)
File "/mount/src/amlb-streamlit/pages/cd_diagram.py", line 61, in <module>
    mean_results = preprocess_data(mean_results)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mount/src/amlb-streamlit/core/data.py", line 55, in preprocess_data
    results = impute_results(
              ^^^^^^^^^^^^^^^
File "/mount/src/amlb-streamlit/core/data.py", line 40, in impute_results
    raise ValueError(f"{with_=} is not in `results`")