Add PerpetualBooster
deadsoul44 opened this issue · 10 comments
Add PerpetualBooster as an additional algorithm.
https://github.com/perpetual-ml/perpetual
It does not need hyperparameter tuning and supports multi-output and multi-class cases.
I can create a pull request if you are willing to review and accept.
I think it's interesting, but I am planning to try and add a feature soon that allows having integration scripts in separate independent repositories. I propose I'll leave another message here when I have something experimental going. Perhaps it would be interesting to try out?
It will be really helpful to benchmark our algorithm. I am waiting for it.
You can always do local integration for yourself if you just want to use the benchmark with your framework. There is no need to have it included in this codebase for that.
I compared PerpetualBooster against AutoGluon (BQ), which is the number one framework in the benchmark, and got some promising results in local tests on small and medium tasks. I have some questions.
- All tasks are classification tasks in small, medium, large yml files. Where are regression tasks?
- I want to run the benchmark with only PerpetualBooster on AWS to compare the results against the rest of the frameworks. What is the default EC2 instance type? What is the correct command to run on AWS? I don't want to make a mistake due to costs.
- Are you willing to review and merge a pull request to include PerpetualBooster in the repo and website if the results are good enough?
- The default metrics for classification are AUC and LogLoss. But I think F1 score is a better metric because frameworks can overfit to logloss especially. Is it possible to include F1 as a default metric or as an additional metric?
P.S. I checked the repo and website before asking these. Thanks in advance.
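To make the metric question above concrete, here is a minimal pure-Python sketch of binary log loss and F1 on a toy example. The helper names, the 0.5 threshold, and the toy numbers are illustrative assumptions and are not taken from the benchmark code.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood for binary labels and predicted P(y=1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def f1_score(y_true, y_prob, threshold=0.5):
    """F1 of the positive class after thresholding the probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 1)
    fp = sum(1 for y, yp in zip(y_true, y_pred) if y == 0 and yp == 1)
    fn = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (assumption): two models that rank and classify identically
# but differ in how confident their probabilities are.
y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
hesitant = [0.6, 0.4, 0.6, 0.4]

for name, probs in [("confident", confident), ("hesitant", hesitant)]:
    print(name, "F1:", f1_score(y_true, probs), "log loss:", log_loss(y_true, probs))
```

Both prediction sets get the same F1 at the 0.5 threshold but different log loss: F1 only sees the thresholded labels, while log loss also scores the probabilities themselves.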
Answering my own first two questions after reading the paper :)
https://jmlr.org/papers/volume25/22-0493/22-0493.pdf
- www.openml.org/s/269 for regression and www.openml.org/s/271 for classification.
- 1h8c_gp3 for one hour, 4h8c_gp3 for 4 hours.
Correct me if I am wrong.
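A rough sketch of how the pieces above could be combined into a single AWS run. It assumes the usual runbenchmark.py argument order (framework, benchmark, constraint) with -m for the mode and --userdir for a custom config directory, and that the suite can be passed as openml/s/269; the helper name and the user_dir value are hypothetical. Please confirm against `python runbenchmark.py --help` before launching anything that costs money.

```python
import shlex


def build_amlb_command(framework, benchmark, constraint, mode="aws", user_dir=None):
    """Assemble the argument list for one runbenchmark.py invocation.

    Assumption: positional order is framework, benchmark, constraint,
    followed by -m <mode> and an optional --userdir <dir>.
    """
    cmd = ["python", "runbenchmark.py", framework, benchmark, constraint, "-m", mode]
    if user_dir is not None:
        cmd += ["--userdir", user_dir]
    return cmd


# Regression suite with the one-hour, eight-core constraint mentioned above.
cmd = build_amlb_command("PerpetualBooster", "openml/s/269", "1h8c_gp3", user_dir="user_dir")
print(shlex.join(cmd))  # print first, run manually once it looks right
```

Printing the command instead of executing it keeps the sketch side-effect free, so nothing is started on AWS by accident.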
Hello,
I am trying to run PerpetualBooster on AWS. But, I keep getting the following error:
[INFO] [amlb:19:19:50.735] Running benchmark `perpetualbooster` on `example` framework in `local` mode.
[INFO] [amlb.frameworks.definitions:19:19:50.791] Loading frameworks definitions from ['/s3bucket/user/frameworks.yaml'].
[INFO] [amlb.resources:19:19:50.794] Loading benchmark constraint definitions from ['/repo/resources/constraints.yaml'].
[INFO] [amlb.benchmarks.file:19:19:50.800] Loading benchmark definitions from /repo/resources/benchmarks/example.yaml.
[ERROR] [amlb:19:19:50.802] No module named 'frameworks.PerpetualBooster'
Traceback (most recent call last):
File "/repo/runbenchmark.py", line 196, in <module>
bench = bench_cls(**bench_kwargs)
File "/repo/amlb/benchmark.py", line 115, in __init__
self.framework_module = import_module(self.framework_def.module)
File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'frameworks.PerpetualBooster'
I guess it was an installation error, and I tried everything to install the package in the EC2 environment.
frameworks.yaml file on user_dir:
# put this file in your ~/.config/automlbenchmark directory
# to override default configs
---
PerpetualBooster:
  version: 'stable'
  description: |
    A self-generalizing gradient boosting machine which doesn't need hyperparameter optimization.
  project: https://github.com/perpetual-ml/perpetual
  setup_cmd: 'pip install --no-cache-dir -U https://perpetual-whl.s3.eu-central-1.amazonaws.com/perpetual-0.6.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl'
But it didn't work.
I also tried requirements.txt in user_dir.
Any help appreciated @PGijsbers
It cannot find the integration in the normal folder (frameworks/perpetualbooster). This is most likely because you are using the original automlbenchmark repo instead of your own fork which has the integration. You can specify which repository is downloaded to the EC2 instance with the project_repository field in the configuration:
automlbenchmark/resources/config.yaml, line 2 in 0df26b2
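The traceback above comes from importlib.import_module failing on frameworks.PerpetualBooster. As a minimal sketch, a check like the following could be run from the root of the checked-out repository to see whether a framework module resolves before starting a run. The helper name is hypothetical and is not part of amlb.

```python
import importlib


def module_is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be imported by the current interpreter.

    Note: a ModuleNotFoundError raised inside the module itself (for example a
    missing dependency) is also reported as False by this simple check.
    """
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError:
        return False
    return True


# The result depends on which repository checkout is on the import path.
print(module_is_importable("frameworks.PerpetualBooster"))
```

If this prints False in a checkout of the fork, the integration folder is missing from that checkout or one of its imports fails.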
I made some progress but now I keep getting the following error in results.csv file:
ModuleNotFoundError: No module named 'perpetual'
My fork is here:
https://github.com/deadsoul44/automlbenchmark
I updated requirements files. Let me know what I am missing. Thanks in advance.
From memory, hopefully it's correct:
It looks like you have set up the installation script to use a virtual environment (that's what the true is for here), but you are not calling it from the environment (this should use run_in_venv; see e.g. autogluon).
The setup_cmd in your configuration is probably also superfluous.
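The reason this produces "No module named 'perpetual'" is that a package installed into a virtual environment is only visible to that environment's interpreter. The sketch below illustrates the idea with the standard library only; it is not how run_in_venv is implemented, and the function names are hypothetical.

```python
import subprocess
import sys


def run_with_interpreter(python_executable: str, code: str) -> str:
    """Run a code snippet under a specific interpreter and return its stdout."""
    completed = subprocess.run(
        [python_executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


def can_import_in(python_executable: str, top_level_module: str) -> bool:
    """Check whether a top-level module is visible to the given interpreter."""
    code = (
        "import importlib.util as u; "
        f"print(u.find_spec({top_level_module!r}) is not None)"
    )
    return run_with_interpreter(python_executable, code) == "True"


# Demonstrated with the current interpreter; for the case above you would pass
# the path to the framework venv's python instead (path not shown here).
print(can_import_in(sys.executable, "json"))
```

Running the same check once with the main interpreter and once with the venv's interpreter shows where the package actually lives.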
Hello,
I was able to run the regression benchmark with 33 datasets. I get the following error when trying to upload the results to the website.
File "/home/adminuser/venv/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/exec_code.py", line 88, in exec_func_with_error_handling
result = func()
^^^^^^
File "/home/adminuser/venv/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 579, in code_to_exec
exec(code, module.__dict__)
File "/mount/src/amlb-streamlit/pages/cd_diagram.py", line 61, in <module>
mean_results = preprocess_data(mean_results)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mount/src/amlb-streamlit/core/data.py", line 55, in preprocess_data
results = impute_results(
^^^^^^^^^^^^^^^
File "/mount/src/amlb-streamlit/core/data.py", line 40, in impute_results
raise ValueError(f"{with_=} is not in `results`")