openml/automlbenchmark

Example "Stacking" is not working

sedol1339 opened this issue · 2 comments

Hello! I'm trying to run the example model examples/custom/extensions/Stacking. As proposed by @PGijsbers, I make the following replacement in setup.sh:

+ . "$HERE/.setup/setup_env"
- . "$HERE/.setup_env"

Then I'm running the following command:

python runbenchmark.py Stacking validation 30m4c2f \
    --indir /data/osedukhin/shared/openml_cache \
    --outdir /data/osedukhin/shared/amlb_results \
    --userdir examples/custom \
    --exit-on-error -s force

This gives the following error:

AttributeError: module 'numpy' has no attribute 'float'.

Assuming this is a problem with numpy versions, I make the following replacement in requirements.txt:

+ scikit-learn
- scikit-learn==0.22.1

Then I remove the examples/custom/extensions/Stacking/venv folder and re-run the command. After this, the above error is gone, but I get a new error:

Setup of framework Stacking completed successfully.
[MONITORING] [python [1469536]] CPU Utilization: 1.8%

-------------------------------------------------------------
Starting job local.validation.30m4c2f.bioresponse.0.Stacking.
[MONITORING] [python [1469536]] Memory Usage: 5.1%
Assigning 4 cores (total=32) for new task bioresponse.
[MONITORING] [python [1469536]] Disk Usage: 80.2%
Assigning 120230 MB (total=128826 MB) for new bioresponse task.
Running task bioresponse on framework Stacking with config:
TaskConfig({'framework': 'Stacking', 'framework_params': {'_rf_params': {'n_estimators': 200}, '_gbm_params': {'n_estimators': 200}, '_linear_params': {'penalty': 'elasticnet', 'loss': 'log'}, '_final_params': {'max_iter': 1000}}, 'framework_version': '0.22.1', 'type': 'classification', 'name': 'bioresponse', 'openml_task_id': 9910, 'test_server': False, 'fold': 0, 'metric': 'auc', 'metrics': ['auc', 'logloss', 'acc', 'balacc'], 'seed': 951992630, 'job_timeout_seconds': 3600, 'max_runtime_seconds': 1800, 'cores': 4, 'max_mem_size_mb': 120230, 'min_vol_size_mb': -1, 'input_dir': '/data/osedukhin/shared/openml_cache', 'output_dir': '/data/osedukhin/shared/amlb_results/stacking.validation.30m4c2f.local.20231005T150441', 'output_predictions_file': '/data/osedukhin/shared/amlb_results/stacking.validation.30m4c2f.local.20231005T150441/predictions/bioresponse/0/predictions.csv', 'tag': None, 'command': 'runbenchmark.py Stacking validation 30m4c2f --indir /data/osedukhin/shared/openml_cache --outdir /data/osedukhin/shared/amlb_results --userdir examples/custom --exit-on-error -s force', 'git_info': {'repo': 'https://github.com/openml/automlbenchmark', 'branch': 'master', 'commit': '386cfb66baa576ca9891ca18007c8d298380da3e', 'tags': [], 'status': ['## master...origin/master', ' M examples/custom/extensions/Stacking/requirements.txt', ' M examples/custom/extensions/Stacking/setup.sh', ' M frameworks/shared/setup.sh']}, 'measure_inference_time': False, 'ext': {}, 'quantile_levels': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], 'type_': 'binary', 'output_metadata_file': '/data/osedukhin/shared/amlb_results/stacking.validation.30m4c2f.local.20231005T150441/predictions/bioresponse/0/metadata.json'})
Running cmd `/data/osedukhin/shared/automlbenchmark/examples/custom/extensions/Stacking/venv/bin/python -W ignore /data/osedukhin/shared/automlbenchmark/examples/custom/extensions/Stacking/exec.py`
Traceback (most recent call last):

  File "/data/osedukhin/shared/automlbenchmark/examples/custom/extensions/Stacking/exec.py", line 17, in <module>

    from frameworks.shared.callee import call_run, result

  File "/data/osedukhin/shared/automlbenchmark/frameworks/shared/callee.py", line 107, in <module>

    def measure_inference_times(predict_fn: Callable[[DATA_INPUT], Any], files: list[Tuple[int, DATA_INPUT]]) -> dict[int, list[float]]:

TypeError: 'type' object is not subscriptable



Job `local.validation.30m4c2f.bioresponse.0.Stacking` failed with error: Command '/data/osedukhin/shared/automlbenchmark/examples/custom/extensions/Stacking/venv/bin/python -W ignore /data/osedukhin/shared/automlbenchmark/examples/custom/extensions/Stacking/exec.py' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/data/osedukhin/shared/automlbenchmark/amlb/job.py", line 120, in start
    result = self._run()
  File "/data/osedukhin/shared/automlbenchmark/amlb/utils/process.py", line 744, in profiler
    return fn(*args, **kwargs)
  File "/data/osedukhin/shared/automlbenchmark/amlb/benchmark.py", line 578, in run
    meta_result = self.benchmark.framework_module.run(self._dataset, task_config)
  File "/data/osedukhin/shared/automlbenchmark/examples/custom/extensions/Stacking/__init__.py", line 26, in run
    return run_in_venv(__file__, "exec.py",
  File "/data/osedukhin/shared/automlbenchmark/frameworks/shared/caller.py", line 134, in run_in_venv
    output, err = run_cmd(cmd, *args,
  File "/data/osedukhin/shared/automlbenchmark/amlb/utils/process.py", line 281, in run_cmd
    raise e
  File "/data/osedukhin/shared/automlbenchmark/amlb/utils/process.py", line 255, in run_cmd
    completed = run_subprocess(str_cmd if params.shell else full_cmd,
  File "/data/osedukhin/shared/automlbenchmark/amlb/utils/process.py", line 98, in run_subprocess
    raise subprocess.CalledProcessError(retcode, process.args, output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '/data/osedukhin/shared/automlbenchmark/examples/custom/extensions/Stacking/venv/bin/python -W ignore /data/osedukhin/shared/automlbenchmark/examples/custom/extensions/Stacking/exec.py' returned non-zero exit status 1.
Job `local.validation.30m4c2f.bioresponse.0.Stacking` did not stop gracefully: Job `local.validation.30m4c2f.bioresponse.0.Stacking` was interrupted.
Traceback (most recent call last):
  File "/data/osedukhin/shared/automlbenchmark/amlb/job.py", line 226, in start
    self._run()
  File "/data/osedukhin/shared/automlbenchmark/amlb/job.py", line 324, in _run
    result = job.start()
  File "/data/osedukhin/shared/automlbenchmark/amlb/job.py", line 120, in start
    result = self._run()
  File "/data/osedukhin/shared/automlbenchmark/amlb/utils/process.py", line 744, in profiler
    return fn(*args, **kwargs)
  File "/data/osedukhin/shared/automlbenchmark/amlb/benchmark.py", line 578, in run
    meta_result = self.benchmark.framework_module.run(self._dataset, task_config)
  File "/data/osedukhin/shared/automlbenchmark/examples/custom/extensions/Stacking/__init__.py", line 26, in run
    return run_in_venv(__file__, "exec.py",
  File "/data/osedukhin/shared/automlbenchmark/frameworks/shared/caller.py", line 134, in run_in_venv
    output, err = run_cmd(cmd, *args,
  File "/data/osedukhin/shared/automlbenchmark/amlb/utils/process.py", line 281, in run_cmd
    raise e
  File "/data/osedukhin/shared/automlbenchmark/amlb/utils/process.py", line 255, in run_cmd
    completed = run_subprocess(str_cmd if params.shell else full_cmd,
  File "/data/osedukhin/shared/automlbenchmark/amlb/utils/process.py", line 98, in run_subprocess
    raise subprocess.CalledProcessError(retcode, process.args, output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '/data/osedukhin/shared/automlbenchmark/examples/custom/extensions/Stacking/venv/bin/python -W ignore /data/osedukhin/shared/automlbenchmark/examples/custom/extensions/Stacking/exec.py' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/osedukhin/shared/automlbenchmark/amlb/job.py", line 147, in stop
    self._cancel()
  File "/data/osedukhin/shared/automlbenchmark/amlb/job.py", line 184, in _cancel
    raise_in_thread(self.thread_id, CancelledError(f"Job `{self.name}` was interrupted."))
  File "/data/osedukhin/shared/automlbenchmark/amlb/utils/process.py", line 437, in raise_in_thread
    ret = ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, exc_class)
amlb.utils.process.CancelledError: Job `local.validation.30m4c2f.bioresponse.0.Stacking` was interrupted.
All jobs executed in 1.877 seconds.
[MONITORING] [python [1469536]] CPU Utilization: 3.9%
[MONITORING] [python [1469536]] Memory Usage: 5.2%
[MONITORING] [python [1469536]] Disk Usage: 80.2%
Command '/data/osedukhin/shared/automlbenchmark/examples/custom/extensions/Stacking/venv/bin/python -W ignore /data/osedukhin/shared/automlbenchmark/examples/custom/extensions/Stacking/exec.py' returned non-zero exit status 1.
Traceback (most recent call last):
  File "runbenchmark.py", line 189, in <module>
    res = bench.run(args.task, args.fold)
  File "/data/osedukhin/shared/automlbenchmark/amlb/benchmark.py", line 211, in run
    results = self._run_jobs(jobs)
  File "/data/osedukhin/shared/automlbenchmark/amlb/benchmark.py", line 253, in _run_jobs
    self.job_runner.start()
  File "/data/osedukhin/shared/automlbenchmark/amlb/job.py", line 226, in start
    self._run()
  File "/data/osedukhin/shared/automlbenchmark/amlb/job.py", line 324, in _run
    result = job.start()
  File "/data/osedukhin/shared/automlbenchmark/amlb/job.py", line 120, in start
    result = self._run()
  File "/data/osedukhin/shared/automlbenchmark/amlb/utils/process.py", line 744, in profiler
    return fn(*args, **kwargs)
  File "/data/osedukhin/shared/automlbenchmark/amlb/benchmark.py", line 578, in run
    meta_result = self.benchmark.framework_module.run(self._dataset, task_config)
  File "/data/osedukhin/shared/automlbenchmark/examples/custom/extensions/Stacking/__init__.py", line 26, in run
    return run_in_venv(__file__, "exec.py",
  File "/data/osedukhin/shared/automlbenchmark/frameworks/shared/caller.py", line 134, in run_in_venv
    output, err = run_cmd(cmd, *args,
  File "/data/osedukhin/shared/automlbenchmark/amlb/utils/process.py", line 281, in run_cmd
    raise e
  File "/data/osedukhin/shared/automlbenchmark/amlb/utils/process.py", line 255, in run_cmd
    completed = run_subprocess(str_cmd if params.shell else full_cmd,
  File "/data/osedukhin/shared/automlbenchmark/amlb/utils/process.py", line 98, in run_subprocess
    raise subprocess.CalledProcessError(retcode, process.args, output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '/data/osedukhin/shared/automlbenchmark/examples/custom/extensions/Stacking/venv/bin/python -W ignore /data/osedukhin/shared/automlbenchmark/examples/custom/extensions/Stacking/exec.py' returned non-zero exit status 1.

Just to mention, I get the same error when trying to run my custom framework CatBoost. How can I fix this?

Which Python version are you using? You should be using 3.9 (higher versions might also work, but they are not officially supported). The second error looks like a type-hint problem caused by the Python version: subscripting the built-in list and dict (e.g. list[float]) instead of typing.List and typing.Dict is only supported from Python 3.9 onward.
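To illustrate the type-hint point, here is a minimal sketch of how the failure can be detected and avoided. The helper names `supports_builtin_generics` and `group_times` are hypothetical and are not part of automlbenchmark; the annotation only loosely mirrors the one shown in the traceback.

```python
import sys
from typing import Dict, List, Tuple


def supports_builtin_generics() -> bool:
    """Return True if built-in containers such as list[int] can be subscripted."""
    try:
        # On Python 3.8 and older this raises:
        # TypeError: 'type' object is not subscriptable
        list[int]
        return True
    except TypeError:
        return False


# Hypothetical function: typing.List / typing.Dict annotations are accepted
# by both old and new interpreters, unlike list[...] / dict[...].
def group_times(files: List[Tuple[int, str]]) -> Dict[int, List[float]]:
    return {batch_size: [] for batch_size, _ in files}


print(sys.version_info[:2], supports_builtin_generics())
```

Running this with the interpreter inside the framework's venv should show quickly whether that interpreter is older than 3.9.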

AttributeError: module 'numpy' has no attribute 'float'.

numpy.float is not directly accessed in either __init__.py or exec.py, so I think it is indeed an issue with installed dependencies. It is possible that there is a mismatch in dependency versions due to the used Python version. I'll try to see if I can run the Stacking example tomorrow.
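As background for the first error: the `numpy.float` alias was deprecated in NumPy 1.20 and removed in NumPy 1.24, so older libraries that still reference it fail on a recent NumPy. The sketch below checks whether an installed NumPy is affected; the helper name `numpy_removed_float_alias` is hypothetical, and whether the pinned scikit-learn is the library touching `numpy.float` here is an assumption, not something confirmed in this thread.

```python
def numpy_removed_float_alias(version: str) -> bool:
    """Return True if this NumPy version no longer provides np.float (1.24+)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (1, 24)


try:
    import numpy as np

    # Report the installed version and whether the alias has been removed.
    print(np.__version__, numpy_removed_float_alias(np.__version__))
except ImportError:
    print("numpy is not installed in this environment")
```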

Got it, I just needed to update to Python 3.9. Thank you, these errors are gone now.