FEDOT-benchmarks

Comparison tool for state-of-the-art AutoML frameworks

AutoML benchmark for the FEDOT framework - [OBSOLETE, see AMLB and pytsbe for up-to-date examples]

This tool helps you run different AutoML frameworks on the problem data you want. The repository already includes several cases (e.g. credit_scoring), supports PMLB datasets, and is open to new experiments.

How to

Execute existing cases

All the existing cases are located in the test_cases directory. To execute an experiment, open the case directory and run the case_name.py script inside.
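For example, assuming the credit_scoring case keeps this layout and its script is named after the case, launching it from the repository root could look like:

python test_cases/credit_scoring/credit_scoring.py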

The main part of the script sets up a CaseExecutor with the execution params, the models to compare, and the metrics to compute.

result_metrics = CaseExecutor(params=ExecutionParams(train_file=train_file,
                                                     test_file=test_file,
                                                     task=TaskTypesEnum.classification,
                                                     target_name='default',
                                                     case_label='scoring'),
                              models=[BenchmarkModelTypesEnum.baseline,
                                      BenchmarkModelTypesEnum.tpot,
                                      BenchmarkModelTypesEnum.fedot],
                              metric_list=['roc_auc', 'f1']).execute()

To understand which hyperparameters are used for the AutoML models, have a look at the implementation of the get_models_hyperparameters function, where you can inspect or tailor the required parameters.

result_metrics['hyperparameters'] = get_models_hyperparameters()
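For illustration only, the returned value could be a plain dict keyed by framework name (hypothetical keys and values shown here; check the actual get_models_hyperparameters implementation in benchmark_utils before relying on them):

# Hypothetical shape only, not the real settings:
{
    'tpot': {'generations': 10, 'population_size': 20},
    'fedot': {'generations': 10, 'pop_size': 20},
}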

The following function saves the result of the execution to a JSON file next to the case script.

save_metrics_result_file(result_metrics, file_name='scoring_metrics')
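For orientation, a minimal sketch of what such a helper can look like (hypothetical body; the real implementation lives in benchmark_utils):

import json

def save_metrics_result_file(result: dict, file_name: str) -> None:
    # Dump the collected metrics (and hyperparameters) to <file_name>.json
    # next to the case script, i.e. in the current working directory.
    with open(f'{file_name}.json', 'w') as file:
        json.dump(result, file, indent=4)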

Add a custom experiment

To build an experiment, create a directory named after your case in the test_cases directory. Inside it, create a data directory for your data files and a script named after your case, then fill the script in as follows:

Note! Do not forget to replace every your_case placeholder below with the name of your case.

from benchmark_model_types import BenchmarkModelTypesEnum
from executor import CaseExecutor, ExecutionParams
from core.repository.tasks import TaskTypesEnum
from benchmark_utils import (get_models_hyperparameters,
                             save_metrics_result_file,
                             get_your_case_data_paths,
                             )

if __name__ == '__main__':
    train_file, test_file = get_your_case_data_paths()

    result_metrics = CaseExecutor(params=ExecutionParams(train_file=train_file,
                                                         test_file=test_file,
                                                         task=TaskTypesEnum.classification,
                                                         target_name='default',
                                                         case_label='your_case'),
                                  models=[BenchmarkModelTypesEnum.baseline,
                                          BenchmarkModelTypesEnum.tpot,
                                          BenchmarkModelTypesEnum.fedot],
                                  metric_list=['roc_auc', 'f1']).execute()

    result_metrics['hyperparameters'] = get_models_hyperparameters()

    save_metrics_result_file(result_metrics, file_name='your_case_metrics')

To import your data properly, add a corresponding function for your case to the benchmark_utils script:

import os  # os and Tuple may already be imported in benchmark_utils
from typing import Tuple


def get_your_case_data_paths() -> Tuple[str, str]:
    # Relative paths inside the repository; replace the placeholder names with yours
    train_file_path = os.path.join('test_cases', 'your_directory', 'data', 'your_case_name_train.csv')
    test_file_path = os.path.join('test_cases', 'your_directory', 'data', 'your_case_name_test.csv')
    # project_root() resolves the repository root so the paths work from anywhere
    full_train_file_path = os.path.join(str(project_root()), train_file_path)
    full_test_file_path = os.path.join(str(project_root()), test_file_path)

    return full_train_file_path, full_test_file_path
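The project_root helper used above is assumed to be defined elsewhere in the repository. If your copy lacks it, a minimal sketch (assuming the module that defines it sits directly in the repository root) could be:

from pathlib import Path

def project_root() -> Path:
    # Assumes this helper's module lives in the repository root
    return Path(__file__).parent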

Pay attention to the task and model types and to target_name (the target column name). All the supported task types and model types are available in the TaskTypesEnum and BenchmarkModelTypesEnum enums, respectively.
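Assuming both are standard Python enums, you can list the supported options at runtime, for example:

from benchmark_model_types import BenchmarkModelTypesEnum
from core.repository.tasks import TaskTypesEnum

# Enum classes are iterable, so this prints every supported member name
print([task.name for task in TaskTypesEnum])
print([model.name for model in BenchmarkModelTypesEnum])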