Eight years of AutoML: categorisation, review and trends

This is the supplementary material for a systematic literature review on AutoML (2023).

Abstract

Knowledge extraction through machine learning techniques has been successfully applied in a large number of application domains. However, apart from the required technical knowledge and background in the application domain, it usually involves a number of time-consuming and repetitive steps. Automated machine learning (AutoML) emerged in 2014 as an attempt to mitigate these issues, making machine learning methods more practicable for both data scientists and domain experts. AutoML is a broad area encompassing a wide range of approaches that address diverse tasks across the different phases of the knowledge discovery process, each automated with specific techniques. To provide a big picture of the whole area, we have conducted a systematic literature review based on a proposed taxonomy that permits categorising 447 primary studies selected from a search of 31,048 papers. This review performs an extensive and rigorous analysis of the AutoML field, scrutinising how the primary studies have addressed the dimensions of the taxonomy, and identifying any gaps that remain unexplored as well as potential future trends. The analysis of these studies has yielded some intriguing findings. For instance, we have observed a significant growth in the number of publications since 2018. Additionally, it is noteworthy that the algorithm selection problem has gradually been superseded by the challenge of workflow composition, which automates more than one phase of the knowledge discovery process simultaneously. Of all the tasks in AutoML, the growth of neural architecture search is particularly noticeable.

Supplementary material

The following document describes in detail the systematic review protocol defined for this paper. An important part of this process is the adaptation of the search strings to each digital library and citation database.

Review protocol

The following file compiles the raw outcomes of applying the aforementioned search strings, as indicated in the review protocol.

PDF file (162 KB)

Papers returned from queries to databases

A set of inclusion and exclusion criteria is applied to the list of papers returned by the search engines to obtain the primary studies. Then, variants of the same primary study are identified so that only the most complete manuscript is reviewed. Finally, a snowballing procedure is conducted to retrieve any reference that may have been missed during the literature search. The following spreadsheet compiles the papers available at each step of the literature search and selection strategy.

ZIP file (9.6 MB)

Papers returned in each step of the literature search and selection strategy

The following document compiles and summarises the results collected from the data extraction forms.

Excel file (989 KB)

Results of the data extraction forms

Information extracted from analysing primary studies.

Excel file (152 KB)

jrromero/automl2022