AMS is a tool that automatically generates AutoML search spaces from users' weak specifications. A weak specification is defined as a set of API classes to include in the AutoML search space. AMS extends this set with complementary classes, functionally related classes, and relevant hyperparameters and possible values. The resulting configuration can then be paired with existing search techniques to generate ML pipelines. AMS relies on API documentation and a corpus of code examples to strengthen the input weak spec.
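For concreteness, a weak specification can be as small as a couple of API class names, and the strengthened space maps each class (original or added) to hyperparameters and candidate values. The sketch below only illustrates the data shapes involved; the values are made up, not actual AMS output (a real generated space appears later in this document):

# Weak specification: just the API classes the user cares about.
weak_spec = [
    "sklearn.linear_model.LogisticRegression",
    "sklearn.preprocessing.MinMaxScaler",
]
# Strengthened search space (illustrative shape only): AMS adds complementary and
# functionally related classes, plus hyperparameters and candidate values per class.
strengthened = {
    "sklearn.linear_model.LogisticRegression": {"C": [1, 100], "penalty": ["l1", "l2"]},
    "sklearn.preprocessing.StandardScaler": {"with_mean": [True, False]},
}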
You can download a VM (if you are not already using it) from https://ams-fse.s3.us-east-2.amazonaws.com/ams.ova . If prompted, the username is ams and the password is ams.
The DOI for the artifact submitted for artifact evaluation was .
This artifact has since been superseded by a version that incorporates changes based on the camera-ready. Please use that version of the VM (which you can download from Zenodo or through AWS as explained below).
The DOI for the camera-ready version of the artifact is .
$ wget https://ams-fse.s3.us-east-2.amazonaws.com/ams.ova
This should result in an .ova file (format version 1.0) that can be imported into VirtualBox or VMware. This image was exported and tested with VirtualBox version 5.2.
If you do so, you can skip all steps below relating to building and simply navigate to the ams folder and activate the conda environment:
$ cd ~/ams/
$ conda activate ams-env
We have also included a Dockerfile that installs conda and sets up the ams dependencies for you. You may find that more convenient than using your base machine. If so, you can run
docker build . -t ams-container --memory=8g
to build the container. Then you can start it with
docker run -it --memory=8g ams-container
You may want to increase the memory allotted for the docker run command as you see fit (and you may be able to decrease it for the docker build command). You may also find this post useful: https://stackoverflow.com/questions/44533319/how-to-assign-more-memory-to-docker-container
After you have started the docker container by executing the run command, you can jump to running scripts/folder_setup.sh (please see the installation section below for more details).
AMS should run without issues on Ubuntu 18.04 and Mac OSX (tested on 10.11.6). If you have issues running, we suggest using the pre-packaged VM (or feel free to contribute back fixes that allow AMS to build on your platform).
If you want to install from source, you will need the following basic utilities (installable using apt-get/brew):

- wget (e.g. apt-get install wget)
- zip (e.g. apt-get install zip)
If you are not using Ubuntu or Mac OSX, you should also manually install task-spooler (https://vicerveza.homeunix.net/~viric/soft/ts/) and make sure it can be called using tsp (or set a corresponding alias). You will then want to remove the task-spooler install in scripts/setup.sh.
All other software packages needed are either 1) installed automatically by our scripts (which should work without issues for Ubuntu and Mac OSX) or 2) accompanied by a prompt asking you to install them manually (as in the case of conda). Indeed, the pre-packaged VM was configured by taking a clean Ubuntu image and running the instructions for installing from source.
AMS should run without issues on Ubuntu and Mac OSX (tested on 10.11.6). If you have issues running, we suggest using the pre-packaged VM.
First, you should install conda. If you don't have conda already, please install it from https://docs.conda.io/en/latest/miniconda.html
Once you have done so, you can build the conda environment
$ conda env create -f environment.yml
This creates the conda environment ams-env.
All scripts/commands should be executed from the root ams directory with the ams-env environment active (i.e. run conda activate ams-env when using AMS).
If you would like to modify the location where data/resources etc. are saved, you should edit scripts/folder_setup.sh accordingly. These paths are used/referenced throughout the remainder of the setup. Note that the data folder ($DATA) should point to the data/ directory in the root of the project. You can change this, but we do not recommend it.
You should then install some additional supporting resources by running
$ bash scripts/setup.sh
This may take some time as it has to download/build multiple third-party tools.
You can verify that your installation has completed successfully by running
$ python -m core.generate_search_space --help
You should see a help message printed to the console.
Before it can be used, AMS first extracts rules for complementary components, indexes the API documentation, and extracts hyperparameter/value frequency distributions from the code corpus. To build these resources, just execute
$ bash scripts/build_ams.sh
This may take some time as AMS traverses the API's module hierarchy and extracts information from the code corpus. You will see messages such as Trying <...> or Failed <...>. These can be safely ignored.
Once you are done with this, you are ready to use AMS and shouldn't need to tweak anything else.
Loading the SciSpacy language model takes approximately 30 seconds:
spacy_nlp = spacy.load("en_core_sci_lg")
This is a known hurdle for language models in spaCy (see explosion/spaCy#2679 for a similar example).
This latency is a cost incurred each time AMS starts up on a new weak specification. A simple workaround (which has not yet been implemented) is to run AMS as a server application, with new weak specifications provided as input from a client. We eschew this for the current artifact as it may complicate use of AMS for reviewers.
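As a purely hypothetical sketch of that workaround (not part of the shipped artifact), the idea is to pay the model-loading cost once and then handle many weak specifications against the already-loaded model; the strengthen helper below is a placeholder for a call into AMS's generation logic:

import spacy

# One-time cost (~30 seconds); reused for every subsequent weak spec.
spacy_nlp = spacy.load("en_core_sci_lg")

def strengthen(weak_spec):
    # Placeholder: would invoke AMS's search space generation with
    # spacy_nlp already in memory, instead of reloading it per invocation.
    ...

while True:
    line = input("weak spec (comma-separated API classes, empty line to quit): ")
    if not line:
        break
    strengthen([cls.strip() for cls in line.split(",")])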
If you would like to use AMS to generate search spaces for your own weak specs, you can use the script scripts/use_ams.sh.
For example,
$ (bash scripts/use_ams.sh sklearn.linear_model.LogisticRegression sklearn.preprocessing.MinMaxScaler) > config.json
produces a strengthened search space for this weak specification.
$ cat config.json | jq # note we don't install jq
{
"sklearn.linear_model.SGDClassifier": {
"loss": [
"log",
"hinge"
],
"penalty": [
"l2",
"elasticnet",
"l1"
],
"alpha": [
1e-05,
0.0001
]
},
"sklearn.linear_model.LogisticRegression": {
"C": [
100000,
7,
100,
1
],
"penalty": [
"l1",
"l2"
],
"class_weight": [
"auto",
"balanced",
null
]
},
"sklearn.linear_model.RidgeClassifier": {
"solver": [
"sag",
"auto"
],
"tol": [
0.01,
0.001
]
},
"sklearn.preprocessing.StandardScaler": {
"copy": [
true,
false
],
"with_mean": [
true,
false
],
"with_std": [
true
]
},
"sklearn.linear_model.Log": {},
"sklearn.preprocessing.MinMaxScaler": {
"copy": [
true
]
},
"sklearn.preprocessing.RobustScaler": {},
"sklearn.preprocessing.MaxAbsScaler": {},
"sklearn.preprocessing.Binarizer": {}
}
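If you prefer not to install jq, Python's built-in json.tool module pretty-prints the file just as well:

$ python -m json.tool config.json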
The script hardcodes various AMS choices, which you can modify as desired. In particular, the script sets:
NUM_COMPONENTS=4
NUM_ASSOC_RULES=1
ALPHA_ASSOC_RULES=0.5
NUM_PARAMS=3
NUM_PARAM_VALUES=3
NUM_COMPONENTS refers to the number of functionally related components to add (at most) per component in the weak spec. NUM_ASSOC_RULES refers to the number of complementary components to add (at most) per component in the weak spec. ALPHA_ASSOC_RULES combines a rule's normalized PMI and support fraction to obtain a single score for an association rule; we used 0.5 in our evaluations, but you can modify it if you'd like (please see the paper for details). NUM_PARAMS is the (max) count of hyperparameters to include in the search space for each component in the extended specification, and NUM_PARAM_VALUES is the (max) number of possible values (in addition to the default value) for each hyperparameter in the extended search space.
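To see how these knobs are reflected in a generated space, here is a small sketch that summarizes the config.json produced above (it assumes only the JSON format shown earlier):

import json

config = json.load(open("config.json"))

# The weak spec had 2 components; NUM_COMPONENTS and NUM_ASSOC_RULES bound
# how many functionally related / complementary classes could be added.
print("components in search space:", len(config))

# NUM_PARAMS bounds hyperparameters per component; NUM_PARAM_VALUES bounds
# candidate values per hyperparameter (in addition to the default).
for component, hyperparams in config.items():
    n_values = max((len(v) for v in hyperparams.values()), default=0)
    print(component, "-", len(hyperparams), "hyperparameters, up to", n_values, "values")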
AMS produces a search space in JSON format. This configuration can be read in and used directly with TPOT, or with the random search procedure used for AMS evaluation.
For example, we first generate the search space from the prior example and dump it into a text file
$ (bash scripts/use_ams.sh sklearn.linear_model.LogisticRegression sklearn.preprocessing.MinMaxScaler) > config.json
We then launch the python interpreter, read in the configuration, and show how it can be used on a generated dataset
$ python
import json
import tpot
import sklearn.datasets
from core.search import RandomSearch
# Generate a small synthetic classification dataset for demonstration
X, y = sklearn.datasets.make_classification(100, 10)
# Read in the AMS-generated search space
config = json.load(open("config.json", "r"))
# GP-based search
clf_tpot = tpot.TPOTClassifier(max_time_mins=1, config_dict=config, verbosity=3)
# Random search
clf_rand = RandomSearch(max_time_mins=1, max_depth=3, config_dict=config)
clf_tpot.fit(X, y)
clf_rand.fit(X, y)
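After fitting, you can inspect what was found. The attributes below are standard TPOT API; RandomSearch is AMS's own class, so consult core/search.py for the interface it exposes:

# Standard TPOT API for inspecting/exporting the best pipeline found
print(clf_tpot.fitted_pipeline_)
clf_tpot.export("best_pipeline.py")  # writes the winning pipeline as a standalone script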
You can reproduce FSE experiments and figures by using the scripts in scripts/fse/. As with the other scripts, these should be run from the root AMS directory.
Given that some of the experiments explained below take on the order of days
to run on a well-provisioned machine, we provide a download of our experimental
results. (If you are using the artifact VM, these results have already been
loaded and you can skip the following step).
Note that we have included all necessary datasets directly in the repository (and artifact), as the library that packages these datasets has implemented breaking changes with no backwards compatibility.
If you want to download the results, you can run
$ bash scripts/fse/download_results.sh
This will download results from an AWS S3 bucket and place them in $RESULTS, $ANALYSIS_DIR, and $DATA. In particular, ${RESULTS} will now contain folders of the form q[0-9]+, one for each of the 15 weak specifications in our experiments. In each, you will find the weak specification (simple_config.json), the specification with expert-defined hyperparameters (simple_config_with_params_dict.json), and the AMS-generated search space (gen_config.json).
The folder random contains results (again organized by weak specification experiment) when using random search, while tpot contains results when using genetic programming (the TPOT tool). rule-mining/ has experimental results for the complementary component experiments.
${DATA}/corpus-size* contains objects derived by AMS from a subsample of the corpus.
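As a quick way to orient yourself in the downloaded results, the sketch below prints, for each weak specification experiment, how many components appear in the weak spec versus the AMS-generated space. It assumes $RESULTS is exported in your shell (it is set in scripts/folder_setup.sh) and that the JSON files map component names to hyperparameter dictionaries, as in the generated config shown earlier:

import glob
import json
import os

results_dir = os.environ["RESULTS"]
for exp in sorted(glob.glob(os.path.join(results_dir, "q*"))):
    weak = json.load(open(os.path.join(exp, "simple_config.json")))
    gen = json.load(open(os.path.join(exp, "gen_config.json")))
    print(os.path.basename(exp), len(weak), "->", len(gen), "components")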
The folder $ANALYSIS_DIR (analysis_output in the VM) holds the figures/tables produced by the analysis and used in the paper. In particular,
- Table 2: rules/roles.tex
- Figure 3: rules/precision.pdf
- Figure 4: relevance/plot.pdf
- Figure 5:
  - (a) hyperparams/num_params_tuned.pdf
  - (b) hyperparams/distance_params_tuned.pdf
  - (c) hyperparams/num_param_values.pdf
- Figure 6: hyperparams/perf.pdf
- Figure 7: performance/combined_wins.pdf
- Figure 8: tpot-sys-ops/combined.pdf (includes extended examples)
- Figure 9:
  - (a) corpus-size/hyperparameters.pdf
  - (b) corpus-size/hyperparameter_values.pdf
  - (c) corpus-size/num_mined_rules.pdf
  - (d) corpus-size/jaccard_mined_rules.pdf

(Tip: To open PDFs from the terminal in the Ubuntu VM, you can use xdg-open file.pdf)
bash scripts/fse/reproduce_complementary_experiments.sh reproduces experimental results relating to the extraction of complementary components to add to weak specifications. These experiments should take on the order of a couple of hours to run.
bash scripts/fse/reproduce_functional_related_experiments.sh generates the data used for manual annotation of functionally related components. We have already included our manually annotated results as part of the artifact, so running this script will prompt you to confirm before overwriting those with (unannotated) data.
bash scripts/fse/reproduce_performance_experiments.sh generates search space configurations and evaluates them against our comparison baselines (weak spec, weak spec + search, and expert + search). Running these experiments from scratch takes on the order of 1-2 days on a machine with 30 cores. Given this computational burden, we have also included our results in the artifact.
bash scripts/fse/reproduce_corpus_size.sh generates subsampled versions of our code corpus and rebuilds the portions of AMS that rely on code examples (i.e. hyperparameter mining and complementary component mining). Running these experiments from scratch takes on the order of 2 hours on a machine with 30 cores. Given this computational burden, we have also included our results in the artifact. Note that in contrast to the other scripts, this one "creates new data", so the outputs are stored in ${DATA}, following the naming convention corpus-size-${corpus_size}-iter-${corpus_iter}/, where ${corpus_size} is the downsampling ratio (e.g. 0.1) and ${corpus_iter} is the iteration index (e.g. 1), as we repeat the downsampling 5 times per ratio.
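A small sketch for locating those outputs (it assumes $DATA, as set in scripts/folder_setup.sh, is exported in your shell):

import glob
import os

data_dir = os.environ["DATA"]
# Folders follow the corpus-size-${corpus_size}-iter-${corpus_iter}/ convention.
for folder in sorted(glob.glob(os.path.join(data_dir, "corpus-size-*-iter-*"))):
    print(folder)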
bash scripts/fse/reproduce_analysis.sh generates figures from the outputs of the prior 3 scripts and also runs some additional (~1 hour execution) experiments to characterize the hyperparameters found in our code corpus. The figures/tables are generated and saved to $ANALYSIS_DIR (set to analysis_output/, if not modified in folder_setup.sh). Please see the prior section for details on figure/table mappings.
We provide a short overview of the AMS codebase:
- core/ contains the main tool logic:
  - extract_sklearn_api.py: traverses sklearn modules to find classes to import and represent with embeddings (also handles default parameters)
  - nlp.py: helper functions to parse/embed natural language
  - code_to_api.py: takes a code specification and maps it to possibly related API components using pre-trained embeddings
  - extract_kaggle_scripts.py: filters down Meta-Kaggle to find useful scripts (i.e. those that import the target scikit-learn library)
  - extract_parameters.py: (light) parse of Python scripts from Kaggle to extract calls to APIs and their parameters
  - summarize_parameters.py: tallies up frequent parameter names/values by API component
  - generate_search_space.py: given a weak spec (code/NL), generates the search space dictionary
  - search.py: various search strategies and helpers
- experiments/ contains all code to run experiments:
  - generate_experiment.py: generates experiment configurations based on some predefined components of interest
  - simple_pipeline.py: compiles a weak spec directly into a sklearn pipeline for benchmarking
  - run_experiment.py: driver to run different search strategies/configurations
  - build_corpus_size_experiment.py: driver to run corpus size experiments; downsamples the corpus and rebuilds the portions of AMS that use that data
- analysis/ contains code to run analysis on experiment outputs and conduct additional characterization of our data:
  - annotation_rules_component_relevance.md: details our annotation guidelines for manually assessing functionally related components
  - association_rules_analysis.py: evaluates the rules used to extend specifications with complementary components
  - combined_wins_plot.py: utility to combine win counts into a single plot
  - distribution_hyperparameters.py: characterizes hyperparameters found in our code corpus
  - frequency_operators.py: computes the distribution of components in pipelines
  - performance_analysis.py: computes table/plots of wins from performance experiments data
  - pipeline_to_tree.py: utility to convert a pipeline from the API into a tree (easier to analyze)
  - relevance_markings.py: creates the plot of functionally related components' manual annotation results
  - corpus_size_analysis.py: creates plots for the impact of corpus size
  - utils.py: misc utils
Feel free to email Jose Cambronero (jcamsan@mit.edu) with questions.