
Distil Automated Machine Learning Server

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Distil Auto ML

Distil Auto ML is an AutoML system that integrates with D3M.

More specifically, it is the TA2 system from Uncharted and Qntfy.

The main repo is https://github.com/uncharted-distil/distil-auto-ml

Quickstart using Docker

The TA2 system can be built and started via docker-compose; however, several static files must be downloaded beforehand.

Datasets to train on: these may be user-created, or many examples can be downloaded from https://datasets.datadrivendiscovery.org/d3m/datasets

To train using only the TA2, user-generated datasets must be formatted in the same way as the public datasets.

Static files: these may be pretrained weights of a neural network model, or a simple dictionary mapping tokens to the necessary ids; essentially anything extra needed to run an ML model within the pipelines.

To bulk download all static files within the D3M universe (WARNING: this may be quite large):

docker-compose run distil bash
# then, inside the container shell:
cd /static && python3 -m d3m index download

Individual static files can also be downloaded selectively via

python3 -m d3m primitive download -p d3m.primitives.path.of.Primitive -o /static

For more info on how static files integrate within D3M: https://datadrivendiscovery.org/v2020.11.3/tutorial.html#advanced-primitive-with-static-files
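
When only a few primitives need static files, the per-primitive download command can be generated once per primitive path. The following is a minimal sketch in Python, assuming the d3m package is installed for the interpreter being used. The helper name `download_commands` is hypothetical, and the primitive path in the usage is the README's own placeholder, not a real primitive.

```python
import shlex
import sys


def download_commands(primitive_paths, output_dir="/static", python=sys.executable):
    """Build one `python3 -m d3m primitive download` argv per primitive path.

    Hypothetical helper: only the command layout is taken from the README.
    """
    return [
        [python, "-m", "d3m", "primitive", "download", "-p", path, "-o", output_dir]
        for path in primitive_paths
    ]


if __name__ == "__main__":
    # Placeholder primitive path; substitute the primitives your pipelines use.
    for argv in download_commands(["d3m.primitives.path.of.Primitive"]):
        # Print the command instead of running it; pass argv to
        # subprocess.run(argv, check=True) to perform the download.
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Building argument lists rather than shell strings avoids quoting problems if an output directory contains spaces.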

Once the static files and the dataset(s) you want to run on are downloaded:

# symlink your datasets directory 
ln -s ../datasets/seed_datasets_current seed_datasets_current

# choose the dataset you want to run 
export DATASET=185_baseball

# run it
docker-compose up distil
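
A mistyped dataset name is easier to catch before the containers start. The sketch below, in Python, checks that the chosen dataset folder exists and returns an environment with DATASET set. It assumes the dataset folder sits directly under seed_datasets_current and that docker-compose reads DATASET from the environment, as the export above suggests; `compose_environment` is a hypothetical helper name.

```python
import os
from pathlib import Path


def compose_environment(dataset, datasets_root="seed_datasets_current", base_env=None):
    """Return a copy of the environment with DATASET set.

    Raises FileNotFoundError when <datasets_root>/<dataset> is not a directory.
    Assumption: the dataset folder sits directly under datasets_root.
    """
    dataset_dir = Path(datasets_root) / dataset
    if not dataset_dir.is_dir():  # is_dir() follows symlinks
        raise FileNotFoundError(f"dataset folder not found: {dataset_dir}")
    env = dict(os.environ if base_env is None else base_env)
    env["DATASET"] = dataset
    return env
```

The result can be passed to `subprocess.run(["docker-compose", "up", "distil"], env=compose_environment("185_baseball"), check=True)`.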

There are two testing TA3 systems also available via docker-compose:

# run the dummy-ta3 test suite
docker-compose up distil dummy-ta3

# run the simple-ta3 system, which will then be available in the browser at localhost:80
# this requires a directory named 'output' to exist, in addition to the seed_datasets_current directory
docker-compose up distil envoy simple-ta3
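
The three docker-compose invocations differ only in the services they start and in the directories they need. A rough Python sketch of that choice follows. The service lists are copied from the commands above, while the mode labels and the function name `compose_up_command` are hypothetical.

```python
from pathlib import Path

# Service lists copied from the docker-compose commands in this README;
# the dictionary keys are hypothetical labels chosen for this sketch.
SERVICES = {
    "ta2": ["distil"],
    "dummy-ta3": ["distil", "dummy-ta3"],
    "simple-ta3": ["distil", "envoy", "simple-ta3"],
}


def compose_up_command(mode, workdir="."):
    """Return the docker-compose argv for `mode`, preparing directories first."""
    if mode not in SERVICES:
        raise ValueError(f"unknown mode: {mode!r}")
    workdir = Path(workdir)
    if not (workdir / "seed_datasets_current").is_dir():
        raise FileNotFoundError("seed_datasets_current is missing")
    if mode == "simple-ta3":
        # simple-ta3 needs an 'output' directory next to the datasets.
        (workdir / "output").mkdir(exist_ok=True)
    return ["docker-compose", "up", *SERVICES[mode]]
```

Creating the output directory up front removes one manual step before the simple-ta3 run.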

Development

Running From Source

Requirements:

  1. Python 3.6
  2. Pip (Python 3.6 should come with it)
  3. virtualenv

Instructions on setting up to run from source:

  • Clone distil-auto-ml
git clone https://github.com/uncharted-distil/distil-auto-ml
  • Install libraries on Linux
sudo apt-get install snappy-dev build-essential libopenblas-dev libcap-dev ffmpeg
  • Install libraries on MacOS
brew install snappy cmake openblas libpcap ffmpeg
  • Clone common-primitives
 git clone https://gitlab.com/datadrivendiscovery/common-primitives.git
  • Clone d3m-primitives
 git clone https://github.com/cdbethune/d3m-primitives
  • Clone d3m
 git clone https://gitlab.com/datadrivendiscovery/d3m
  • Clone distil-primitives
 git clone https://github.com/uncharted-distil/distil-primitives
  • Clone distil-primitives-contrib
 git clone https://github.com/uncharted-distil/distil-primitives-contrib
  • Change into the distil-auto-ml directory
 cd distil-auto-ml
  • To avoid package collision it is recommended to create a virtual environment
  • If virtualenv is not installed, install it now.
 python3 -m pip install virtualenv
  • Create the environment
 python3 -m virtualenv env
  • Activate the environment
 source env/bin/activate
  • Install through server-requirements.txt on Linux
pip install -r server-requirements.txt
  • Install through server-requirements.txt on MacOS
CPPFLAGS="-I/usr/local/include -L/usr/local/lib" pip install -r server-requirements.txt
  • Install all the other repository dependencies. IMPORTANT: if running on the CPU, replace [gpu] with [cpu]
 cd ..
 cd d3m
 pip install -e .\[gpu\]
 cd ..
 cd common-primitives
 pip install -e .\[gpu\]
 cd ..
 cd distil-primitives
 pip install -e .\[gpu\]
 cd ..
 cd d3m-primitives
 pip install -e .\[gpu\]
 cd ..
 cd distil-primitives-contrib
 pip install -e .\[gpu\]
 pip install python-lzo hyppo==0.1.3 mxnet
 pip install -e git+https://github.com/NewKnowledge/simon-d3m-wrapper.git#egg=SimonD3MWrapper
 pip install -e git+https://gitlab.com/datadrivendiscovery/sklearn-wrap.git@dist#egg=sklearn_wrap
 pip install -e git+https://github.com/usc-isi-i2/dsbox-primitives#egg=dsbox-primitives
 pip install -e git+https://github.com/neurodata/primitives-interfaces#egg=jhu-primitives
  # if you hit an error involving enum and IntFlag, try: pip uninstall -y enum34
  • MongoDB

Distil AutoML uses MongoDB as a backend store for its internal hyperparameter tuning. There are good installation instructions for each OS in the official MongoDB docs: https://docs.mongodb.com/manual/installation/

  • Distil-auto-ml is ready for use
 ./run.sh
  • Generate pipelines
 mkdir pipelines
 python3 export_pipelines.py
  • Use D3M CLI to interface with distil-auto-ml
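
The five editable installs in the steps above differ only in the repository directory and the [gpu] or [cpu] extra, so they can be generated in one place. A minimal Python sketch follows, assuming all five repositories were cloned side by side in one parent directory as in the instructions; `editable_install_commands` is a hypothetical helper name.

```python
import sys
from pathlib import Path

# Repositories cloned earlier, in the order the README installs them.
REPOS = [
    "d3m",
    "common-primitives",
    "distil-primitives",
    "d3m-primitives",
    "distil-primitives-contrib",
]


def editable_install_commands(parent_dir, extra="gpu", python=sys.executable):
    """Build one `pip install -e <repo>[<extra>]` argv per cloned repository."""
    if extra not in ("gpu", "cpu"):
        raise ValueError("extra must be 'gpu' or 'cpu'")
    return [
        [python, "-m", "pip", "install", "-e", f"{Path(parent_dir) / repo}[{extra}]"]
        for repo in REPOS
    ]
```

Each list can be run with `subprocess.run(argv, check=True)` from inside the activated virtual environment; because no shell is involved, the square brackets need no escaping.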

Running D3M CLI Example

This section assumes the source has been successfully installed and the datasets have been downloaded. Launch d3m with the following arguments.

python3 -m d3m runtime -v {location/to/static_resources} -d {location/to/datasets/seed_datasets_current} fit-score \
-r {..seed_datasets_current/LL1_PHEM_Monthly_Malnutrition_MIN_METADATA/LL1_PHEM_Monthly_Malnutrition_MIN_METADATA_problem/problemDoc.json} \
-i {..seed_datasets_current/LL1_PHEM_Monthly_Malnutrition_MIN_METADATA/TRAIN/dataset_TRAIN/datasetDoc.json} \
-t {..seed_datasets_current/LL1_PHEM_Monthly_Malnutrition_MIN_METADATA/TEST/dataset_TEST/datasetDoc.json} \
-a {..seed_datasets_current/LL1_PHEM_Monthly_Malnutrition_MIN_METADATA/SCORE/dataset_SCORE/datasetDoc.json} \
-p {..distil-auto-ml/pipelines/timeseries_rnn__a9cc5349-e328-401d-abb7-ada6b101e573.json} \
-O {..distil-auto-ml/pipelines/timeseries_rnn__a9cc5349-e328-401d-abb7-ada6b101e573_run.yaml}
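
The dataset paths in the command above all follow one pattern, so they can be derived from the datasets directory and the dataset name. The Python sketch below builds the full argument list. It assumes other seed datasets use the same folder layout as the example, and `fit_score_command` is a hypothetical helper name.

```python
import sys
from pathlib import Path


def fit_score_command(static_dir, datasets_root, dataset, pipeline_json, run_output,
                      python=sys.executable):
    """Build the `d3m runtime ... fit-score` argv for one dataset.

    Assumption: every dataset follows the layout shown in the README example.
    """
    root = Path(datasets_root) / dataset

    def dataset_doc(split):
        # e.g. <root>/TRAIN/dataset_TRAIN/datasetDoc.json
        return str(root / split / f"dataset_{split}" / "datasetDoc.json")

    return [
        python, "-m", "d3m", "runtime",
        "-v", str(static_dir),
        "-d", str(datasets_root),
        "fit-score",
        "-r", str(root / f"{dataset}_problem" / "problemDoc.json"),
        "-i", dataset_doc("TRAIN"),
        "-t", dataset_doc("TEST"),
        "-a", dataset_doc("SCORE"),
        "-p", str(pipeline_json),
        "-O", str(run_output),
    ]
```

Passing the returned list to `subprocess.run(argv, check=True)` launches the run without any manual path editing.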

Building the Docker Container

CPU:

Building a docker image with CPU support is accomplished by invoking the docker_build.sh script:

MacOS/Linux

sudo ./docker_build.sh

Windows

Run command prompt as administrator.

./docker_build.sh

GPU:

Building a docker image with GPU support is accomplished by adding the -g flag to the docker_build.sh call:

MacOS/Linux

sudo ./docker_build.sh -g

Windows

Run command prompt as administrator.

./docker_build.sh -g

Troubleshooting Docker Image Failing to Build:

In the event that building the docker image fails even though all of the above criteria have been met, invoke the docker_build.sh script again, this time adding the -f flag. The -f flag forces the download and reinstallation of all dependencies, regardless of whether they already meet the criteria. Note: if building for GPU support, remember to also pass the -g flag.

MacOS/Linux

sudo ./docker_build.sh -f

Windows

Run command prompt as administrator.

./docker_build.sh -f
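
The build variants described above come down to two optional flags. A small Python sketch of how they combine follows. The flag meanings are taken from this README, while the function name `docker_build_command` and the order in which the flags are appended are assumptions.

```python
def docker_build_command(gpu=False, force=False, use_sudo=True):
    """Build the docker_build.sh argv for the requested variant.

    -g enables GPU support and -f forces dependencies to be reinstalled,
    as described in the README. The flag order here is an assumption.
    """
    argv = ["sudo"] if use_sudo else []
    argv.append("./docker_build.sh")
    if gpu:
        argv.append("-g")
    if force:
        argv.append("-f")
    return argv
```

For example, a forced GPU rebuild on Linux corresponds to `docker_build_command(gpu=True, force=True)`, and `use_sudo=False` matches the Windows instructions.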