Strata Data Conference San Jose, Tuesday, March 16, 2020
©Copyright 2020, Lightbend, Inc. Apache 2.0 License. Please use as you see fit, at your own risk, but attribution is requested.
NOTE: This code has been built and tested only with the following tools:
- Java 8 and 11 (but see note in the Apache Atlas section below)
- Scala 2.12.10 (this is handled internally by the build process; no need to install anything)
- Python 3.7 (although newer versions may work)
- Docker (recommended for the Apache Atlas example)
Any other version of Java will not work. Other releases of Scala 2.11 and 2.12 may work, but not Scala 2.13 at this time. To build Atlas as described below, you will also need Python 2 available to run the Atlas administration scripts, which are somewhat old and do not work with Python 3. However, we provide a Docker image and recommend using it instead.
See the companion presentation for the tutorial in the `presentation` folder.
The tutorial contains 3 exercises:
- Serving models as data (TensorFlow graph) leveraging Cloudflow.
- Using MLflow to capture and view model training metadata.
- Creating a model registry using Apache Atlas.
Several of the examples use components built with Scala. To build and run these examples, install the build tool `sbt`. The rest of the dependencies will be downloaded automatically.
The MLflow example uses Python and `pip` to install the components. Python 3 is required.
Python 3 is not the default version installed on MacOS. You have two options:
- Install Python 3 using the Homebrew package manager, `brew install python`. In this case, when you see the `pip` command below, use `pip3` instead.
- Install Anaconda and create an environment for this tutorial.
While it can be a little more work to set up, we recommend using Anaconda for MacOS, Windows, and Linux, especially if you plan to do a lot of work with Python-based tools. By providing isolated environments, it helps you avoid conflicting dependencies, gives you isolation when you need to switch between multiple versions of packages (e.g., for testing), and offers other benefits.
Once Python 3 and `pip` (or `pip3`) are installed, run the following command to install MLflow and its dependencies:
pip install -r MLflow/requirements.txt --upgrade
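To verify the installation, a quick sanity check (our suggestion, not part of the tutorial) is to print the installed MLflow version:

```python
# Confirm that MLflow was installed and is importable.
import mlflow
print(mlflow.__version__)
```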
This tutorial is an `sbt` project that's used for two of the three examples:
- Model serving with Cloudflow
- A model registry with Apache Atlas

Both use the `sbt` build to compile and run the supplied application code. So, here is a "crash course" on interactive sessions with `sbt`. Here `$` is the shell prompt for `bash`, Windows CMD, or whatever (don't type it), and `sbt:ML Learning tutorial>` is the interactive prompt for `sbt`:
$ sbt
... initialization messages ...
sbt:ML Learning tutorial> projects
...
[info] atlasclient
[info] * ml-metadata-tutorial
[info] tensorflowakka
sbt:ML Learning tutorial>
The `*` indicates we are currently in the top-level project for the tutorial. The `atlasclient` project is a program for interacting with an Apache Atlas server, and `tensorflowakka` uses Cloudflow's Akka API to demonstrate serving models in a microservice-like context, as we explain in the presentation slides.
To work with one of the projects, for example `tensorflowakka` (the first one we'll try), use the `project` command, as follows. Note that the prompt will change:
sbt:ML Learning tutorial> project tensorflowakka
sbt:TensorFlow-akka>
When we tell you to use some variation of a `run` command, it will automatically download all required libraries and build the code first. You can do the compilation step separately, if you like, with `compile`. Similarly, you can compile the code and compile and run the tests using `test`.
Pro Tip: If you are editing code and you want `sbt` to continually compile it every time you save a file, put a `~` in front: `~compile` or `~test`.
This example uses the Akka Streams API in Cloudflow. We won't explain many of the details of how Cloudflow works here; see the Cloudflow documentation for an introduction and detailed explanations of its concepts.
Cloudflow applications are designed to be tested locally and executed in a cluster for production. For this exercise, we will not install the serving example in a cluster, but run it locally using `sbt`.
Start the `sbt` interpreter and use the following `sbt` commands from the project root directory:
sbt:ML Learning tutorial> project tensorflowakka
sbt:tensorflow-akka> runLocal
This will print out the location of the log file. (Look for the `... --- Output --- ...` line.) On MacOS or Linux systems, use the following command to see the entries as they are written:
tail -f <log_location>
On Windows, use the command `more < <log_location>`, but it stops as soon as it has read the current last line, so you'll need to repeat the command several times as new output is written to the file.
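If repeating `more` gets tedious, here is a minimal cross-platform stand-in for `tail -f` in Python. This is a hypothetical helper we're sketching for convenience, not a script included in the tutorial:

```python
# follow.py -- print new lines as they are appended to a file.
# Usage: python follow.py <log_location>
import sys
import time

with open(sys.argv[1], "r") as f:
    f.seek(0, 2)  # whence=2 means "from the end": skip existing content
    while True:
        line = f.readline()
        if line:
            print(line, end="")  # the line already ends with a newline
        else:
            time.sleep(0.5)      # wait for more output to be written
```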
Terminate the example by pressing the Enter key in the `sbt` window.
Pro Tips:
- In some MacOS and Linux shells, you can "command-click", "control-click", or right-click on the file path in the text output to open it in a console window for easy browsing.
- Actually, you don't need to switch to `project tensorflowakka` before invoking `runLocal`. You could just invoke `runLocal` in the top-level `ml-metadata-tutorial` project. However, we switched to `tensorflowakka` first so it's clear which one we're actually using.
We'll train some models using scikit-learn and track those training runs in MLflow. We'll use the MLflow GUI to examine the data.
Follow the instructions above to install MLflow and other dependencies for this example using `pip`. For reference, the MLflow quick start provides additional information.
Additionally, this tutorial project contains a Dockerfile and supporting scripts to build an MLflow Docker image. There is a Helm chart for installing the image in a Kubernetes cluster, leveraging Minio for storing tracking results.
For this tutorial, we will use the local installation.
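To give a sense of what the example does before you run it, here is a minimal sketch of the MLflow tracking calls involved. The model, dataset, and names here are placeholders, not the actual code in `MLflow.py`:

```python
import numpy as np
import mlflow
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Placeholder data and model; the tutorial's script trains its own models.
X, y = make_regression(n_samples=200, n_features=5, noise=0.5, random_state=42)

with mlflow.start_run():  # one tracked training run
    alpha = 0.5
    model = Ridge(alpha=alpha).fit(X, y)
    rmse = float(np.sqrt(mean_squared_error(y, model.predict(X))))
    mlflow.log_param("alpha", alpha)  # record a hyperparameter
    mlflow.log_metric("rmse", rmse)   # record an evaluation metric
```

Each `start_run` block becomes one row in the MLflow UI, with the logged parameters and metrics as columns.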
We provide both notebook and Python versions of the code that implements training and stores the execution metadata.
To run the notebook version, install Jupyter using `pip` (or `pip3`):
pip install jupyter
Then change to the `MLflow` directory and run:
jupyter notebook
Click on the `MLflow.ipynb` link to open the notebook.
The same code is in the Python script `MLflow.py` in the `MLflow/example` directory. You can run it with the command:
python MLflow.py
Note: If you get an exception about a "key error" for `metrics.rmse` on the line `df_runs.sort_values(["metrics.rmse"], ascending = True, inplace = True)`, it may be that the experiment number actually used was 1 instead of 0. Change `df_runs = mlflow.search_runs(experiment_ids="0")` to use 1 instead of 0 and try again.
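A more robust alternative (a sketch, not what `MLflow.py` currently does) is to look the experiment up by name instead of hard-coding its id. Here `"Default"` is MLflow's default experiment name; substitute the name the script actually uses if it creates its own experiment:

```python
import mlflow

exp = mlflow.get_experiment_by_name("Default")  # returns None if not found
if exp is not None:
    df_runs = mlflow.search_runs(experiment_ids=[exp.experiment_id])
    df_runs.sort_values(["metrics.rmse"], ascending=True, inplace=True)
```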
By default, wherever you run your program, the tracking API writes data into files in a local `./mlruns` directory. You can then run MLflow's Tracking UI to view them:
mlflow ui
View the results at http://localhost:5000.
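As an aside, the tracking API can also be pointed at a tracking server instead of the local `./mlruns` directory. A sketch, assuming a server is listening on port 5000 (for example, the Docker/Helm installation mentioned above):

```python
import mlflow

# Send tracking data to a server instead of writing local ./mlruns files.
mlflow.set_tracking_uri("http://localhost:5000")
print(mlflow.get_tracking_uri())  # confirm where runs will be recorded
```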
For Atlas, you can either build and run Apache Atlas locally or use the prebuilt Docker image `lightbend/atlas:0.0.1`. The latter is much easier, as we'll see, but it requires you to have Docker installed on your laptop.
If you want to install Atlas locally, try the following at home! It will take too long to do this during the tutorial and it will consume what battery reserve you have left.
This bash script downloads, builds, and runs Atlas to confirm it's working, then shuts it down.
The following tools are required for this script to work:
- Java 8 (`JAVA_HOME` must be defined). Newer versions of Java will not work, because Atlas uses annotations that were removed from the JDK.
- Maven: the `mvn` command.
- `wget` or `curl`. However, if both are missing, the script tells you what to do as a workaround.
- Python 2. While you need Python 3 for the rest of the tutorial, the admin scripts for Atlas are old and require Python 2. If you are using Anaconda for your Python installation, you can create a separate Python 2-based "environment" just for running Atlas. See the troubleshooting section below for more details.
It's best to change to the `localinstall` directory and then run `./install.sh`.
Troubleshooting:
- Verify you are using Java 8; annotations used by the Atlas code were apparently deprecated and removed in either later versions of the JDK or some library dependency that is JDK version-specific.
- If the Maven build fails with an error that an `slf4j` dependency can't be resolved, look closely at the error message and see if it complains that accessing the repo `http://repo1.maven.org/maven2` requires HTTPS. If so, edit line 793 in the downloaded `pom.xml` file for Atlas, changing `http` to `https`.
- If Atlas builds but you get an error running the `atlas_start.py` script, it's probably because the Python 3 installation you're using for the rest of this tutorial is not compatible with the script, which expects Python 2 :(
If you encounter the last issue, you may already have Python 2 on your laptop. On MacOS, the built-in version of Python, `/usr/bin/python`, is version 2.7.X. If you are on Windows or Linux, your machine may also have Python 2 installed somewhere.
So, MacOS users can run the following commands:
cd apache-atlas-sources-2.0.0
cd distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0/
/usr/bin/python bin/atlas_start.py
NOTE: If you get this far, be patient, as it takes a while for Atlas to start up. You'll see dots printed while it's initializing. It's ready when you see `Apache Atlas Server started!!!`
As you can see, it's not easy to reliably build and run Atlas. Hence, we strongly recommend using the Docker image for the tutorial. For reference, the Docker image was built using this Dockerfile. In case you want to install it to a Kubernetes cluster, you can use this Helm chart.
To run the Docker image, change to the `Atlas` directory and run one of the following scripts, `run-docker.sh` or `run-docker.bat`. In both cases, they run the equivalent of the following command:
docker run --rm -it \
  -v "$PWD/atlas-logs":/opt/apache-atlas-2.0.0/logs \
  -v "$PWD/atlas-conf":/opt/apache-atlas-2.0.0/conf \
-p 21000:21000 \
--name atlas \
lightbend/atlas:0.0.1
This command runs the container using the `lightbend/atlas:0.0.1` Docker image. It will automatically remove the container when it exits (`--rm`), run it in interactive, TTY mode (`-it`) so you can see what's written to the internal console, mount both the `logs` and `conf` directories to local directories so you can view them easily, tunnel port `21000`, and name the container `atlas`.
NOTE: Be patient, as it takes a while for the container to finish starting up. You'll see dots printed while it's initializing. It's ready when you see `Apache Atlas Server started!!!`
An example of using Atlas is located in `./AtlasClient`. You can build it either by loading this directory as a Scala project in your IDE or by using these `sbt` commands from the project root directory:
sbt:ML Learning tutorial> project atlasclient
sbt:atlasclient> clean compile
Once the project is built, there are two applications you can run:
- `ConnectivityTester`, to check that you can connect to the cluster correctly and get a sense of how definitions look in Atlas.
- `ModelCreator`, to create the required types and populate simple model information.
To run either of them, you have two options:
- In your IDE, navigate to `com.lightbend.atlas.utils.ConnectivityTester` or `com.lightbend.atlas.model.ModelCreator`, then right-click and use the Run command.
- Use one of several `run` commands in `sbt`.
For `sbt`, the easiest way is to invoke the `run` command and then enter the number at the prompt:
sbt:atlasclient> run
...
Multiple main classes detected, select one to run:
[1] com.lightbend.atlas.model.ModelCreator
[2] com.lightbend.atlas.utils.ConnectivityTester
You can also avoid the prompt by invoking each one directly:
sbt:atlasclient> runMain com.lightbend.atlas.model.ModelCreator
sbt:atlasclient> runMain com.lightbend.atlas.utils.ConnectivityTester
Point your browser to the following URL to see the Atlas UI:
http://localhost:21000
Use the credentials `admin/admin`.
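If you'd like a quick connectivity check outside of `sbt`, here is a minimal sketch that queries the Atlas REST API from Python, assuming the server above is running with the default `admin/admin` credentials:

```python
import requests

# Fetch all type definitions from the local Atlas server.
resp = requests.get(
    "http://localhost:21000/api/atlas/v2/types/typedefs",
    auth=("admin", "admin"),
)
resp.raise_for_status()
print(len(resp.json().get("entityDefs", [])), "entity type definitions found")
```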