TFX Developer Tutorial

Introduction

This tutorial is designed to introduce TensorFlow Extended (TFX) and help you learn to create your own machine learning pipelines. It runs locally, and shows integration with TFX and TensorBoard as well as interaction with TFX in Jupyter notebooks.

Key Term: A TFX pipeline is a Directed Acyclic Graph, or "DAG". We will often refer to pipelines as DAGs.

You'll follow a typical ML development process, starting by examining the dataset, and end up with a complete working pipeline. Along the way you'll explore ways to debug and update your pipeline, and measure performance.

Learn more

Please see the TFX User Guide to learn more.

Step by step

You'll gradually create your pipeline by working step by step, following a typical ML development process. Here are the steps:

Setup your environment
Bring up initial pipeline skeleton
Dive into your data
Feature engineering
Training
Analyzing model performance
Ready for production

Prerequisites

Linux or MacOS
Virtualenv
Python 2.7
Git

Tutorial Materials

The code for this tutorial is available at: https://github.com/tensorflow/tfx/tree/master/examples/

The code is organized by the steps that you're working on, so for each step you'll have the code you need and instructions on what to do with it.

The tutorial files include both an exercise and the solution to the exercise, in case you get stuck.

Exercise

taxi_pipeline.py
taxi_utils.py
taxi DAG

Solution

taxi_pipeline_solution.py
taxi_utils_solution.py
taxi_solution DAG

What you're doing

You’re learning how to create an ML pipeline using TFX

TFX pipelines are appropriate when datasets are large
TFX pipelines are appropriate when training/serving consistency is important
TFX pipelines are appropriate when version management for inference is important
Google uses TFX pipelines for production ML

You’re following a typical ML development process

Understanding our data
Feature engineering
Training
Analyze model performance
Lather, rinse, repeat
Ready for production

Adding the code for each step

The tutorial is designed so that all the code is included in the files, but all the code for steps 3-7 is commented out and marked with inline comments. The inline comments identify which step the line of code applies to. For example, the code for step 3 is marked with the comment # Step 3.

The code that you will add for each step typically falls into 3 regions of the code:

imports
The DAG configuration
The list returned from the create_pipeline() call
The supporting code in taxi_utils.py

As you go through the tutorial you'll uncomment the lines of code that apply to the tutorial step that you're currently working on. That will add the code for that step, and update your pipeline. As you do that we strongly encourage you to review the code that you're uncommenting.

Chicago Taxi Dataset

You're using the Taxi Trips dataset released by the City of Chicago.

Note: This site provides applications using data that has been modified for use from its original source, www.cityofchicago.org, the official website of the City of Chicago. The City of Chicago makes no claims as to the content, accuracy, timeliness, or completeness of any of the data provided at this site. The data provided at this site is subject to change at any time. It is understood that the data provided at this site is being used at one’s own risk.

You can read more about the dataset in Google BigQuery. Explore the full dataset in the BigQuery UI.

Goal - Binary classification

Will the customer tip more or less than 20%?

Step 1: Setup your environment

The setup script (setup_demo.sh) installs TFX and Airflow, and configures Airflow in a way that makes it easy to work with for this tutorial.

In a shell:

virtualenv -p python2.7 tfx-env
source tfx-env/bin/activate
mkdir tfx; cd tfx

pip install tensorflow==1.13.1
pip install tfx==0.12.0
git clone https://github.com/tensorflow/tfx.git
cd tfx/examples/workshop/setup
./setup_demo.sh

You should review setup_demo.sh to see what it's doing.

Step 2: Bring up initial pipeline skeleton

Hello World

In a shell:

# Open a new terminal window, and in that window ...
source tfx-env/bin/activate
airflow webserver -p 8080

# Open another new terminal window, and in that window ...
source tfx-env/bin/activate
airflow scheduler

In a browser:

Open a browser and go to http://127.0.0.1:8080

DAG view buttons

Use the button on the left to enable the taxi DAG
Use the button on the right to trigger the taxi DAG
Click on taxi

Wait for the CsvExampleGen component to turn dark green (~1 minutes)
Use the graph refresh button on the right or refresh the page
Wait for pipeline to complete
- All dark green
- Use refresh button on right side or refresh page

Step 3: Dive into our data

The first task in any data science or ML project is to understand and clean the data.

Understand the data types for each feature
Look for anomalies and missing values
Understand the distributions for each feature

Components

ExampleGen ingests and splits the input dataset.
StatisticsGen calculates statistics for the dataset.
SchemaGen SchemaGen examines the statistics and creates a data schema.
ExampleValidator looks for anomalies and missing values in the dataset.

In an editor:

Uncomment the lines marked Step 3 in both taxi_pipeline.py and taxi_utils.py
Take a moment to review the code that you uncommented

In a browser:

Return to DAGs list page in Airflow
Click the refresh button on the right side for the taxi DAG
- You should see "DAG [taxi] is now fresh as a daisy"
Trigger taxi
Wait for pipeline to complete
- All dark green
- Use refresh on right side or refresh page

Back on Jupyter:

Open step3.ipynb
Follow the notebook

More advanced example

The example presented here is really only meant to get you started. For a more advanced example see the TensorFlow Data Validation Colab

For more information on using TFDV to explore and validate a dataset, see the examples on tensorflow.org

Step 4: Feature engineering

You can increase the predictive quality of our data and/or reduce dimensionality with feature engineering.

Feature crosses
Vocabularies
Embeddings
PCA
Categorical encoding

One of the benefits of using TFX is that you will write your transformation code once, and the resulting transforms will be consistent between training and serving.

Components

Transform performs feature engineering on the dataset.

In an editor:

Uncomment the lines marked Step 4 in both taxi_pipeline.py and taxi_utils.py
Take a moment to review the code that you uncommented

In a browser:

Return to DAGs list page in Airflow
Click the refresh button on the right side for the taxi DAG
- You should see "DAG [taxi] is now fresh as a daisy"
Trigger taxi
Wait for pipeline to complete
- All dark green
- Use refresh on right side or refresh page

Back on Jupyter:

Open step4.ipynb
Follow the notebook

More advanced example

The example presented here is really only meant to get you started. For a more advanced example see the TensorFlow Transform Colab.

Step 5: Training

Train a TensorFlow model with our nice, clean, transformed data.

Include the transformations from step 4 so that they are applied consistently
Save the results as a SavedModel for production
Visualize and explore the training process using TensorBoard
Also save an EvalSavedModel for analysis of model performance

Components

Trainer trains the model using TensorFlow Estimators

In an editor:

Uncomment the lines marked Step 5 in both taxi_pipeline.py and taxi_utils.py
Take a moment to review the code that you uncommented

In a browser:

Return to DAGs list page in Airflow
Click the refresh button on the right side for the taxi DAG
- You should see "DAG [taxi] is now fresh as a daisy"
Trigger taxi
Wait for pipeline to complete
- All dark green
- Use refresh on right side or refresh page

Back on Jupyter:

Open step5.ipynb
Follow the notebook

More advanced example

The example presented here is really only meant to get you started. For a more advanced example see the TensorBoard Tutorial.

Step 6: Analyzing model performance

Understanding more than just the top level metrics.

Users experience model performance for their queries only
Poor performance on slices of data can be hidden by top level metrics
Model fairness is important
Often key subsets of users or data are very important, and may be small
- Performance in critical but unusual conditions
- Performance for key audiences such as influencers

Components

Evaluator performs deep analysis of the training results.

In an editor:

Uncomment the lines marked Step 6 in both taxi_pipeline.py and taxi_utils.py
Take a moment to review the code that you uncommented

In a browser:

Return to DAGs list page in Airflow
Click the refresh button on the right side for the taxi DAG
- You should see "DAG [taxi] is now fresh as a daisy"
Trigger taxi
Wait for pipeline to complete
- All dark green
- Use refresh on right side or refresh page

Back on Jupyter:

Open step6.ipynb
Follow the notebook

More advanced example

The example presented here is really only meant to get you started. For more information on using TFMA to analyze model performance, see the examples on tensorflow.org.

Step 7: Ready for production

If the new model is ready, make it so.

If we’re replacing a model that is currently in production, first make sure that the new one is better
ModelValidator tells Pusher if the model is OK
Pusher deploys SavedModels to well-known locations

Deployment targets receive new models from well-known locations

TensorFlow Serving
TensorFlow Lite
TensorFlow JS
TensorFlow Hub

Components

ModelValidator ensures that the model is "good enough" to be pushed to production.
Pusher deploys the model to a serving infrastructure.

In an editor:

Uncomment the lines marked Step 7 in both taxi_pipeline.py and taxi_utils.py
Take a moment to review the code that you uncommented

In a browser:

Return to DAGs list page in Airflow
Click the refresh button on the right side for the taxi DAG
- You should see "DAG [taxi] is now fresh as a daisy"
Trigger taxi
Wait for pipeline to complete
- All dark green
- Use refresh on right side or refresh page

Next Steps

You have now trained and validated your model, and exported a SavedModel file under the ~/airflow/saved_models/taxi directory. Your model is now ready for production. You can now deploy your model to any of the TensorFlow deployment targets, including:

TensorFlow Serving, for serving your model on a server or server farm and processing REST and/or gRPC inference requests.
TensorFlow Lite, for including your model in an Android or iOS native mobile application, or in a Raspberry Pi, IoT or microcontroller application.
TensorFlow.js, for running your model in a web browser or a Node.JS application.

unhkd-dee/TFX-DevSummit-2019

TFX Developer Tutorial

Introduction

Learn more

Step by step

Prerequisites

Tutorial Materials

Exercise

Solution

What you're doing

Adding the code for each step

Chicago Taxi Dataset

Goal - Binary classification

Step 1: Setup your environment

Step 2: Bring up initial pipeline skeleton

Hello World

In a browser:

DAG view buttons

Step 3: Dive into our data

Components

In an editor:

In a browser:

Back on Jupyter:

More advanced example

Step 4: Feature engineering

Components

In an editor:

In a browser:

Back on Jupyter:

More advanced example

Step 5: Training

Components

In an editor:

In a browser:

Back on Jupyter:

More advanced example

Step 6: Analyzing model performance

Components

In an editor:

In a browser:

Back on Jupyter:

More advanced example

Step 7: Ready for production

Components

In an editor:

In a browser:

Next Steps