/targets-keras

An example Keras pipeline with the targets R package

targets R package Keras model example

The goal of this workflow is to find the Keras model that best predicts customer attrition (“churn”) on a subset of the IBM Watson Telco Customer Churn dataset. (See this RStudio Blog post by Matt Dancho for a thorough walkthrough of the use case.) Here, we fit multiple Keras models to the dataset with different tuning parameters, pick the one with the highest classification test accuracy, and produce a trained model for the best set of tuning parameters we find.
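The selection step described above can be illustrated with a small base R sketch. The column names (`units`, `activation`, `accuracy`) and all values are made up for illustration; they are not taken from this project's code or results.

```r
# Hypothetical results table: one row per fitted model.
# The tuning parameters and accuracies below are invented placeholders.
runs <- data.frame(
  units = c(16, 32, 64),
  activation = c("relu", "sigmoid", "relu"),
  accuracy = c(0.78, 0.80, 0.79),
  stringsAsFactors = FALSE
)

# Keep the row with the highest test accuracy.
best_run <- runs[which.max(runs$accuracy), ]

# The winning tuning parameters would then be used to fit the final model.
print(best_run)
```

In the real pipeline, each row would come from fitting and evaluating one Keras model rather than from hard-coded numbers.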

The targets pipeline

The targets R package manages the workflow. It automatically skips steps of the pipeline when the results are already up to date, which is critical for machine learning tasks that take a long time to run. It also helps users understand and communicate this work with tools like the interactive dependency graph below.

library(targets)
tar_visnetwork()
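For readers new to targets, the following is a minimal sketch of how a pipeline like this one might be declared in _targets.R. It is not the repository's actual script: the target names and the helper functions `test_model()` and `pick_best()` are hypothetical stand-ins, and the real functions would train Keras models.

```r
# _targets.R (illustrative sketch, not the repository's actual script)
library(targets)

# Hypothetical stand-in: a real version would train a Keras model on `data`
# and return its test accuracy alongside the tuning parameters used.
test_model <- function(data, units) {
  data.frame(units = units, accuracy = 0) # placeholder accuracy
}

# Hypothetical helper: keep the row with the highest accuracy.
pick_best <- function(runs) {
  runs[which.max(runs$accuracy), ]
}

# The script must end with a list of target objects.
list(
  # Track the data file so that changes to it invalidate downstream targets.
  tar_target(churn_file, "data/customer_churn.csv", format = "file"),
  tar_target(churn_data, read.csv(churn_file)),
  # One target per set of tuning parameters.
  tar_target(run_16, test_model(churn_data, units = 16)),
  tar_target(run_32, test_model(churn_data, units = 32)),
  # Compare the runs and keep the best one.
  tar_target(best_run, pick_best(rbind(run_16, run_32)))
)
```

Because each model run is its own target, editing the parameters of one run would leave the other runs up to date, so only the affected targets would be rebuilt.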

How to access

You can try out this example project as long as you have a browser and an internet connection. Click here to navigate your browser to an RStudio Cloud instance. Alternatively, you can clone or download this code repository and install the R packages listed here.

How to run

In the R console, call the tar_make() function to run the pipeline. Then, call tar_read(hist) to retrieve the histogram. Experiment with other functions such as tar_visnetwork() to learn how they work.
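A typical console session might look like the sketch below. It assumes the working directory is the project root (where _targets.R lives) and that the targets package is installed; the target name `hist` is the one mentioned above.

```r
library(targets)

# Run the pipeline. Targets that are already up to date are skipped.
tar_make()

# List the names of any targets that are still out of date.
tar_outdated()

# Load the value of a finished target into the session.
tar_read(hist)

# Show the interactive dependency graph.
tar_visnetwork()
```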

File structure

The files in this example are organized as follows.

├── run.sh
├── run.R
├── _targets.R
├── sge.tmpl
├── R/
├──── functions.R
├── data/
├──── customer_churn.csv
└── report.Rmd
| File | Purpose |
| --- | --- |
| run.sh | Shell script to run run.R in a persistent background process. Works on Unix-like systems. Helpful for long computations on servers. |
| run.R | R script to run tar_make() or tar_make_clustermq() (uncomment the function of your choice). |
| _targets.R | The special R script that declares the targets pipeline. See tar_script() for details. |
| sge.tmpl | A clustermq template file to deploy targets in parallel to a Sun Grid Engine cluster. |
| R/functions.R | An R script with user-defined functions. Unlike _targets.R, there is nothing special about the name or location of this script. In fact, for larger projects, it is good practice to partition functions into multiple files. |
| data/customer_churn.csv | A subset of the IBM Watson Telco Customer Churn dataset. |
| report.Rmd | An R Markdown report summarizing the results of the analysis. For more information on how to include R Markdown reports as reproducible components of the pipeline, see the tar_render() function from the tarchetypes package and the literate programming chapter of the manual. |

High-performance computing

You can run this project locally on your laptop or remotely on a cluster. You have several choices, and they each require modifications to run.R and _targets.R.

| Mode | When to use | Instructions for run.R | Instructions for _targets.R |
| --- | --- | --- | --- |
| Sequential | Low-spec local machine or Windows. | Uncomment tar_make() | No action required. |
| Local multicore | Local machine with a Unix-like OS. | Uncomment tar_make_clustermq() | Uncomment options(clustermq.scheduler = "multicore") |
| Sun Grid Engine | Sun Grid Engine cluster. | Uncomment tar_make_clustermq() | Uncomment options(clustermq.scheduler = "sge", clustermq.template = "sge.tmpl") |
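As a rough sketch of the local multicore mode, the two edits might look like the following. The worker count of 2 is an arbitrary choice for illustration, and the exact lines in this repository's run.R and _targets.R may differ.

```r
# In _targets.R: select the clustermq scheduler. Only one option should be active.
options(clustermq.scheduler = "multicore")
# For a Sun Grid Engine cluster, use this line instead:
# options(clustermq.scheduler = "sge", clustermq.template = "sge.tmpl")

# In run.R: run the pipeline with persistent clustermq workers
# instead of calling tar_make().
targets::tar_make_clustermq(workers = 2)
```

The scheduler option is set in _targets.R so that it is in effect whenever the pipeline is loaded, while run.R only decides which function launches the pipeline.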