This repository houses a simple BERT model trained for humor detection and deployed using a TensorFlow Extended (TFX) end-to-end pipeline. The model uses a Kaggle dataset of 200k short texts with a binary target column indicating whether each text is humorous. Labels are encoded as integers: 0 (no humor) or 1 (humor).
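The label encoding described above can be sketched as follows. This is a minimal illustration, not the repository's preprocessing code: the column names `text` and `humor`, the `True`/`False` string values, and the two sample rows are assumptions made up for the example.

```python
import csv
import io

# Made-up sample rows; the column names `text` and `humor` are assumptions.
SAMPLE_CSV = """text,humor
Why did the scarecrow win an award? He was outstanding in his field.,True
The meeting has been moved to 3 pm on Thursday.,False
"""


def encode_label(value):
    """Map a humor flag to the integer label 0 (no humor) or 1 (humor)."""
    if isinstance(value, bool):
        return int(value)
    text = str(value).strip().lower()
    if text in {"true", "1"}:
        return 1
    if text in {"false", "0"}:
        return 0
    raise ValueError(f"unrecognized humor label: {value!r}")


def encode_rows(csv_text):
    """Return (text, label) pairs with the label encoded as 0 or 1."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["text"], encode_label(row["humor"])) for row in reader]


encoded = encode_rows(SAMPLE_CSV)
labels = [label for _, label in encoded]
print(labels)
```

Raising on an unrecognized value, rather than defaulting to 0, keeps malformed rows from silently becoming "no humor" examples.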
The pipeline is built from the following components:

- The `ExampleGen` component takes our CSV data and converts it into examples that the other pipeline components can consume. It provides consistent splits and shuffles the dataset, in line with ML best practice.
- The `StatisticsGen` component generates feature statistics over both the training and serving data provided by `ExampleGen`. These statistics can be used by other pipeline components.
- The `SchemaGen` component creates a schema based on the statistics of our data (in this case provided by `StatisticsGen`). A schema can specify data types for feature values, whether a feature has to be present in all examples, allowed value ranges, and other properties.
- The `ExampleValidator` component identifies anomalies in the example data by comparing the statistics computed by `StatisticsGen` against the schema created by `SchemaGen`. The inferred schema codifies the properties that the input data is expected to satisfy.
- The `Transform` component performs feature engineering on the examples created by `ExampleGen`, using the data schema created by `SchemaGen`. It emits a SavedModel as well as statistics on both the pre-transform and post-transform data.
- The `Tuner` component tunes the hyperparameters for the model.
- The `Trainer` component trains our BERT model.
- `Resolver` is a special TFX node that handles artifact resolution logic. The artifacts it resolves are used as inputs for downstream nodes.
- The `Evaluator` component performs deep analysis on the training results for our model, to help us understand how it performs on subsets of the data. The `Evaluator` can also optionally validate exported models, ensuring that they are "good enough" to be pushed to production.
- The `Pusher` component pushes a validated model to a deployment target during model training or re-training. Before deployment, `Pusher` relies on one or more blessings from other validation components to decide whether to push the model. We use the results from our `Evaluator` component to determine whether our model is ready to be pushed.
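The data dependencies between these components can be sketched as a directed acyclic graph. The snippet below is a plain-Python illustration using the standard-library `graphlib` module, not TFX code and not this repository's pipeline definition. The edges for `ExampleGen` through `Transform`, and from `Evaluator` to `Pusher`, follow the descriptions above; the inputs assumed for `Tuner`, `Trainer`, `Resolver`, and `Evaluator` reflect a typical TFX wiring and are assumptions.

```python
from graphlib import TopologicalSorter

# Each component maps to the set of components whose outputs it consumes.
UPSTREAM = {
    "ExampleGen": set(),
    "StatisticsGen": {"ExampleGen"},
    "SchemaGen": {"StatisticsGen"},
    "ExampleValidator": {"StatisticsGen", "SchemaGen"},
    "Transform": {"ExampleGen", "SchemaGen"},
    # Assumption: the tuner and trainer consume the transformed examples.
    "Tuner": {"Transform", "SchemaGen"},
    "Trainer": {"Transform", "SchemaGen", "Tuner"},
    # Assumption: the resolver looks up earlier artifacts and has no
    # upstream component inside the current run.
    "Resolver": set(),
    # Assumption: the evaluator compares the new model with a resolved baseline.
    "Evaluator": {"ExampleGen", "Trainer", "Resolver"},
    "Pusher": {"Trainer", "Evaluator"},
}


def execution_order(upstream):
    """Return one valid run order in which every input is produced first."""
    return list(TopologicalSorter(upstream).static_order())


order = execution_order(UPSTREAM)
print(" -> ".join(order))
```

Several orders are valid for this graph, so the printed sequence is only one of them. In TFX itself the orchestrator derives the order from the channels passed between components, so it is not declared by hand like this.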