TensorIO is an Objective-C wrapper for an underlying machine learning library and currently supports TensorFlow and TensorFlow Lite. It abstracts the work of copying bytes into and out of tensors and allows you to interact with native types instead, such as numbers, arrays, dictionaries, and pixel buffers.
This implementation is part of the TensorIO project with support for machine learning on iOS, Android, and React Native.
With TensorIO you can perform inference in just a few lines of code:
UIImage *image = [UIImage imageNamed:@"example-image"];
TIOPixelBuffer *buffer = [[TIOPixelBuffer alloc] initWithPixelBuffer:image.pixelBuffer orientation:kCGImagePropertyOrientationUp];
TIOTFLiteModel *model = [TIOTFLiteModel modelWithBundleAtPath:path];
NSDictionary *inference = (NSDictionary*)[model runOn:buffer];
NSDictionary *classification = [inference[@"classification"] topN:5 threshold:0.1];
And in Swift:
let image = UIImage(named: "example-image")!
let pixels = image.pixelBuffer()!
let value = pixels.takeUnretainedValue() as CVPixelBuffer
let buffer = TIOPixelBuffer(pixelBuffer:value, orientation: .up)
let model = TIOTFLiteModel.withBundleAtPath(path)!
let inference = model.run(on: buffer)
let classification = ((inference as! NSDictionary)["classification"] as! NSDictionary).topN(5, threshold: 0.1)
See the Usage section below for important notes on adding TensorIO to your project.
For the complete Objective-C project documentation, visit tensorio.info.
- Overview
- Example
- Requirements
- Installation
- Author
- License
- Usage
- Advanced Usage
- Net Runner
TensorIO supports many kinds of models with multiple input and output layers of different shapes and types, all with minimal boilerplate code. In fact, you can run a variety of models without needing to write any model specific code at all.
Instead, TensorIO relies on a JSON description of the model that you provide. During inference, the library matches incoming data to the model layers that expect it, performing any transformations that are needed and ensuring that the underlying bytes are copied to the right place. Once inference is complete, the library copies bytes from the output tensors back to native Objective-C types.
The built-in class for working with TensorFlow Lite (TF Lite) models, `TIOTFLiteModel`, includes support for multiple input and output layers; single-valued, vectored, matrix, and image data; pixel normalization and denormalization; and quantization and dequantization of data. In case you require a completely custom interface to a model you may specify your own class in the JSON description, and TensorIO will use it in place of the default class.
Although TensorIO supports both full TensorFlow and TF Lite models, this README will refer to TF Lite throughout. Except for small differences in support of data types (`uint8_t`, `float32_t`, etc.), the interface is the same.
To run the example project, clone the repo and run `pod install` from the Example directory first.
- See MainViewController.mm for sample code.
- See TensorIOTFLiteModelIntegrationTests.mm for more complex models.
- IPython notebooks for the test models may be found in the notebooks directory in this repo.
For more detailed information about using TensorIO, refer to the Usage section below.
TensorIO requires iOS 9.3+
TensorIO is available through CocoaPods. Add the following to your Podfile:
pod 'TensorIO/TFLite'
And run `pod install`.
If you would prefer to use TensorFlow instead add the following:
pod 'TensorIO/TensorFlow'
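In either case, a complete Podfile might look like the following sketch, which assumes an application target named "MyApp" and the TF Lite backend:

```ruby
# A minimal Podfile sketch; "MyApp" is a placeholder for your own app target
platform :ios, '9.3'
use_frameworks!

target 'MyApp' do
  pod 'TensorIO/TFLite'
end
```

Note that `use_frameworks!` is only required if you import TensorIO as a module or use it from Swift; see the Usage section below.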
Principal Contributors
TensorIO is available under the Apache 2 license. See the LICENSE file for more info.
Because the umbrella TensorIO header imports headers with C++ syntax, any files that use TensorIO must have Obj-C++ extensions. Rename your `.m` files to `.mm`.
Wherever you'd like to use TensorIO, import the umbrella header:
#import <TensorIO/TensorIO-umbrella.h>
To use TensorIO as a module, make sure `use_frameworks!` is uncommented in your Podfile, and add the following Other C Flags to your project's build settings:
-fmodules
-fcxx-modules
Wherever you'd like to use TensorIO, add:
@import TensorIO;
Make sure `use_frameworks!` is uncommented in your Podfile, and wherever you'd like to use TensorIO, simply import it:
import TensorIO
let model = TIOTFLiteModel.withBundleAtPath(path)
let classifications = model.run(on: buffer)
A TensorIO model takes a set of inputs, performs inference, and returns a set of outputs.
Consider a model that predicts the price of a house given a feature vector that includes square footage, number of bedrooms, number of bathrooms, proximity to a school, and so forth.
With TensorIO you construct an `NSArray` of numeric values for these features, pass the array to your model, and extract the price from the results.
TIOTFLiteModel *model = ...
NSArray *input = @[ @(1890), @(3), @(2), @(1.6) ];
NSDictionary *output = (NSDictionary*)[model runOn:input];
NSNumber *price = output[@"price"];
TIOData
TensorIO models take inputs and produce outputs of type `TIOData`. This is a generic protocol that simply marks native data types as available to TensorIO models. A backend that supports a specific underlying machine learning library extends this protocol and implements methods that copy data into and out of tensors.
TensorIO backends such as those for TensorFlow and TFLite will always include implementations of this protocol for the following classes:
- NSNumber
- NSData
- NSArray
- NSDictionary
- CVPixelBufferRef with TIOPixelBuffer
In the above example, we're passing a single `NSArray` to the model. The model extracts numeric byte values from the array, copying them into the underlying TF Lite model. It asks the underlying model to perform inference, and then copies the resulting bytes back into an `NSNumber`. That `NSNumber` is added to a dictionary under the `@"price"` key, and it is this dictionary which the model returns.
Model Outputs
Why is the resulting price not returned directly, and how do we know that the value is keyed to `@"price"` in the returned dictionary?
Because models may have multiple inputs and outputs, TensorIO tries to make no assumptions about how many input and output layers a model actually has. This gives it some flexibility in what kinds of inputs it can take, for example a single numeric value, arrays of numeric arrays, or a dictionary, and it intelligently matches those inputs to the underlying tensor buffers, but a model consequently always returns a dictionary of outputs.
(Note: this may change in a future implementation, and single outputs may be returned directly)
To understand why the output value is keyed to a specific entry, we must understand how TensorIO is able to match Objective-C inputs and outputs to the underlying model's input and output layers, and for that we require an understanding of model bundles and the JSON file which describes the underlying model.
TensorIO currently includes support for TensorFlow and TensorFlow Lite (TF Lite) models. Although the library is built with support for other machine learning frameworks in mind, we'll focus on TF Lite models here.
A TF Lite model is contained in a single .tflite file. All the operations and weights required to perform inference with a model are included in this file.
However, a model may have other assets that are required to interpret the resulting inference. For example, an ImageNet image classification model will output 1000 values corresponding to the softmax probability that a particular object has been recognized in an image. The model doesn't match probabilities to their labels, for example "rocking chair" or "lakeside"; it only outputs numeric values. It is left to us to associate the numeric values with their labels.
Rather than requiring a developer to do this in application space and consequently store the labels in a text file or in some code somewhere in the application, TensorIO wraps models in a bundle and allows model builders to include additional assets in that bundle.
A TensorIO bundle is just a folder with an extension that identifies it as such: .tiobundle. Assets may be included in this bundle and then referenced from model specific code.
When you use your own models with TensorIO, make sure to put them in a folder with the .tiobundle extension.
A TensorIO TF Lite bundle has the following directory structure:
mymodel.tiobundle
- model.tflite
- model.json
- assets
- file.txt
- ...
The model.json file is required. It describes the interface to your model and includes other metadata about it. More on that below.
The model.tflite file is required but may have another name. The bundle must include some .tflite file, but its actual name is specified in model.json.
The assets directory is optional and contains any additional assets required by your specific use case. Those assets may be referenced from model.json.
Because image classification is such a common task, TensorIO includes built-in support for it, and no additional code is required. You'll simply need to specify a labels file in the model's JSON description, which we'll look at in a moment.
Using Model Bundles
TensorIO encapsulates information about a model in `TIOModelBundle`. This class parses the metadata for a model from the model.json file and manages access to files in the assets directory.
You may load a bundle from a known path:
NSString *path = @"...";
TIOModelBundle *bundle = [[TIOModelBundle alloc] initWithPath:path];
Model bundles are also used to instantiate model instances with the `newModel` method, effectively functioning as model factories. Each call to this method produces a new model instance:
id<TIOModel> model = [bundle newModel];
Classes that conform to the `TIOModel` protocol also implement a convenience method for instantiating models directly from a model bundle path:
NSString *path = @"...";
TIOTFLiteModel *model = [TIOTFLiteModel modelWithBundleAtPath:path];
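As the force-unwrapped Swift examples above suggest, these initializers return nil when a bundle cannot be loaded, so a guard along the following lines may be prudent (a minimal sketch):

```objc
// Minimal sketch: guard against a nil model when the bundle fails to load,
// for example because the path is wrong or the bundle is malformed
TIOTFLiteModel *model = [TIOTFLiteModel modelWithBundleAtPath:path];
if (model == nil) {
    NSLog(@"Unable to load model bundle at path: %@", path);
}
```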
One of TensorIO's goals is to reduce the amount of new code required to integrate models into an application.
The primary work of using a model on iOS involves copying bytes of the right length to the right place. TF Lite, for example, is a C++ library, and the input and output tensors are exposed as C style buffers. In order to use a model we must copy byte representations of our input data into these buffers, ask the library to perform inference on those bytes, and then extract the byte representations back out of them.
Model interfaces can vary widely. Some models may have a single input and single output layer, others multiple inputs with a single output, or vice versa. The layers may be of varying shapes, with some layers taking single values, others an array of values, and yet others taking matrices or volumes of higher dimensions. Some models may work on four byte, floating point representations of data, while others use single byte, unsigned integer representations. The latter are called quantized models; more on them below.
Consequently, every time we want to try a different model, or even the same model with a slightly different interface, we must modify the code that moves bytes into and out of buffers.
TensorIO abstracts the work of copying bytes into and out of tensors and replaces that imperative code with a declarative language you already know: JSON.
The model.json file in a TensorIO bundle contains metadata about the underlying model as well as a description of the model's input and output layers. TensorIO parses those descriptions and then, when you perform inference with the model, internally handles all the byte copying operations, taking into account layer shapes, data sizes, data transformations, and even output labeling. All you have to do is provide data to the model and ask for the data out of it.
The model.json file is the primary point of interaction with the TensorIO library. Any code you write to prepare data for a model and read data from a model will depend on a description of the model's input and output layers that you provide in this file.
Let's have a closer look.
The model.json file has the following basic structure:
{
"name": "ModelName",
"details": "Description of your model",
"id": "unique-identifier",
"version": "1",
"author": "doc.ai",
"license": "MIT",
"model": {
"file": "model.tflite",
"quantized": false,
"type": "image.classification.imagenet",
"backend": "tflite",
"modes": ["train"]
},
"inputs": [
{
...
}
],
"outputs": [
{
...
}
]
}
In addition to the model's metadata, such as name, identifier, version, etc, all of which are required, the JSON file also includes three additional, required entries:
- The model field is a dictionary that contains information about the model itself
- The inputs field is an array of dictionaries that describe the model's input layers
- The outputs field is an array of dictionaries that describe the model's output layers
The model field is a dictionary that itself contains three to six entries:
"model": {
"file": "model.tflite",
"backend": "tflite",
"quantized": false,
"modes": ["train"],
"type": "image.classification.imagenet",
"class": "MyOptionalCustomClassName"
}
file
The file field is a string value that contains the name of your model file. For TF Lite models it is the file with the .tflite extension that resides at the top level of your model bundle folder. For TensorFlow models it is the directory produced by Estimator.export_saved_model or by tf.saved_model.simple_save, which contains the saved_model.pb file and a variables directory. This folder must reside at the top level of your model bundle.
This field is required.
backend
TensorIO supports multiple machine learning libraries, or backends. The backend field is a string that identifies which backend to use for this model. TensorFlow and TF Lite are currently supported, and this field should indicate which one to use, either of:
- tflite
- tensorflow
This field is required.
quantized
The quantized field is a boolean value that is `true` when your model is quantized and `false` when it is not. Quantized models perform inference on single byte, unsigned integer representations of your data (`uint8_t`). Quantized models involve additional considerations, which are discussed below.
This field is required.
modes
The modes field is an array of strings that describes the modes supported by this model, for example, if the model supports prediction, training, or evaluation. The strings may be the following values:
- predict
- train
- eval
TF Lite models only support prediction while TensorFlow models support both training and prediction.
This field is optional but will be required in a future version.
type
The type field is a string value that describes the class of models your model belongs to. Currently the field supports arbitrary strings with no formal hierarchy.
This field is optional.
class
The class field is a string value that contains the Objective-C class name of the custom class you would like to use with your model. It must conform to the `TIOModel` protocol and ship with your application. A custom class is not required, and TensorIO will use `TIOTFLiteModel` by default, assuming you are using a TensorFlow Lite backend. If you are using the full TensorFlow build you must currently set the custom class name to `TIOTensorFlowModel`.
This field is optional.
The inputs field is an array of dictionaries that describe the input layers of your model. There must be a dictionary entry for each input layer in your model. TensorIO uses the information in this field to match inputs to model layers and correctly copy bytes into tensor buffers.
A basic entry in this array will have the following fields:
{
"name": "layer-name",
"type": "array",
"dtype": "float32",
"shape": [224]
}
name
The name field is a string value that names this input tensor. It should match the name of a tensor in the underlying model and functions as a reference in application space in case you would like to pass an `NSDictionary` as input to a model's `runOn:` method.
This field is required.
type
The type field specifies the kind of data this tensor expects. Only two types are currently supported:
- array
- image
Use the array type for shapes of any dimension, including single values, vectors, matrices, and higher dimensional tensors. Use the image type for image inputs.
This field is required.
dtype
The dtype field indicates what type of data this input accepts and will correspond, for example, to a primitive C type or a TensorFlow dtype. The following data types are supported:
- uint8
- float32
- int32
- int64
Note that complete support for this field is in development and that not all backends support all data types. TF Lite supports only the uint8 and float32 data types, and this field is ignored: quantized models automatically use uint8 and unquantized models float32. The full TensorFlow backend, on the other hand, supports all four types, but if a type is not specified it defaults to float32.
This field is currently optional; float32 is assumed in most cases.
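For example, a hedged sketch of an input entry that accepts 32-bit integer values with the full TensorFlow backend, using an illustrative layer name:

```json
{
  "name": "labels",
  "type": "array",
  "dtype": "int32",
  "shape": [1]
}
```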
shape
The shape field is an array of integer values that describe the size of the input layer, ignoring whether the layer expects four byte or single byte values. Common shapes might include:
// a single-valued input
"shape": [1]
// a vector with 16 values
"shape": [1,16]
// a matrix with 32 rows and 100 columns
"shape": [32,100]
// a three dimensional image volume with a width of 224px,
// a height of 224px, and 3 channels (RGB)
"shape": [224,224,3]
If you are using TensorFlow models with tensors whose first dimension takes a variable batch size, use a `-1` for the first dimension of the shape:
"shape": [-1,224,224,3]
The shape should accurately reflect the shape of the underlying tensor, even though in many cases what matters is the total byte count. For example, a row vector with sixteen elements would have a shape of `[1,16]` while a column vector would have one of `[16,1]`.
This field is required.
Unrolling Data
Although we describe the inputs to a layer in terms of shapes with multiple dimensions, and from a mathematical perspective work with vectors, matrices, and tensors, at a machine level, neither TensorIO nor TensorFlow Lite has a concept of a shape.
From a tensor's perspective all shapes are represented as an unrolled vector of numeric values packed into a contiguous region of memory, i.e. a buffer. Similarly, from an Objective-C perspective, all values passed as input to a TensorIO model must already be unrolled into an array of data, either an array of bytes when using `NSData` or an array of `NSNumber` when using `NSArray`.
When you order your data into an array of bytes or an array of numbers in preparation for running a model on it, unroll the bytes using row major ordering. That is, traverse higher order dimensions before lower ones.
For example, a two dimensional matrix with the following values should be unrolled across columns first and then rows. That is, start with the first row, traverse every column, move to the second row, traverse every column, and so on:
[ [ 1 2 3 ]
[ 4 5 6 ] ]
NSArray *matrix = @[ @(1), @(2), @(3), @(4), @(5), @(6) ];
Apply the same approach for volumes of a higher dimension, as mind-boggling as it starts to get.
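For example, a volume with shape [2,2,2] unrolls with the last dimension varying fastest, a sketch of which with illustrative values looks like:

```objc
// A 2x2x2 volume whose two 2x2 "planes" contain the values
// [ [1 2]    [ [5 6]
//   [3 4] ]    [7 8] ]
// unrolled in row major order, last dimension first:
NSArray *volume = @[ @(1), @(2), @(3), @(4), @(5), @(6), @(7), @(8) ];
```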
Additional Fields
There are additional fields for handling data transformations such as quantization and pixel normalization. These will be discussed in their respective sections below.
Both Order and Name Matter
Input to a `TIOModel` may be organized by either index or name, so both the order of the dictionaries in the inputs array and their names are significant. TF Lite tensors are accessed by index, but internally TensorIO associates a name with each index in case you prefer to send `NSDictionary` inputs to your models. TensorFlow models use the name exclusively, which is why names must match the names of the underlying tensors.
Example
Here's what the inputs field looks like for a model with two input layers, the first a vector with 8 values and the second a 10x10 matrix:
"inputs": [
{
"name": "vector-input",
"type": "array",
"shape": [8]
},
{
"name": "matrix-input",
"type": "array",
"shape": [10,10]
}
],
With this description we can pass either an array of arrays or a dictionary of arrays to the model's `runOn:` method. To pass an array, make sure the order of your inputs matches the order of their entries in the JSON file:
NSArray *vectorInput = @[ ... ]; // with 8 values
NSArray *matrixInput = @[ ... ]; // with 100 values in row major order
NSArray *arrayInputs = @[
vectorInput,
matrixInput
];
[model runOn:arrayInputs];
To pass a dictionary, simply associate the correct name with each value:
NSArray *vectorInput = @[ ... ]; // with 8 values
NSArray *matrixInput = @[ ... ]; // with 100 values in row major order
NSDictionary *dictionaryInputs = @{
@"vector-input": vectorInput,
@"matrix-input": matrixInput
};
[model runOn:dictionaryInputs];
The outputs field is an array of dictionaries that describe the output layers of your model. The outputs field is structured the same way as the inputs field, and the dictionaries contain the same basic entries as those in the inputs field:
"outputs": [
{
"name": "vector-output",
"type": "array",
"dtype": "float32",
"shape": [8]
}
]
The Labels Field
An array type output optionally supports the presence of a labels field for classification outputs:
"outputs": [
{
"name": "classification-output",
"type": "array",
"shape": [1000],
"labels": "labels.txt"
}
]
The value of this field is a string which corresponds to the name of a text file in the bundle's assets directory. The .tiobundle directory structure for this model might look like:
mymodel.tiobundle
- model.json
- model.tflite
- assets
- labels.txt
Each line of the labels.txt text file contains the name of the classification for that line number index in the layer's output. When a labels field is present, TensorIO internally maps labels to their numeric outputs and returns an `NSDictionary` representation of that mapping, rather than a simple `NSArray` of values. Let's see what that looks like.
Model Outputs
Normally, a model returns a dictionary of values from its `runOn:` method, and those values will usually be arrays. Each layer produces its own entry in that dictionary, corresponding to the name of the layer in its JSON description.
For example, a self-driving car model might classify three kinds of things in an image (well, hopefully more than that!). The outputs field for this model might look like:
"outputs": [
{
"name": "classification-output",
"type": "array",
"shape": [3]
}
]
After performing inference the underlying TensorFlow model will produce an output with three values corresponding to the softmax probability that this item appears in the image. TensorIO extracts those bytes and packs them into an `NSArray` of `NSNumber`:
NSDictionary *inference = (NSDictionary*)[model runOn:input];
NSArray<NSNumber*> *classifications = inference[@"classification-output"];
// classifications[0] == 0.25
// classifications[1] == 0.75
// classifications[2] == 0.25
However, when a labels entry is present for a layer, the entry for that layer will itself be a dictionary mapping names to values.
Our self-driving car model might for example add a labels field to the above description:
"outputs": [
{
"name": "classification-output",
"type": "array",
"shape": [3],
"labels": "labels.txt"
}
]
With a labels.txt file in the bundle's assets directory that looks like:
pedestrian
car
motorcycle
The underlying TensorFlow model still produces an output with three values corresponding to the softmax probability that this item appears in the image. TensorIO, however, now maps labels to those probabilities and returns a dictionary of those mappings:
NSDictionary *inference = (NSDictionary*)[model runOn:input];
NSDictionary<NSString*, NSNumber*> *classifications = inference[@"classification-output"];
// classifications[@"pedestrian"] == 0.25
// classifications[@"car"] == 0.75
// classifications[@"motorcycle"] == 0.25
Single Valued Outputs
In some cases your model might output a single value in one of its output layers, as with the housing price model we discussed earlier. When that is the case, instead of wrapping that single value in an array and returning an array for that layer, TensorIO will simply output a single value for it.
Consider a model with two output layers. The first layer outputs a vector of four values while the second outputs a single value:
"outputs": [
{
"name": "vector-output",
"type": "array",
"shape": [4]
},
{
"name": "scalar-output",
"type": "array",
"shape": [1]
}
]
After performing inference, access the first layer as an array of numbers and the second layer as a single number:
NSDictionary *inference = (NSDictionary*)[model runOn:input];
NSArray<NSNumber*> *vectorOutput = inference[@"vector-output"];
NSNumber *scalarOutput = inference[@"scalar-output"];
Scalar outputs are supported as a convenience. Model outputs may change in a later version of this library and so this convenience may be removed or modified.
You may optionally include an options field in the JSON description. It contains properties that are not required by TensorIO to perform inference but which are used in application specific ways. TensorIO will ignore these properties, but you may inspect them from application space to change your product's behavior when a particular model is running.
Two options are currently supported: device_position and output_format:
Device Position
The device_position option targets computer vision models specifically and tells a consumer of the model which device camera it should begin with for this model. For example, some models target facial features and would prefer that the model run on the front facing camera initially, while others target features of the world and would prefer to run on the back facing camera. Valid entries for this field include:
- front
- back
Output Format
The value of the output_format field is an arbitrary, application specific string providing a hint to consumers of the model about how they should format the model's output.
For example, Net Runner knows how to interpret "image.classification.nodecay". When it sees this output format identifier, it will inspect a model's output, expecting a single "classification" output of an array of values, and format the probability values to two decimal places without applying any exponential decay to them.
Let's see a complete example of a model with two input layers and two output layers. The model takes two vectors, the first with 4 values and the second with 8 values, and outputs two vectors, the first with 3 values and the second with 6.
Our tiobundle folder will have the following contents:
mymodel.tiobundle
- model.json
- model.tflite
The model.json file might look something like:
{
"name": "Example Model",
"details": "This model takes two vector valued inputs and produces two vector valued outputs",
"id": "my-awesome-model",
"version": "1",
"author": "doc.ai",
"license": "Apache 2",
"model": {
"file": "model.tflite",
"quantized": false
},
"inputs": [
{
"name": "foo-features",
"type": "array",
"shape": [4]
},
{
"name": "bar-features",
"type": "array",
"shape": [8]
}
],
"outputs": [
{
"name": "baz-outputs",
"type": "array",
"shape": [3]
},
{
"name": "qux-outputs",
"type": "array",
"shape": [6]
}
],
"options": {
"device_position": "back",
"output_format": "image.classification.nodecay"
}
}
And we can perform inference with this model as follows:
NSArray *fooFeatures = @[ @(1), @(2), @(3), @(4) ];
NSArray *barFeatures = @[ @(1), @(2), @(3), @(4), @(5), @(6), @(7), @(8) ];
NSDictionary *features = @{
@"foo-features": fooFeatures,
@"bar-features": barFeatures
};
NSDictionary *inference = (NSDictionary*)[model runOn:features];
NSArray *bazOutputs = inference[@"baz-outputs"]; // length 3
NSArray *quxOutputs = inference[@"qux-outputs"]; // length 6
Quantization is a technique for reducing model sizes by representing weights with fewer bytes. Operations are then performed on these shorter byte representations. Quantized models trade accuracy for size. A full account of quantization is beyond the scope of this README, but more information may be found at https://www.tensorflow.org/performance/quantization.
In TF Lite, models represent weights with, and perform operations on, four byte floating point representations of data (`float_t`). These models receive floating point inputs and produce floating point outputs. Floating point models can represent numeric values in the range -3.4E+38 to +3.4E+38. Pretty sweet.
A quantized TF Lite model works with single byte representations (`uint8_t`). It expects single byte inputs and it produces single byte outputs. A single unsigned byte can represent numbers in the range of 0 to 255. Still pretty cool.
When you use a quantized model but start with floating point data, you must first transform that four byte representation into one byte. This is called quantization. The model's single byte output must also be transformed back into a floating point representation, an inverse process called dequantization. TensorIO can do both for you.
Let's see what a basic quantization and dequantization look like.
First, when working with a quantized TF Lite model, change the model.quantized field in the model.json file to `true`:
"model": {
"file": "model.tflite",
"quantized": true
},
For this example, let's say the input data coming from application space will always be in a floating point range from 0 to 1. Our quantized model requires those values to be in the range from 0 to 255. Quantization in TF Lite uniformly distributes a floating point range over a single byte range, so all we need to do here is apply a scaling factor of 255:
quantized_value = unquantized_value * 255
We can perform a sanity check with a few values:
Unquantized Value -> Quantized Value
0 -> 0
0.5 -> 127
1 -> 255
Similarly, for this example let's say the output values produced by inference are a softmax probability distribution. The quantized model necessarily produces outputs in a range from 0 to 255, and we want to convert those back to a valid probability distribution. This will again be a uniform redistribution of values, and all we need to do is apply a scaling factor of 1.0/255.0:
unquantized_value = quantized_value * 1.0/255.0
Note that the transformations are inverses of one another, and a sanity check produces the values we expect.
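The round trip is easy to verify with a couple of lines of plain C arithmetic (a sketch; note the small rounding error inherent in the single byte representation):

```objc
// Quantize then dequantize a value and confirm we get roughly the original back
float unquantized = 0.5f;
uint8_t quantized = (uint8_t)(unquantized * 255);  // 127
float recovered = quantized * 1.0f/255.0f;         // ~0.498, within rounding error
NSLog(@"%.3f -> %d -> %.3f", unquantized, quantized, recovered);
```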
Instruct TensorIO to perform quantization by adding a quantize field to an input layer's description:
"inputs": [
{
"name": "vector-input",
"type": "array",
"shape": [4],
"quantize": {
"scale": 255,
"bias": 0
}
},
The quantize field is a dictionary value that may appear on array inputs only (image inputs use pixel normalization, more below). It contains either both scale and bias fields or a single standard field.
scale
The scale field is a numeric value that specifies the scaling factor to apply to unquantized, incoming data.
bias
The bias field is a numeric value that specifies the bias to apply to unquantized, incoming data.
Together, TensorIO applies the following equation to any data sent to this layer:
quantized_value = (unquantized_value + bias) * scale
standard
The standard field is a string value corresponding to one of a number of commonly used quantization functions. Its presence overrides the scale and bias fields.
TensorIO currently has support for two standard quantizations. The ranges tell TensorIO what range of values you are quantizing from:
"quantize": {
"standard": "[0,1]"
}
"quantize": {
"standard": "[-1,1]"
}
Dequantization is the inverse of quantization and is specified for an output layer with the dequantize field. The same scale and bias or standard fields are used.
For dequantization, scale and bias are applied in inverse order, where the bias value will be the negative equivalent of a quantization bias, and the scale will be the inverse of a quantization scale.
unquantized_value = quantized_value * scale + bias
For example, to dequantize from a range of 0 to 255 back to a range of 0 to 1, use a bias of 0 and a scale of 1.0/255.0:
"outputs": [
{
"name": "vector-output",
"type": "array",
"shape": [4],
"dequantize": {
"scale": 0.004,
"bias": 0
}
}
]
A standard set of dequantization functions is supported and describes the range of values you want to dequantize back to:
"dequantize": {
"standard": "[0,1]"
}
"dequantize": {
"standard": "[-1,1]"
}
The [0,1] standard dequantization is particularly useful for softmax probability outputs with quantized models, when you must convert from a quantized range of [0,255] back to a valid probability distribution in the range of [0,1].
Using Quantization and Dequantization
Once these fields have been specified in a model.json file, no additional change is required in the Objective-C code. Simply send floating point values in and get floating point values back:
NSArray *vectorInput = @[ @(0.1f), @(0.2f), @(0.3f), @(0.4f) ]; // range in [0,1]
NSDictionary *features = @{
@"vector-input": vectorInput
};
NSDictionary *inference = (NSDictionary*)[model runOn:features];
NSArray *vectorOutput = inference[@"vector-output"];
// vectorOutput[0] == 0.xx...
// vectorOutput[1] == 0.xx...
// vectorOutput[2] == 0.xx...
// vectorOutput[3] == 0.xx...
Selecting the scale and bias terms for either quantization or dequantization is a matter of solving a system of linear equations.
Quantization Scale and Bias
For quantization, for example, you must know the range of values that are being quantized and the range of values you are quantizing to. The latter is always [0,255], while the former is up to you.
Then, given that the equation for quantizing a value is
quantized_value = (unquantized_value + bias) * scale
You can form two equations:
(min + bias) * scale = 0
(max + bias) * scale = 255
And solve for scale and bias. Because the first equation is always set equal to zero, it is trivial to solve for bias. Use that result to solve for scale in the second equation:
bias = -min
scale = 255 / (max - min)
For example, if you are quantizing from a range of values in [-1,1], then the scale and bias terms are:
bias = -(-1)
= 1
scale = 255 / (1-(-1))
= 255/2
= 127.5
Which are exactly the values TensorIO uses when you specify a standard quantize string "[-1,1]".
Dequantization Scale and Bias
For dequantization we do the same, using the equation:
unquantized_value = quantized_value * scale + bias
Form two equations:
min = 0 * scale + bias
max = 255 * scale + bias
And solve for scale and bias:
bias = min
scale = (max - bias) / 255
For example, if you are dequantizing from a range of values in [-1,1], then the scale and bias terms are:
bias = -1
scale = (1-(-1)) / 255
= 2/255
= 0.0078
Which once again are the values TensorIO uses when you specify the standard dequantize string "[-1,1]".
In both cases, you will need to know what the maximum and minimum values are that you are quantizing from and dequantizing to, and these must match the values you have used for your model.
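If you are computing custom scale and bias terms yourself, the two solutions are easy to express in code. The following helpers are hypothetical, not part of TensorIO, and simply restate the algebra above:

```objc
// Hypothetical helpers, not part of TensorIO: compute scale and bias terms
// from the floating point range [min,max] being quantized from or dequantized to

// quantized_value = (unquantized_value + bias) * scale
static void QuantizeParamsForRange(float min, float max, float *scale, float *bias) {
    *bias  = -min;
    *scale = 255.0f / (max - min);
}

// unquantized_value = quantized_value * scale + bias
static void DequantizeParamsForRange(float min, float max, float *scale, float *bias) {
    *bias  = min;
    *scale = (max - min) / 255.0f;
}
```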
Let's look at a complete example. This model is quantized and has two input layers and two output layers, with standard but different quantizations and dequantizations.
The model bundle will again have two files in it:
myquantizedmodel.tiobundle
- model.json
- model.tflite
Noting the value of the model.quantized field and the presence of quantize and dequantize fields in the input and output descriptions, the model.json file might look like:
{
"name": "Example Quantized Model",
"details": "This model takes two vector valued inputs and produces two vector valued outputs",
"id": "my-awesome-quantized-model",
"version": "1",
"author": "doc.ai",
"license": "Apache 2",
"model": {
"file": "model.tflite",
"backend": "tflite",
"quantized": true,
"modes": ["predict"]
},
"inputs": [
{
"name": "foo-features",
"type": "array",
"shape": [4],
"quantize": {
"standard": "[0,1]"
}
},
{
"name": "bar-features",
"type": "array",
"shape": [8],
"quantize": {
"standard": "[-1,1]"
}
}
],
"outputs": [
{
"name": "baz-outputs",
"type": "array",
"shape": [3],
"dequantize": {
"standard": "[0,1]"
}
},
{
"name": "qux-outputs",
"type": "array",
"shape": [6],
"dequantize": {
"standard": "[-1,1]"
}
}
]
}
Perform inference with this model as before:
NSArray *fooFeatures = @[ @(0.1f), @(0.2f), @(0.3f), @(0.4f) ]; // range in [0,1]
NSArray *barFeatures = @[ @(-0.1f), @(0.2f), @(0.3f), @(0.4f), @(-0.5f), @(0.6f), @(-0.7f), @(0.8f) ]; // range in [-1,1]
NSDictionary *features = @{
@"foo-features": fooFeatures,
@"bar-features": barFeatures
};
NSDictionary *inference = (NSDictionary*)[model runOn:features];
NSArray *bazOutputs = inference[@"baz-outputs"]; // length 3 in range [0,1]
NSArray *quxOutputs = inference[@"qux-outputs"]; // length 6 in range [-1,1]
The quantize field is optional for array input layers, even when the model is quantized. When you use a quantized model without including a quantize field, it is up to you to ensure that the data you send to TensorIO for inference is already quantized and that you treat output data as quantized.
This may be the case when your input and output data is only ever in the range of [0,255], for example pixel data, or when you are quantizing the floating point inputs yourself before sending them to the model.
For example:
NSArray<NSNumber*> *unquantizedInput = @[ @(0.1f), @(0.2f), @(0.3f), @(0.4f) ]; // range in [0,1]
NSArray<NSNumber*> *quantizedInput = [unquantizedInput map:^NSNumber * _Nonnull(NSNumber * _Nonnull obj) {
return @(obj.floatValue * 255); // convert from [0,1] to [0,255]
}];
NSDictionary *features = @{
@"quantized-input": quantizedInput
};
NSDictionary *inference = (NSDictionary*)[model runOn:features];
NSArray *quantizedOutput = inference[@"quantized-output"]; // in range [0,255]
NSArray *dequantizedOutput = [quantizedOutput map:^NSNumber * _Nonnull(NSNumber * _Nonnull obj) {
return @(obj.unsignedCharValue * 1.0/255.); // convert from [0,255] to [0,1]
}];
TensorIO has built-in support for image data and can perform inference on image data as well as return image data as an output. A key concept when working with image data is the pixel buffer, which is a pixel by pixel representation of an image in memory.
TensorIO works with pixel buffers and includes a wrapper for the native `CVPixelBufferRef`. It also provides utility functions for converting instances of `UIImage` to and from pixel buffers.
As always, inform TensorIO that an input layer expects pixel buffer data by modifying that layer's description in model.json. Set its type to image. You must specify the shape as an array of [height, width, channels] and the format of the image as either RGB or BGR. More on image formats below.
For now let's assume the tensor takes image volumes of size 224x224x3 with RGB byte ordering:
"inputs": [
{
"name": "image-input",
"type": "image",
"shape": [224,224,3],
"format": "RGB"
}
]
We can then pass image data to this model by wrapping an image's pixel buffer in a `TIOPixelBuffer`, which knows how to copy pixel data to the tensor given the format:
UIImage *image = [UIImage imageNamed:@"example-image"];
CVPixelBufferRef pixelBuffer = image.pixelBuffer;
TIOPixelBuffer *buffer = [[TIOPixelBuffer alloc] initWithPixelBuffer:pixelBuffer orientation:kCGImagePropertyOrientationUp];
NSDictionary *inference = (NSDictionary*)[model runOn:buffer];
A pixel buffer is a pixel by pixel representation of image data laid out in a contiguous block of memory. On iOS some APIs provide raw pixel buffers by default, such as the AVFoundation APIs, while in other cases we must construct pixel buffers ourselves.
A pixel buffer always has a size, which includes the width and height, as well as a format, such as ARGB or BGRA, which lets the buffer know how many channels of data there are for each pixel and in what order those bytes appear. In the case of ARGB and BGRA, there are four channels of data arranged in alpha-red-green-blue or blue-green-red-alpha order respectively.
The ARGB and BGRA pixel buffers on iOS represent each pixel using four bytes of memory, with a single byte allocated to each channel. Each color channel, as well as the alpha channel, is represented by a value in the range of 0 to 255, allowing a pixel to represent over 16 million colors at 256 levels of alpha.
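If you do need to construct a pixel buffer yourself, Core Video provides the relevant API. A minimal sketch, which creates an empty 224x224 buffer in BGRA format and leaves filling and releasing it to the caller:

```objc
#import <CoreVideo/CoreVideo.h>

// Create an empty 224x224 pixel buffer in BGRA format; production code
// would fill it with pixel data and release it with CVPixelBufferRelease
CVPixelBufferRef pixelBuffer = NULL;
CVReturn status = CVPixelBufferCreate(
    kCFAllocatorDefault, 224, 224,
    kCVPixelFormatType_32BGRA, NULL, &pixelBuffer);

if (status != kCVReturnSuccess) {
    NSLog(@"Unable to create pixel buffer");
}
```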
Tensors operate on pixel buffers with specific byte orderings. Imagine the memory for a pixel buffer in ARGB format. The top left pixel at (0,0) will appear first, then the pixel to its right at (1,0), and to its right at (2,0) and so on, for each column and each row in the image, with the bytes appearing in alpha-red-green-blue order:
[ARGB][ARGB][ARGB][ARGB][ARGB]...
Now imagine what that same image looks like to the tensor in BGRA format:
[BGRA][BGRA][BGRA][BGRA][BGRA]...
The byte ordering, which is to say, the format of the pixel buffer, definitely matters!
You must let TensorIO know what byte ordering an input layer expects via the format field. Consequently you must know what byte ordering your model expects.
TensorIO supports two byte orderings, RGB and BGR. Models ignore the alpha channel and don't expect it to be present, so TensorIO internally skips it when copying ARGB or BGRA pixel buffer bytes into tensors.
{
"format": "RGB"
}
{
"format": "BGR"
}
Notice that pixels are represented using a single byte of data for each color channel, a `uint8_t`. Recall what we know about quantized models. By default, TF Lite works with four byte floating point representations of data, `float_t`, but when the model is quantized it uses single byte `uint8_t` representations of data.
Hm. It looks like pixel buffer data is already "quantized"!
In fact, when working with quantized models, you may pass pixel buffer data directly to input layers and read it directly from output layers without needing to transform the data (other than skipping the alpha channel). Quantized models already work on values in a range from 0 to 255, and pixel buffer data is exactly in this range.
Models that are not quantized, however, expect pixel buffer data in a floating point representation, and they will typically want it in a normalized range of values, usually from 0 to 1 or from -1 to 1. The process of converting pixel values from a single byte representation to a floating point representation is called normalization, and TensorIO includes built-in support for it.
The Normalize Field
As always, you will need to update the description of an input layer to indicate what kind of normalization you want. Include the normalize field in the layer's entry. Like the quantize field, it takes either both scale and bias entries or a single standard entry, with the difference that the bias may be applied on a per channel basis.
scale
The scale field is a numeric value that specifies the scaling factor to apply to incoming pixel data.
bias
The bias field is a dictionary value that specifies the bias to apply to incoming pixel data, on a per channel basis, and itself includes three entries, r, g, and b.
Together, a scale and bias entry might look like:
"normalize": {
"scale": 0.004,
"bias": {
"r": -0.485,
"g": -0.457,
"b": -0.408
}
}
Together, TensorIO applies the following equations to any pixel data sent to this layer:
normalized_red_value = scale * red_value + red_bias
normalized_green_value = scale * green_value + green_bias
normalized_blue_value = scale * blue_value + blue_bias
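For instance, with the example scale and per channel biases above, a fully saturated red value normalizes as in this sketch:

```objc
// Applying the example scale and red bias above to a single channel value
float scale = 0.004f;
float redBias = -0.485f;
uint8_t redValue = 255;
float normalizedRed = scale * redValue + redBias; // 255 * 0.004 - 0.485 = 0.535
```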
standard
The standard field is a string value corresponding to one of a number of commonly used normalizations. Its presence overrides the scale and bias fields.
TensorIO currently supports two standard normalizations. The ranges tell TensorIO what values you are normalizing to:
"normalize": {
"standard": "[0,1]"
}
"normalize": {
"standard": "[-1,1]"
}
TensorIO can also read pixel data from output tensors and reconstruct pixel buffers from them. When reading pixel data from an unquantized model it will usually be necessary to convert the values from a normalized floating point representation back to `uint8_t` values in the range of 0 to 255. This process is called denormalization, and once again TensorIO has built-in support for it.
To denormalize pixel data add a denormalize field to an output layer's description. Like the normalize field this field takes either scale and bias entries or a standard entry. The fields work as they do for normalization but as their inverses.
For bias and scale, the following equations will be applied:
red_value = (normalized_red_value + red_bias) * scale
green_value = (normalized_green_value + green_bias) * scale
blue_value = (normalized_blue_value + blue_bias) * scale
Similarly, TensorIO supports two standard denormalizations. The ranges tell TensorIO what values you are denormalizing from:
"denormalize": {
"standard": "[0,1]"
}
"denormalize": {
"standard": "[-1,1]"
}
Let's look at a complete example. This is the unquantized MobileNetV2 image classification model provided by TensorFlow. It takes a single input, image data of size 224x224x3 in RGB format, and produces a single output, a vector of 1000 softmax probabilities identifying the object in the image. It expects image data to be normalized to a range from -1 to 1, and we would like to label the output data.
The model bundle folder might look something like:
mobilenet-model.tiobundle
- model.json
- model.tflite
- assets
- labels.txt
The model.json file might look like:
{
"name": "MobileNet V2 1.0 224",
"details": "MobileNet V2 with a width multiplier of 1.0 and an input resolution of 224x224. \n\nMobileNets are based on a streamlined architecture that have depth-wise separable convolutions to build light weight deep neural networks. Trained on ImageNet with categories such as trees, animals, food, vehicles, person etc. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.",
"id": "mobilenet-v2-100-224-unquantized",
"version": "1",
"author": "Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam",
"license": "Apache License. Version 2.0 http://www.apache.org/licenses/LICENSE-2.0",
"model": {
"file": "model.tflite",
"backend": "tflite",
"quantized": false,
"modes": ["predict"]
},
"inputs": [
{
"name": "image",
"type": "image",
"shape": [224,224,3],
"format": "RGB",
"normalize": {
"standard": "[-1,1]"
}
}
],
"outputs": [
{
"name": "classification",
"type": "array",
"shape": [1,1000],
"labels": "labels.txt"
}
]
}
And we can use this model as follows:
UIImage *image = [UIImage imageNamed:@"example-image"];
TIOPixelBuffer *buffer = [[TIOPixelBuffer alloc] initWithPixelBuffer:image.pixelBuffer orientation:kCGImagePropertyOrientationUp];
NSDictionary *inference = (NSDictionary*)[model runOn:buffer];
NSDictionary<NSString*,NSNumber*> *classification = inference[@"classification"];
The full TensorFlow backend supports on-device training with TensorIO. Support for training allows you to deploy a trainable model to a phone and then train it directly on the device with local data. You use the same model.json file to describe the inputs and outputs for training and add a train field that identifies the training ops to run.
Training inputs will usually include both the model inputs and outputs, while the training output will be the loss value you would like to measure. The training ops will be the named operations that are responsible for executing a round of training on the model and will usually include the optimization operation.
Make sure you are using a backend which supports training and have a model with the additional ops required for training. Tell TensorIO that your model targets training with the model.modes field, and add the train field to your model.json:
"model": {
"file": "train",
"backend": "tensorflow",
"modes": ["train"]
},
"inputs": [
...
],
"outputs": [
...
],
"train": {
"ops": [
"training_op_name"
]
}
Unlike inference, which currently runs on a single example, training runs on many examples simultaneously and requires the use of the `TIOBatch` API. A batch is simply a collection of training examples whose key-values correspond to the named training inputs expected by the model. Think of a batch as a matrix of training data, with each item occupying a row and each column holding the values for a single named input layer across every item.
Instantiate a batch with the input keys to your trainable model. This will typically include both the inputs and labels. Then add items to the batch, typed as `TIOBatchItem` but really just dictionaries of named values conforming to the `TIOData` protocol:
TIOBatch *batch = [[TIOBatch alloc] initWithKeys:@[@"image", @"labels"]];
[batch addItem:@{
@"image": cat,
@"labels": @(0)
}];
[batch addItem:@{
@"image": dog,
@"labels": @(1)
}];
You can then call `train:` on the model with this batch to execute a single round of training, equivalent to one epoch with a single batch:
NSDictionary *results = (NSDictionary*)[model train:batch];
As with inference, the results dictionary will contain the output of training, typically the loss function you'd like to measure.
To execute multiple epochs of training across many batches, you will need to set up an epoch loop and collect data for the batches yourself. An API to support this common practice is forthcoming.
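Until then, such a loop might look like the following sketch, which assumes you have already assembled an NSArray of TIOBatch instances and that "loss" stands in for whatever output your model.json declares:

```objc
// A minimal sketch: `batches` is an NSArray<TIOBatch*> assembled by you,
// and @"loss" is a placeholder for your model's actual output name
for (NSUInteger epoch = 0; epoch < 10; epoch++) {
    for (TIOBatch *batch in batches) {
        NSDictionary *results = (NSDictionary*)[model train:batch];
        NSLog(@"Epoch %lu loss: %@", (unsigned long)epoch, results[@"loss"]);
    }
}
```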
A trainable cats vs dogs model is included with the full TensorFlow example in this repository. Inside the cats-vs-dogs-train.tiobundle you'll find the expected model.json file along with a train directory that contains the results of exporting a saved model in TensorFlow (more below).
The model.json looks like:
{
"name": "Cats vs Dogs MobileNet V2 1.0 128",
"details": "Cats vs Dogs Kaggle model based on MobileNet V2 architecture with a width multiplier of 1.0 and an input resolution of 128x128.",
"id": "cats-vs-dogs-v2-100-128-unquantized",
"version": "1",
"author": "doc.ai",
"license": "Apache License. Version 2.0 http://www.apache.org/licenses/LICENSE-2.0",
"model": {
"file": "train",
"quantized": false,
"type": "image.classification.catsvsdogs",
"backend": "tensorflow",
"modes": ["train"]
},
"inputs": [
{
"name": "image",
"type": "image",
"shape": [-1,128,128,3],
"format": "RGB",
"normalize": { "standard": "[0,1]" }
},
{
"name": "labels",
"type": "array",
"dtype": "int32",
"shape": [-1,1]
}
],
"outputs": [
{
"name": "sigmoid_cross_entropy_loss/value",
"type": "array",
"shape": [1]
}
],
"train": {
"ops": [
"train"
]
}
}
Notice especially the addition of the train field with its ops parameter, and that the shape of the two inputs includes a batch dimension, identified in TensorFlow by a `-1` along the first axis. The names of the inputs and of the training op have been taken from the graph, snippets of which are included below. The name of the output comes from an inspection of the exported graph using TensorFlow's saved_model_cli, also below.
Train this model with the `TIOBatch` API:
TIOBatch *batch = [[TIOBatch alloc] initWithKeys:@[@"image", @"labels"]];
[batch addItem:@{
@"image": cat,
@"labels": @(0)
}];
[batch addItem:@{
@"image": dog,
@"labels": @(1)
}];
for (NSUInteger epoch = 0; epoch < 100; epoch++) {
NSDictionary *results = (NSDictionary*)[model train:batch];
NSLog(@"%@", results[@"sigmoid_cross_entropy_loss/value"]);
}
Model Snippets
This model was exported from the following code. Notice that the `serving_input_receiver_fn` provides an input named image, that we are exporting the model using `experimental_mode=tf.estimator.ModeKeys.TRAIN`, and that in the `model_fn` we set up a placeholder for the labels, name it labels, and name the training op train. The names in the model.json file correspond directly to these values.
This model was built with TensorFlow 1.13.
# trainable model snippets
# serving_input_receiver_fn used by estimator.export_saved_model
def serving_input_receiver_fn(params):
dimension = [None, params['target_dim'], params['target_dim'], 3]
inputs = {
'image': tf.placeholder(tf.float32, dimension, name='image'),
}
return tf.estimator.export.ServingInputReceiver(inputs, inputs)
# the save_model function which is called by a custom python script
# you must have already trained the model for at least a single epoch and generated training checkpoints
# the model_dir param points to that checkpoints directory
def save_model(model_dir, output_dir, dims):
input_params = {'target_dim': dims}
estimator = tf.estimator.Estimator(
model_fn=model.model_fn,
model_dir=model_dir)
estimator.export_saved_model(
output_dir,
lambda:serving_input_receiver_fn(input_params),
as_text=False,
experimental_mode=tf.estimator.ModeKeys.TRAIN)
# the model_fn expected by tensorflow's estimator api
# note: the labels placeholder if labels is None
# note: the named optimization op
def model_fn(features, labels, mode, params):
MOBILENET = 'https://tfhub.dev/google/imagenet/mobilenet_v2_100_128/feature_vector/2'
# build model layers
module = hub.Module(MOBILENET)
feature_vector = module(features["image"])
logits = tf.layers.dense(feature_vector, 1, name='logit')
probabilities = tf.nn.sigmoid(logits, name='sigmoid')
# prepare predictions
predictions = {
'probability': probabilities,
'class': tf.to_int32(probabilities > 0.5)
}
prediction_output = tf.estimator.export.PredictOutput({
'probability': probabilities,
'class': tf.to_int32(probabilities > 0.5)
})
# return an estimator spec for prediction before computing a loss
if mode == tf.estimator.ModeKeys.PREDICT:
return tf.estimator.EstimatorSpec(
mode=mode,
predictions=predictions,
export_outputs={
tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: prediction_output
})
# calculate loss
if labels is None: # during training export
labels = tf.placeholder(tf.int32, shape=(1), name='labels')
labels = tf.reshape(labels, [-1,1])
labels = tf.cast(labels, tf.float32)
loss = tf.losses.sigmoid_cross_entropy(
multi_class_labels=labels,
logits=logits
)
# calculate accuracy metric
accuracy = tf.metrics.accuracy(labels=labels, predictions=predictions["class"], name='accuracy')
if mode == tf.estimator.ModeKeys.TRAIN:
# generate some summary info
# these ops are not supported by the TensorFlow mobile build
# unsupported ops on mobile build
# tf.summary.scalar('average_loss', loss)
# tf.summary.scalar('accuracy', accuracy[1])
# prepare an optimizer
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)
train_op = optimizer.minimize(
loss=loss,
global_step=tf.train.get_global_step(),
name="train")
# return an estimator spec
return tf.estimator.EstimatorSpec(
mode=mode,
loss=loss,
train_op=train_op)
if mode == tf.estimator.ModeKeys.EVAL:
# add evaluation metrics
eval_metric_ops = {
"accuracy": accuracy
}
# return an estimator spec
return tf.estimator.EstimatorSpec(
mode=mode,
loss=loss,
eval_metric_ops=eval_metric_ops)
We can use TensorFlow's saved_model_cli to give us the inputs and outputs to this model. We already know the input is named "image", and we learn that the output corresponds to the sigmoid cross entropy loss, which we use for our model's outputs field:
$ saved_model_cli show --dir {export-dir} --all
MetaGraphDef with tag-set: 'train' contains the following SignatureDefs:
signature_def['train']:
The given SavedModel SignatureDef contains the following input(s):
inputs['image'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 128, 128, 3)
name: image:0
The given SavedModel SignatureDef contains the following output(s):
outputs['loss'] tensor_info:
dtype: DT_FLOAT
shape: ()
name: sigmoid_cross_entropy_loss/value:0
Method name is: tensorflow/supervised/training
TensorIO includes a number of additional utilities, especially for working with image data. Until this section is completed, refer to the following files:
- TIOCVPixelBufferHelpers
- UIImage+TIOCVPixelBufferExtensions
- TIOModelOptions
- TIOVisionPipeline
You may also refer to tensorio.info for the complete Objective-C project documentation.
For an example of TensorIO in action check out Net Runner, our iOS environment for evaluating computer vision machine learning models.