Here's a scenario that I believe some non-data engineers and data scientists are confronted with:
How do I deliver a Tensorflow model that I trained in Python but must deploy in pure C/C++ code on the client side, without setting up a Python environment on their side, and with all files shipped as binaries?
The answer to that is to use the Tensorflow C or C++ API. In this article, we only look at how to use the C API (not the C++ API or TensorflowLite), running only on the CPU.
You would think that the famous Tensorflow would have documentation about how to compile a simple C solution with it, but as of now (TF2.1), there is little to no information about that. I'm here to share my findings.
This article will explain how to run a simple C program using Tensorflow's C API 2.1. The environment I will use throughout the article is as follows:
- OS : Linux (tested and working on a fresh Ubuntu 19.10 and on OpenSuse Tumbleweed)
- Latest GCC
- Tensorflow from Github (master branch 2.1)
- No GPU
Also, I would like to credit Vlad Dovgalecs and his article on Medium, as this tutorial is largely based on and improved upon his findings.
This article will be a bit lengthy. But here is what we will do, step by step:
- Clone Tensorflow source code and compile to get the C API headers and binaries.
- Build the simplest model using Python and Tensorflow and export it as a SavedModel that can be read by the C API.
- Build a simple C program, compile it with gcc, and run it like a normal executable.
So here we go:
As far as I know, there are two ways to get those C API headers.
- Download the precompiled Tensorflow C API from the website (the binaries may not be up to date), OR
- Clone and compile from source code (a time-consuming process, but if things don't work, we can debug and examine the API).
So I'm going to show the second route: compiling the source code and using the resulting binaries.
Create a folder and clone the project:
git clone https://github.com/tensorflow/tensorflow.git
You will need Bazel to compile. Install it in your environment:
Ubuntu :
sudo apt update && sudo apt install bazel-1.2.1
OpenSuse :
sudo zypper install bazel
Whichever platform you use, make sure the Bazel version is 1.2.1 (you can check with bazel version), as this is what Tensorflow 2.1 currently uses. This could change in the future.
Next, we need to install Numpy, a Python package (why would we need a Python package to build a C API? Because the Tensorflow build depends on it). You can install it however you want, as long as it can be referenced back during compilation. But I prefer to install it through Miniconda and keep a separate virtual environment for the build. Here's how:
Install Miniconda :
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
# follow the default installation directions
Create a new environment with Numpy, named tf-build:
conda create -n tf-build python=3.7 numpy
We will use this environment later, when compiling Tensorflow.
The Tensorflow 2.1 source code has a bug that will make the build fail. Refer to this issue. The fix is to apply a patch; I included a file in this repository that can be used as the patch.
# copy/download the "p.patch" file from my repo and paste it at the root of the Tensorflow source code.
git apply p.patch
In the future, this might be fixed and no longer relevant.
Referring to the Tensorflow documentation and the Github readme, here's how we compile it. We first need to activate our conda environment so that the build can find Numpy.
conda activate tf-build # skip this if you already have numpy installed globally
# make sure you're at the root of the Tensorflow source code.
bazel test -c opt tensorflow/tools/lib_package:libtensorflow_test # note that this will take a very long time
bazel build -c opt tensorflow/tools/lib_package:libtensorflow
Let me WARN you again: it took 2 hours to compile on an Ubuntu VM with a 6-core configuration. A friend's 2-core laptop basically froze trying to compile this. Here's my advice: run it on a machine with a powerful CPU and plenty of RAM.
Copy the file at bazel-bin/tensorflow/tools/lib_package/libtensorflow.tar.gz and paste it into your desired folder. Untar it as follows:
tar -C /usr/local -xzf libtensorflow.tar.gz
I untarred it in my home folder instead of /usr/local, as I was just trying it out.
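Inside the extracted folder, you should find an include/ directory holding the C headers and a lib/ directory holding the shared libraries; we will point gcc at both of these in the final step.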
CONGRATULATIONS!! YOU MADE IT, at least through compiling Tensorflow.
In this step, we will build a model with the tf.keras.layers class and save it to be loaded later with the C API. Refer to the full code in model.py in the repo.
Here is a simple model: a custom tf.keras.Model with a single Dense layer whose kernel is initialized with ones. Hence the output of this model (computed in def call()) will be identical to its input.
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class testModel(tf.keras.Model):
    def __init__(self):
        super(testModel, self).__init__()
        self.dense1 = tf.keras.layers.Dense(1, kernel_initializer='Ones', activation=tf.nn.relu)

    def call(self, inputs):
        return self.dense1(inputs)

input_data = np.asarray([[10]])
module = testModel()
module._set_inputs(input_data)
print(module(input_data))

# Export the model to a SavedModel
module.save('model', save_format='tf')
Since Tensorflow 2.0, eager execution allows us to run a model without drafting a graph and running it through a session. But in order to save the model (see the line module.save('model', save_format='tf')), the graph needs to be built before it can be saved. Hence, we need to call the model at least once for it to create the graph; calling print(module(input_data)) forces it to do so.
Next, run the code:
python model.py
You should get an output as below:
2020-01-30 11:46:25.400334: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2020-01-30 11:46:25.421717: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3699495000 Hz
2020-01-30 11:46:25.422615: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x561bef5ac2a0 executing computations on platform Host. Devices:
2020-01-30 11:46:25.422655: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2020-01-30 11:46:25.422744: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
tf.Tensor([[10.]], shape=(1, 1), dtype=float32)
A folder called model should also be created.
When we save a model, Tensorflow creates a folder with a bunch of files inside that stores the weights and graphs of the model. Tensorflow provides a tool for diving into these files and matching up the input and output tensors: saved_model_cli, a command-line tool that comes with a Tensorflow installation.
BUT WAIT! We haven't installed Tensorflow! So basically, there are two ways to get saved_model_cli:
- Install Tensorflow
- Build it from source code and look for saved_model_cli
For this, I will just install Tensorflow in a separate conda environment and call it from there; we only need to use it once anyway. So here we go.
Install Tensorflow in a separate conda environment:
conda create -n tf python=3.7 tensorflow
Activate the environment:
conda activate tf
By now, you should be able to call saved_model_cli from the command line.
We would need to extract the graph names for the input and output tensors and use that information later when calling the C API. Here's how:
saved_model_cli show --dir <path_to_saved_model_folder>
Running this with the appropriate path substituted, you should get an output like below:
The given SavedModel contains the following tag-sets:
serve
Use this tag-set to drill further into the tensor graph. Here's how:
saved_model_cli show --dir <path_to_saved_model_folder> --tag_set serve
and you should get an output like below:
The given SavedModel MetaGraphDef contains SignatureDefs with the following keys:
SignatureDef key: "__saved_model_init_op"
SignatureDef key: "serving_default"
Pass the serving_default signature key to the command to print out the tensor nodes:
saved_model_cli show --dir <path_to_saved_model_folder> --tag_set serve --signature_def serving_default
and you should get an output like below:
The given SavedModel SignatureDef contains the following input(s):
inputs['input_1'] tensor_info:
dtype: DT_INT64
shape: (-1, 1)
name: serving_default_input_1:0
The given SavedModel SignatureDef contains the following output(s):
outputs['output_1'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 1)
name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict
Here we get the names serving_default_input_1 and StatefulPartitionedCall, which we will use later in the C API.
The third part is to write C code that uses the Tensorflow C API and imports the Python SavedModel. The full code can be found here.
There is no proper documentation for the C API, so if something goes wrong, it's best to look back at the C headers in the source code. (You can also debug using GDB and learn step by step how the C API works.)
In an empty file, include the Tensorflow C API as follows:
#include <stdlib.h>
#include <stdio.h>
#include "tensorflow/c/c_api.h"
void NoOpDeallocator(void* data, size_t a, void* b) {}
int main()
{
}
Note the NoOpDeallocator function declared at the top; we will use it later when creating the input tensor. TF_NewTensor expects a deallocator for its data buffer, and since we will hand it a stack-allocated array that Tensorflow must not free, a no-op is exactly what we want.
Next, we need to load the SavedModel and the session using the TF_LoadSessionFromSavedModel API.
//********* Read model
TF_Graph* Graph = TF_NewGraph();
TF_Status* Status = TF_NewStatus();
TF_SessionOptions* SessionOpts = TF_NewSessionOptions();
TF_Buffer* RunOpts = NULL;
const char* saved_model_dir = "model/"; // Path of the model
const char* tags = "serve"; // default model serving tag; can change in future
int ntags = 1;
TF_Session* Session = TF_LoadSessionFromSavedModel(SessionOpts, RunOpts, saved_model_dir, &tags, ntags, Graph, NULL, Status);
if(TF_GetCode(Status) == TF_OK)
{
printf("TF_LoadSessionFromSavedModel OK\n");
}
else
{
printf("%s",TF_Message(Status));
}
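Note that the "serve" string we pass as the tag here is the same tag-set that saved_model_cli printed for our model earlier.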
Next, we grab the tensor nodes from the graph by their names. Remember how we searched for the tensor names using saved_model_cli earlier? This is where we use them, with TF_GraphOperationByName(). In this example, serving_default_input_1 is our input tensor and StatefulPartitionedCall is our output tensor.
//****** Get input tensor
int NumInputs = 1;
TF_Output* Input = malloc(sizeof(TF_Output) * NumInputs);
TF_Output t0 = {TF_GraphOperationByName(Graph, "serving_default_input_1"), 0};
if(t0.oper == NULL)
printf("ERROR: Failed TF_GraphOperationByName serving_default_input_1\n");
else
printf("TF_GraphOperationByName serving_default_input_1 is OK\n");
Input[0] = t0;
//********* Get Output tensor
int NumOutputs = 1;
TF_Output* Output = malloc(sizeof(TF_Output) * NumOutputs);
TF_Output t2 = {TF_GraphOperationByName(Graph, "StatefulPartitionedCall"), 0};
if(t2.oper == NULL)
printf("ERROR: Failed TF_GraphOperationByName StatefulPartitionedCall\n");
else
printf("TF_GraphOperationByName StatefulPartitionedCall is OK\n");
Output[0] = t2;
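By the way, the second field in TF_Output (the 0 in {..., 0}) is the operation's output index, since an op can produce several outputs. For a model with more than one input, you would grab each tensor by name the same way and size NumInputs and the arrays to match. A minimal sketch, assuming a hypothetical second input named serving_default_input_2 (our model has only one input):
// Hypothetical second input; the name is made up for illustration.
// This assumes NumInputs = 2 and Input malloc'd with room for two entries.
TF_Output t1 = {TF_GraphOperationByName(Graph, "serving_default_input_2"), 0};
Input[1] = t1;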
Next, we will allocate a new tensor locally using TF_NewTensor, set the input value, and later pass it to the session run. NOTE that ndata is the total byte size of your data, not the length of the array.
Here we set the input tensor to a value of 20, so we should see an output value of 20 as well.
//********* Allocate data for inputs & outputs
TF_Tensor** InputValues = (TF_Tensor**)malloc(sizeof(TF_Tensor*)*NumInputs);
TF_Tensor** OutputValues = malloc(sizeof(TF_Tensor*)*NumOutputs);
int ndims = 2;
int64_t dims[] = {1,1};
int64_t data[] = {20};
int ndata = sizeof(int64_t); // Tricky: this is the number of bytes, not the number of elements
TF_Tensor* int_tensor = TF_NewTensor(TF_INT64, dims, ndims, data, ndata, &NoOpDeallocator, 0);
if (int_tensor != NULL)
{
printf("TF_NewTensor is OK\n");
}
else
printf("ERROR: Failed TF_NewTensor\n");
InputValues[0] = int_tensor;
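Incidentally, if saved_model_cli had reported the input dtype as DT_FLOAT rather than DT_INT64, the same call would take TF_FLOAT and float data. A minimal sketch of that variant (not needed for our model, which expects int64):
// Hypothetical DT_FLOAT variant; our model actually expects TF_INT64.
float fdata[] = {20.0f};
TF_Tensor* float_tensor = TF_NewTensor(TF_FLOAT, dims, ndims, fdata, sizeof(fdata), &NoOpDeallocator, 0);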
Next, we can run the model by invoking the TF_SessionRun API. Here's how:
// Run the Session
TF_SessionRun(Session, NULL, Input, InputValues, NumInputs, Output, OutputValues, NumOutputs, NULL, 0, NULL, Status);
if(TF_GetCode(Status) == TF_OK)
{
printf("Session is OK\n");
}
else
{
printf("%s",TF_Message(Status));
}
// Free memory
TF_DeleteGraph(Graph);
TF_DeleteSession(Session, Status);
TF_DeleteSessionOptions(SessionOpts);
TF_DeleteStatus(Status);
Lastly, we want to get the output value back from the output tensor using TF_TensorData, which extracts the data from the tensor object. Since we know the size of the output, which is 1, we can print it directly. Otherwise, use TF_GraphGetTensorNumDims or the other APIs available in c_api.h or tf_tensor.h.
void* buff = TF_TensorData(OutputValues[0]);
float* offsets = buff;
printf("Result Tensor :\n");
printf("%f\n",offsets[0]);
return 0;
Compile it as below:
gcc -I<path_of_tensorflow_api>/include/ -L<path_of_tensorflow_api>/lib main.c -ltensorflow -o main.out
Before you run it, you'll need to make sure the C library is exported in your environment:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_of_tensorflow_api>/lib
RUN IT
./main.out
You should get an output like below. Notice that the output value is 20, just like our input. You can change the model to initialize the kernel with a weight of 2 and check that the output doubles (an input of 20 should then produce 40).
2020-01-31 09:47:48.842680: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: model/
2020-01-31 09:47:48.844252: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-01-31 09:47:48.844295: I tensorflow/cc/saved_model/loader.cc:264] Reading SavedModel debug info (if present) from: model/
2020-01-31 09:47:48.844385: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
2020-01-31 09:47:48.859883: I tensorflow/cc/saved_model/loader.cc:203] Restoring SavedModel bundle.
2020-01-31 09:47:48.908997: I tensorflow/cc/saved_model/loader.cc:152] Running initialization op on SavedModel bundle at path: model/
2020-01-31 09:47:48.923127: I tensorflow/cc/saved_model/loader.cc:333] SavedModel load for tags { serve }; Status: success: OK. Took 80457 microseconds.
TF_LoadSessionFromSavedModel OK
TF_GraphOperationByName serving_default_input_1 is OK
TF_GraphOperationByName StatefulPartitionedCall is OK
TF_NewTensor is OK
Session is OK
Result Tensor :
20.000000
END