CppND-Capstone (TensorFlow Custom Op) Project

This project implements a custom TensorFlow op, that is, an operation not provided by the existing TensorFlow library. It follows the instructions in TensorFlow's "Create an op" guide; refer to that guide for further instructions.

Tensor Addition Op

The addition op takes two tensors as input and outputs a tensor that is their element-wise sum.

The CPU functor of the addition op is:

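// CPU specialization: writes the element-wise sum of input_a and input_b (N elements each) into output_c.
template<typename T>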
struct AdditionFunctor<CPUDevice,T>{
    void operator()(const CPUDevice& d, const T* input_a, const T* input_b, T* output_c, int N){
        for(int i=0;i<N; i++){
            output_c[i] = input_a[i] + input_b[i];
        }
    }
};

The output tensor c is the element-wise sum of the input tensors a and b.
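For experimenting with the functor outside a TensorFlow build, the following is a minimal standalone sketch. It assumes a stand-in CPUDevice struct (the real project uses TensorFlow's CPU device type), declares the primary template that the specialization needs, and adds a hypothetical AddVectors helper that is not part of the project.

```cpp
#include <cassert>
#include <vector>

// Stand-in for TensorFlow's CPU device type (assumption for this sketch;
// the real op is built against TensorFlow's own device type).
struct CPUDevice {};

// Primary template: declared once, then specialized per device.
template <typename Device, typename T>
struct AdditionFunctor;

// CPU specialization: element-wise sum of two flat buffers of length N.
template <typename T>
struct AdditionFunctor<CPUDevice, T> {
  void operator()(const CPUDevice& /*d*/, const T* input_a, const T* input_b,
                  T* output_c, int N) const {
    for (int i = 0; i < N; ++i) {
      output_c[i] = input_a[i] + input_b[i];
    }
  }
};

// Hypothetical helper for illustration: adds two equally sized vectors
// by calling the CPU functor on their underlying buffers.
template <typename T>
std::vector<T> AddVectors(const std::vector<T>& a, const std::vector<T>& b) {
  assert(a.size() == b.size());  // the functor assumes equal lengths
  std::vector<T> c(a.size());
  AdditionFunctor<CPUDevice, T>()(CPUDevice{}, a.data(), b.data(), c.data(),
                                  static_cast<int>(a.size()));
  return c;
}
```

With this sketch, AddVectors<int>({1, 2, 3}, {10, 20, 30}) yields {11, 22, 33}.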

Project structure

customAdd.cc → code of the custom addition op
custom_kernel.cu.cc → specialization for the GPU device
custom_kernel.h → header file for the customAdd op
customAdd.so → shared library created when customAdd.cc is built
customAdd_test.py → script that checks whether the customAdd op works properly

Building the op Library

First, use Python to locate TensorFlow's header (include) directory and its library directory:

$ python
>>> import tensorflow as tf
>>> tf.sysconfig.get_include()
>>> tf.sysconfig.get_lib()

Compile

Compile with CPU Device

Run the following commands to compile the custom op into a dynamic library:

TF_CFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))') )
TF_LFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))') )
g++ -std=c++11 -shared customAdd.cc -o customAdd.so -fPIC ${TF_CFLAGS[@]} ${TF_LFLAGS[@]} -O2

Note on gcc version >=5: gcc has used the new C++ ABI since version 5. The binary pip packages available on the TensorFlow website are built with gcc 4, which uses the older ABI. If you compile your op library with gcc >=5, add -D_GLIBCXX_USE_CXX11_ABI=0 to the command line to make the library compatible with the older ABI.
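To check which ABI a given compiler invocation selects, a small helper such as the following can be compiled with the same flags as the op. This is only a sketch: DescribeStdlibAbi is a hypothetical name, and it relies on the _GLIBCXX_USE_CXX11_ABI macro that libstdc++ defines once any standard header is included. Other standard libraries do not define this macro.

```cpp
#include <string>

// Hypothetical helper (not part of the project): reports which libstdc++
// ABI this translation unit was compiled against.
std::string DescribeStdlibAbi() {
#if defined(_GLIBCXX_USE_CXX11_ABI)
  // libstdc++ sets this macro to 1 for the new ABI and 0 for the old one.
  return _GLIBCXX_USE_CXX11_ABI
             ? "libstdc++, new ABI (_GLIBCXX_USE_CXX11_ABI=1)"
             : "libstdc++, old ABI (_GLIBCXX_USE_CXX11_ABI=0)";
#else
  // Macro not defined: the standard library is not libstdc++, or it
  // predates the dual ABI (gcc < 5).
  return "_GLIBCXX_USE_CXX11_ABI is undefined; single-ABI standard library";
#endif
}
```

On a gcc >=5 toolchain, compiling this once with and once without -D_GLIBCXX_USE_CXX11_ABI=0 should report the two different settings.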

Compile with GPU Device

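To build the op with its CUDA kernel, compile the kernel with nvcc and then link it into the shared library with g++. Both commands reuse the TF_CFLAGS and TF_LFLAGS variables defined in the CPU section.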

nvcc -std=c++11 -c -o custom_kernel.cu.o custom_kernel.cu.cc \
  ${TF_CFLAGS[@]} -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC

g++ -std=c++11 -shared -o customAdd.so customAdd.cc \
  custom_kernel.cu.o ${TF_CFLAGS[@]} -fPIC -lcudart ${TF_LFLAGS[@]}

Testing through python

Run the Python test script to check that the compiled op works:

python customAdd_test.py

If the op works correctly, the script prints:

Operation Successful!

Testing Result

Prerequisites

TensorFlow binary
g++
CUDA (only when building for a GPU device)

Project Specification

README

A README with instructions is included with the project.
The README indicates which project is chosen.
The README includes information about each rubric point addressed.

Compiling and Testing

The submission must compile and run.

Loops, Functions, I/O

The project demonstrates an understanding of C++ functions and control structures.
The project reads data from a file and processes the data, or the program writes data to a file.

Object Oriented Programming

The project uses Object Oriented Programming techniques.
Classes abstract implementation details from their interfaces.
Classes encapsulate behavior.
Classes follow an appropriate inheritance hierarchy.
Overloaded functions allow the same function to operate on different parameters.
Derived class functions override virtual base class functions.
Templates generalize functions in the project.

KibaekJeong/Cpp---Capstone