This repo contains scripts and a tool to reproduce an OpenCL delegate issue with a sequence of Dense/FullyConnected nodes. Our experiments revealed that if a sequence of Dense layers is arranged in a particular pattern (see the following image), the corresponding tflite version of the model produces nan and inf values at random indices in some runs. The issue occurs with both the FP16 and FP32 tflite versions. It cannot be reproduced with the XNNPACK delegate.
- The model_files folder contains the above-mentioned pattern (sample_model.h5) and its corresponding tflite versions (sample_model_fp32.tflite and sample_model_fp16.tflite).
- You can also use convert_model.py to convert this pattern to tflite.

Note: sample_model.h5 is extracted from a large trained model.
We have implemented a small tool that feeds a random input to our sample tflite model using the OpenCL and XNNPACK delegates. Run the tool multiple times. You will see that for some runs, there are inf values in the output from the OpenCL delegate.
- Linux or Mac host computer
- Connectivity to the target device via adb
- Android NDK, version 22 or later
- CMake 3.18 or later
- Unzip the tensorflow_lite_cpp_2_10_1_patched_static.zip file inside the tflite_inference_tool folder.
- In a terminal, from the tflite_inference_tool folder:
$ mkdir build
$ cd build
$ cmake -G "Unix Makefiles" \
    -DCMAKE_SYSTEM_NAME=Android \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_STL=c++_shared \
    -DANDROID_NATIVE_API_LEVEL=27 \
    -DCMAKE_VERBOSE_MAKEFILE=ON \
    -DCMAKE_TOOLCHAIN_FILE=<path-to-ndk>/build/cmake/android.toolchain.cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DTensorFlowLite_ROOT=../tensorflow_lite_cpp_2_10_1_patched_static ..
$ make
- Here, you must replace <path-to-ndk> with the absolute path of the NDK installed on your computer. If you installed the NDK through Android Studio, it is typically located at /home/<username>/Android/Sdk/ndk/<version>/ on Linux.
- tensorflow_lite_cpp_2_10_1_patched_static is the TensorFlow Lite library (nightly version) package.
WARNING: This step will write to the /data/local/tmp folder on your device. Please make sure existing files in that folder are backed up as needed.
In a terminal, from the tflite_inference_tool folder:
$ adb push ./build/model_test /data/local/tmp
$ adb push ./model_files /data/local/tmp
To run the tool you can use the FP32 or the FP16 versions.
$ adb shell "cd /data/local/tmp && LD_LIBRARY_PATH=. ./model_test --model_a=model_files/sample_model_fp32.tflite --model_b=model_files/sample_model_fp32.tflite --input_shape=1,81 --output_shape=1,78"
The output should be something like this:
INFO: Created TensorFlow Lite delegate for GPU.
INFO: Initialized TensorFlow Lite runtime.
VERBOSE: Replacing 13 node(s) with delegate (TfLiteGpuDelegateV2) node, yielding 1 partitions.
INFO: Initialized OpenCL-based API.
INFO: Created 1 GPU delegate kernels.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
VERBOSE: Replacing 13 node(s) with delegate (TfLiteXNNPackDelegate) node, yielding 1 partitions.
OpenCL output:
inf, 34048, 6448, inf, -inf, -29936, 10336, -inf, -35360, inf, inf, inf, 40320, inf, inf, inf, inf, 39616, -13680, inf, 52160, inf, inf, inf, -12776, inf, 61440, -9648, -inf, -30272, -26624, -inf, -17568, -31776, -inf, -8600, 43872, -50752, 44416, -27328, 56512, -30656, inf, 14768, 31968, inf, -4944, inf, inf, 29536, 27376, 23056, 33024, 38880, 39232, 35968, 30656, 34112, 5168, 5176, 62944, 30608, inf, -10464, -28832, -43136, -41824, 33536, 37184, inf, inf, -11872, 16296, inf, 47808, inf, 30816, -35232,
xnnpack output:
159092, 34167.3, 6507.01, 30145.7, -16468.5, -30071.8, 10657.3, -213111, -35455.5, 59236.4, 187745, 109088, 40159.7, 189144, 115387, 15455.3, 165877, 39716.5, -13815.6, 211712, 52249.9, 66429.5, 140783, 130572, -12844, 241188, 61518.1, -9514.39, -190658, -30316.6, -26623.5, -140136, -17629.3, -31893.8, -73718.8, -8607.2, 44218.3, -50711.9, 44557.1, -27426.1, 56461.4, -30854.3, 138278, 14813.5, 31880.9, 68724.8, -4587.04, 87265.3, 81735.9, 29943.9, 27338, 23023.4, 33413.6, 38988.4, 39238.6, 35762.5, 30767.5, 34308.8, 4932.98, 5111.61, 63130.2, 30683.3, 75698.6, -10461.2, -28918.4, -43359.6, -42086.3, 33322.9, 37274.8, 91469.1, 107788, -11963.6, 16518.4, 90442.4, 48030.6, 70459, 30802.7, -35454.6,
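To locate the mismatches in a run like the one above, the two printed vectors can be compared element by element. The following is a minimal Python sketch and not part of the repository. It assumes the values are printed as one comma-separated line per delegate, as in the sample output, and the helper names (parse_values, nonfinite_mismatches) are hypothetical.

```python
import math


def parse_values(line):
    """Parse one comma-separated output line (trailing comma allowed) into floats."""
    # float() accepts "inf", "-inf" and "nan", the spellings seen in the sample output.
    return [float(tok) for tok in line.split(",") if tok.strip()]


def nonfinite_mismatches(opencl, xnnpack):
    """Return indices where the OpenCL value is inf/nan but the XNNPACK value is finite."""
    if len(opencl) != len(xnnpack):
        raise ValueError("output vectors differ in length")
    return [
        i
        for i, (g, c) in enumerate(zip(opencl, xnnpack))
        if not math.isfinite(g) and math.isfinite(c)
    ]


# First five values of each vector from the sample run above.
opencl_line = "inf, 34048, 6448, inf, -inf,"
xnnpack_line = "159092, 34167.3, 6507.01, 30145.7, -16468.5,"

bad = nonfinite_mismatches(parse_values(opencl_line), parse_values(xnnpack_line))
print(bad)  # [0, 3, 4]
```

Pasting the full OpenCL and XNNPACK lines from a run into opencl_line and xnnpack_line lists every index affected in that run.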
We have noticed that the above-mentioned pattern does not always lead to wrong results. We therefore think that, in addition to the pattern structure, the weight and bias values are also influential factors. In model_files/correct_results you can find a pattern instance that does not generate inf values.
It is worth noting that both pattern instances (the one that leads to wrong inf values and the one that does not) are extracted from trained models.
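Because the problem is intermittent and appears to depend on the weights, it can help to run the tool several times per model and count the runs that contain non-finite values. The sketch below is an illustration under stated assumptions and not part of the repository. It reuses the adb invocation from the run step, assumes the tool prints the OpenCL vector on the line following "OpenCL output:" as in the sample log, and uses hypothetical function names. The model path is a parameter, so the same loop can be pointed at a model under model_files/correct_results.

```python
import math
import subprocess


def build_command(model_path, input_shape="1,81", output_shape="1,78"):
    """Build the adb invocation from the run step for one model file on the device."""
    remote = (
        "cd /data/local/tmp && LD_LIBRARY_PATH=. ./model_test "
        f"--model_a={model_path} --model_b={model_path} "
        f"--input_shape={input_shape} --output_shape={output_shape}"
    )
    return ["adb", "shell", remote]


def opencl_values(log_text):
    """Extract the vector printed on the line after 'OpenCL output:' (assumed format)."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines[:-1]):
        if line.strip() == "OpenCL output:":
            return [float(t) for t in lines[i + 1].split(",") if t.strip()]
    return []


def count_bad_runs(model_path, runs=10, run=subprocess.run):
    """Run the tool `runs` times; return how many runs had inf/nan in the OpenCL output."""
    bad = 0
    for _ in range(runs):
        result = run(build_command(model_path), capture_output=True, text=True)
        # Search both streams, since it is not known which one the tool prints to.
        values = opencl_values(result.stdout + "\n" + result.stderr)
        if any(not math.isfinite(v) for v in values):
            bad += 1
    return bad
```

For example, count_bad_runs("model_files/sample_model_fp32.tflite", runs=20) would report how many of 20 runs produced inf or nan values. It requires a connected device with the tool and model files already pushed to /data/local/tmp.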