[Performance Issue]: 125x performance difference between Python and TensorFlow.NET

Question

Closed this issue 2 years ago · 5 comments

Brief Description

I've written a basic program that loads an object detection model and runs it 20 times. Each iteration takes about 50 ms in the Python version, but around 6600 ms in the TensorFlow.NET version.

I'm sure I'm doing something very wrong here as I'm inexperienced with ML. But the code samples are so small I'm not sure where I could have gone wrong.

Previously posted in the Discord but there doesn't seem to be much activity in there.

Device and Context

12th Gen Intel(R) Core(TM) i5-12400F 2.50 GHz
32.0 GB (31.8 GB usable)

Benchmark

Python version:

import time
from tensorflow.lite.python.interpreter import Interpreter

interpreter = Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()

for _ in range(20):
    start = time.time()
    interpreter.invoke()
    end = time.time()
    print("Invoked in " + str((end - start) * 1000) + "ms")

C# version:

using System.Diagnostics;
using Tensorflow;
using Tensorflow.Lite;

void errorHandler(bool condition, string message)
{
    if (condition)
    {
        throw new Exception(message);
    }
}

using var model = c_api_lite.TfLiteModelCreateFromFile("model.tflite");
errorHandler(model.IsInvalid, "Failed to load the model.");

using var interpreter = c_api_lite.TfLiteInterpreterCreate(model, new SafeTfLiteInterpreterOptionsHandle(IntPtr.Zero));

var allocationResult = c_api_lite.TfLiteInterpreterAllocateTensors(interpreter);
errorHandler(allocationResult != TfLiteStatus.kTfLiteOk, "Failed to allocate tensors.");

var stopwatch = new Stopwatch();

var input = c_api_lite.TfLiteInterpreterGetInputTensor(interpreter, 0);

for (var run = 0; run < 20; run++)
{
    stopwatch.Restart();
    c_api_lite.TfLiteInterpreterInvoke(interpreter);
    stopwatch.Stop();

    Debug.WriteLine($"Invoked in {stopwatch.ElapsedMilliseconds}ms");
}

model.zip
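Both loops above time each call individually, so one-off initialization cost and steady-state cost are hard to tell apart. A minimal sketch of a harness that discards a few warm-up calls and reports summary statistics follows; `benchmark` and its `warmup` parameter are hypothetical names introduced here, not part of either script.

```python
import statistics
import time


def benchmark(fn, runs=20, warmup=3):
    """Time `fn` over `runs` calls after `warmup` untimed calls (hypothetical helper)."""
    # Untimed warm-up calls, so first-call setup does not skew the numbers.
    for _ in range(warmup):
        fn()

    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "runs": runs,
        "median_ms": statistics.median(timings_ms),
        "min_ms": min(timings_ms),
        "max_ms": max(timings_ms),
    }
```

With the Python script above, this would be called as `benchmark(interpreter.invoke)` after `allocate_tensors()`.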

Answer 1 · 2024-04-16T20:18:48.000Z

What version of tensorflow (python) are you using?

Answer 2 · 2024-04-16T20:31:27.000Z

Hello! TensorFlow 2.15.0, Python 3.9.13. Thanks.

Answer 3 · 2024-04-16T20:39:59.000Z

I think this result might be expected, because the Tensorflow.Lite binding in tf.net was implemented before TF 2.0 had even been released. If you'd like to dig further, could you run the same benchmark with TensorFlow 1.x (or 2.0.0 if 1.x gives you problems)?
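When the same benchmark is repeated across TensorFlow versions, it helps to print the environment next to the timings so the runs can be told apart. A minimal sketch, assuming the `tensorflow` package is the one providing the interpreter; `describe_environment` is a hypothetical helper name.

```python
import platform


def describe_environment():
    """Return the Python and TensorFlow versions in use (hypothetical helper)."""
    try:
        import tensorflow as tf  # imported lazily so the helper works without it
        tf_version = tf.__version__
    except ImportError:
        tf_version = "not installed"
    return {"python": platform.python_version(), "tensorflow": tf_version}


if __name__ == "__main__":
    env = describe_environment()
    print("Python " + env["python"] + ", TensorFlow " + env["tensorflow"])
```

This avoids `importlib.metadata`, so it should also run on Python 3.7 environments such as those typically used with TensorFlow 1.15.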

Answer 4 · 2024-04-16T21:50:47.000Z

Took me a while but I've got it running on TensorFlow 1.15.0 and Python 3.7.9 with the same Python script and I can confirm that it's massively slow like the C# example.

I thought I was using 2.x.x given this package description but I guess not!

Thank you for the help, this has pointed me in the right direction - as I'm only using a small portion of the API I'll see if I can find the right DLL and do a custom binding for it for my own purposes.

Answer 5 · 2024-04-17T03:20:00.000Z

> I thought I was using 2.x.x given this package description but I guess not!

TensorFlow.NET was originally implemented as a binding for TensorFlow 1.x, because 2.x had not been released at that time. After TensorFlow moved to 2.x, we upgraded most features in tf.net to use the 2.x backend, but not the tf.lite binding, because we were short of hands. Tensorflow.Redist only provides some DLLs, which are compiled from the TensorFlow C++ library. Using the newly added C API requires some changes in Tensorflow.Binding (C# code, here's a reference).

> Thank you for the help, this has pointed me in the right direction - as I'm only using a small portion of the API I'll see if I can find the right DLL and do a custom binding for it for my own purposes.

Glad to hear that. In that case I think Tensorflow.Redist could still work for you, though you will need to write some C# code yourself for the binding.
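To illustrate how small a custom binding for this use case can be, here is a sketch in Python with `ctypes`, kept in the same language as the benchmark script; a C# version would declare the same functions through P/Invoke. The function names and signatures are from the TensorFlow Lite C API header (`c_api.h`). The helper names `bind_tflite_c_api` and `load_tflite`, and whatever library path is passed in, are assumptions for illustration only.

```python
import ctypes


def bind_tflite_c_api(lib):
    """Declare signatures for the few TFLite C API calls the benchmark needs."""
    # TfLiteModel*, TfLiteInterpreterOptions* and TfLiteInterpreter* are opaque,
    # so they are represented as void pointers here.
    lib.TfLiteVersion.restype = ctypes.c_char_p
    lib.TfLiteVersion.argtypes = []

    lib.TfLiteModelCreateFromFile.restype = ctypes.c_void_p
    lib.TfLiteModelCreateFromFile.argtypes = [ctypes.c_char_p]

    # Second argument is the optional options pointer; None passes NULL.
    lib.TfLiteInterpreterCreate.restype = ctypes.c_void_p
    lib.TfLiteInterpreterCreate.argtypes = [ctypes.c_void_p, ctypes.c_void_p]

    # Both return a TfLiteStatus enum, where 0 is kTfLiteOk.
    lib.TfLiteInterpreterAllocateTensors.restype = ctypes.c_int
    lib.TfLiteInterpreterAllocateTensors.argtypes = [ctypes.c_void_p]
    lib.TfLiteInterpreterInvoke.restype = ctypes.c_int
    lib.TfLiteInterpreterInvoke.argtypes = [ctypes.c_void_p]

    lib.TfLiteInterpreterDelete.restype = None
    lib.TfLiteInterpreterDelete.argtypes = [ctypes.c_void_p]
    lib.TfLiteModelDelete.restype = None
    lib.TfLiteModelDelete.argtypes = [ctypes.c_void_p]
    return lib


def load_tflite(library_path):
    """Load a TFLite C library from a caller-supplied path (assumption) and bind it."""
    return bind_tflite_c_api(ctypes.CDLL(library_path))
```

Calling `TfLiteVersion()` on the loaded library is a quick way to check which TensorFlow Lite build a given DLL actually contains before benchmarking against it.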