tensorflow/tensorrt

Mismatch in output: Native TF vs TF-TRT FP32

stengoes opened this issue · 4 comments

I am experiencing a big accuracy drop when comparing our model in native TF (~88% accuracy) vs a TF-TRT FP32 optimized version (~64% accuracy). Am I missing something or could this be a bug?

Explaining the model:
Our model is a bit different from most models because it takes 6 images as input instead of just 1 and outputs softmax probabilities over 20 categories. A ResNet-34-like architecture extracts features from each of the 6 images, the features are concatenated, and a fully connected layer does the final classification.
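For context, a minimal sketch of that topology (the input shape, the ResNet-50 stand-in for our ResNet-34-like extractor, and all names below are illustrative assumptions, not the real model):

import tensorflow as tf

# Six image inputs sharing one feature extractor, concatenated features,
# and a dense softmax over the 20 categories.
inputs = [tf.keras.Input(shape=(224, 224, 3), name="input_%d" % i) for i in range(6)]
backbone = tf.keras.applications.ResNet50(include_top=False, weights=None, pooling="avg")
features = [backbone(x) for x in inputs]
merged = tf.keras.layers.Concatenate()(features)
probs = tf.keras.layers.Dense(20, activation="softmax", name="output")(merged)
model = tf.keras.Model(inputs=inputs, outputs=probs)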

Our system:
GPU: Tesla T4
Driver: 440.33.01
CUDA: 10.2
TF 1.15.0 (using docker)
TensorRT 5.1.5 (installed in the docker image)
docker image: tensorflow/tensorflow:1.15.0-gpu-py3-jupyter

dpkg -l | grep libnvinfer
ii  libnvinfer5                   5.1.5-1+cuda10.0                  amd64        TensorRT runtime libraries

Code used to optimize frozen graph:

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Load frozen graphdef
with tf.gfile.GFile("/models/model69.pb", 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Create converter
converter = trt.TrtGraphConverter(
    input_graph_def=graph_def,
    nodes_blacklist=["output"],
    session_config=None,
    max_batch_size=1,
    max_workspace_size_bytes=6*10**9, # 6 GB
    precision_mode="FP32",
    minimum_segment_size=3,
    is_dynamic_op=False,
    maximum_cached_engines=1,
    use_calibration=True             
)    

# Run the conversion
optimized_graph = converter.convert()

# Save optimized frozen graph
with tf.gfile.GFile("/models/model69-FP32-optimized.pb", "wb") as f:
    f.write(optimized_graph.SerializeToString())
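As a sanity check (not part of the original script), one can also count how many TensorRT engine ops ended up in the converted graph; 0 would mean nothing was converted, while a very large number means the graph was chopped into many small segments:

# Count the TRTEngineOp nodes in the converted GraphDef
num_engines = len([n for n in optimized_graph.node if n.op == "TRTEngineOp"])
print("TRTEngineOp nodes in converted graph:", num_engines)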

Code used to compare the models:

import tensorflow as tf

def predict(filepath_model, images):
    
    # Just to be sure clear the graph
    tf.reset_default_graph()

    graph = tf.Graph()
    with tf.Session(graph=graph) as sess:

        # Load frozen graphdef
        with tf.gfile.GFile(filepath_model, 'rb') as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())

        # Import graphdef
        tf.import_graph_def(graph_def, name="")

        # Get input/output tensors
        input0 = graph.get_tensor_by_name("input_pipeline/input_0:0")    
        input1 = graph.get_tensor_by_name("input_pipeline/input_1:0")    
        input2 = graph.get_tensor_by_name("input_pipeline/input_2:0")    
        input3 = graph.get_tensor_by_name("input_pipeline/input_3:0")    
        input4 = graph.get_tensor_by_name("input_pipeline/input_4:0")    
        input5 = graph.get_tensor_by_name("input_pipeline/input_5:0")   
        output = graph.get_tensor_by_name("output:0")

        # Do prediction
        probabilities = sess.run(
            output, 
            feed_dict={
                input0 : images[0],
                input1 : images[1],
                input2 : images[2],
                input3 : images[3],
                input4 : images[4],
                input5 : images[5],
            }
        )
        score = probabilities.max()
        label = probabilities.argmax()
        
        return label, score, probabilities

# Load examples from disk
import pickle
with open("examples.pkl", "rb") as f:
    examples = pickle.load(f)

# Loop over the examples
print("Native TF \t TF-TRT FP32")
for example in examples: 
    
    # Predict same example on both TF native and TF-trt optimized models
    label1, score1, probabilities1 = predict("/models/model69.pb", example)
    label2, score2, probabilities2 = predict("/models/model69-FP32-optimized.pb", example)
    
    # Print results
    if label1 != label2:
        print("{} ({:.2f}%) \t {} ({:.2f}%) \t MISMATCH!!".format(label1, score1*100, label2, score2*100))
    else:        
        print("{} ({:.2f}%) \t {} ({:.2f}%)".format(label1, score1*100, label2, score2*100))

The above code has the following output:

Native TF 	 TF-TRT FP32
12 (100.00%) 	 12 (100.00%)
12 (100.00%) 	 12 (100.00%)
15 (92.73%) 	 15 (99.70%)
12 (98.45%) 	 12 (100.00%)
3 (97.02%) 	 3 (99.74%)
12 (99.81%) 	 12 (100.00%)
12 (100.00%) 	 12 (100.00%)
11 (99.92%) 	 11 (80.92%)
2 (96.95%) 	 1 (53.15%) 	 MISMATCH!!
12 (99.99%) 	 12 (100.00%)
8 (100.00%) 	 8 (100.00%)
5 (53.45%) 	 17 (45.35%) 	 MISMATCH!!
6 (100.00%) 	 5 (75.59%) 	 MISMATCH!!
12 (100.00%) 	 12 (100.00%)
1 (95.47%) 	 1 (90.35%)
13 (99.61%) 	 13 (100.00%)
12 (100.00%) 	 12 (100.00%)
4 (100.00%) 	 4 (99.95%)
8 (100.00%) 	 8 (100.00%)
6 (99.99%) 	 5 (33.04%) 	 MISMATCH!!
3 (93.78%) 	 6 (99.73%) 	 MISMATCH!!
2 (96.58%) 	 5 (95.05%) 	 MISMATCH!!
5 (76.41%) 	 5 (87.24%)
2 (99.71%) 	 2 (99.86%)
1 (93.94%) 	 0 (96.13%) 	 MISMATCH!!
12 (100.00%) 	 12 (100.00%)
4 (99.99%) 	 4 (99.53%)
15 (94.14%) 	 15 (97.18%)
5 (97.86%) 	 5 (69.65%)
1 (46.04%) 	 5 (99.40%) 	 MISMATCH!!
5 (59.42%) 	 2 (87.30%) 	 MISMATCH!!
12 (100.00%) 	 12 (100.00%)
5 (99.65%) 	 5 (69.83%)
1 (63.76%) 	 1 (62.01%)
5 (85.53%) 	 5 (98.04%)
8 (98.48%) 	 1 (66.45%) 	 MISMATCH!!
13 (87.91%) 	 13 (98.45%)
12 (97.51%) 	 12 (100.00%)
5 (99.85%) 	 5 (99.99%)
12 (100.00%) 	 12 (99.98%)
5 (99.59%) 	 13 (94.90%) 	 MISMATCH!!
15 (97.62%) 	 12 (100.00%) 	 MISMATCH!!
3 (81.02%) 	 3 (80.31%)
8 (100.00%) 	 8 (99.93%)
6 (99.85%) 	 5 (54.45%) 	 MISMATCH!!
12 (100.00%) 	 12 (100.00%)
2 (47.00%) 	 5 (78.62%) 	 MISMATCH!!
5 (90.47%) 	 5 (99.79%)
6 (99.64%) 	 3 (59.76%) 	 MISMATCH!!
5 (99.75%) 	 5 (99.89%)
13 (99.99%) 	 13 (100.00%)

The exact same code produces matching and correct outputs in a Docker container based on the nvcr.io/nvidia/tensorflow:19.11-tf1-py3 image.

So it is presumably a bug in either TensorRT 5.1.5 or cuDNN 7.6.2 that is fixed in TensorRT 6.0.0 or cuDNN 7.6.5, since those are the only differences between tensorflow/tensorflow:1.15.0-gpu-py3-jupyter and nvcr.io/nvidia/tensorflow:19.11-tf1-py3.

@stengoes Hello, after you have saved the optimized model, does loading it again take a very long time? I tested YOLOv3 on a Xavier and it took 12 minutes!!! Do you have the same problem? Hoping for your answer, thank you.

@stengoes Besides, 26 GB of memory was consumed!!! Do you see the same phenomenon?

@stengoes Hello, after you have saved the optimized model, does loading it again take a very long time? I tested YOLOv3 on a Xavier and it took 12 minutes!!! Do you have the same problem? Hoping for your answer, thank you.

The script was not meant to be optimal in terms of performance.

For each example it first loads the TF native model, then evaluates the example, and then unloads the model. Then it does the same for the TensorRT optimized model. Finally it compares the output values of both models and the process repeats for the next example.

This procedure is by no means optimal, but for me it was just a quick and dirty way to make absolutely sure that the output values are not accidentally influenced by the wrong model.

Loading and unloading the models takes a lot of time (also depending on hardware and model size). On top of that you could be running a lot of examples, which means loading "examples.pkl" alone could take a long time. So I can understand your run time of 12 minutes. A much better way would be to load each model only once, run all examples through it, and then compare the outputs of the two runs. You would only need to modify the script a little bit; see the sketch below.
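Something like this is what I mean (just a sketch; the predict_all helper is made up for illustration and reuses the tensor names from the script above):

import numpy as np
import tensorflow as tf

def predict_all(filepath_model, examples):
    # Load the frozen graph once and run every example through it
    tf.reset_default_graph()
    graph = tf.Graph()
    with tf.Session(graph=graph) as sess:
        with tf.gfile.GFile(filepath_model, 'rb') as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name="")

        inputs = [graph.get_tensor_by_name("input_pipeline/input_%d:0" % i) for i in range(6)]
        output = graph.get_tensor_by_name("output:0")

        # Collect the raw probabilities for every example
        return np.array([
            sess.run(output, feed_dict=dict(zip(inputs, images)))
            for images in examples
        ])

probs_tf = predict_all("/models/model69.pb", examples)
probs_trt = predict_all("/models/model69-FP32-optimized.pb", examples)

# Compare the full probability vectors instead of only the argmax
print("max abs difference:", np.max(np.abs(probs_tf - probs_trt)))
print("label mismatches:", np.sum(probs_tf.argmax(axis=-1) != probs_trt.argmax(axis=-1)))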

@stengoes Besides, 26 GB of memory was consumed!!! Do you see the same phenomenon?

Hard to tell where that is coming from based on your info. Could it be that your "examples.pkl" is really large?
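If it is not the pickle file, one thing worth checking (just a guess, I have not tried this on a Xavier) is whether it is simply TF 1.x pre-allocating most of the visible GPU memory by default; letting the session grow memory on demand makes the real usage easier to see:

import tensorflow as tf

# Allow the session to allocate GPU memory on demand instead of grabbing
# almost everything up front (the default behaviour in TF 1.x).
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

graph = tf.Graph()
with tf.Session(graph=graph, config=config) as sess:
    pass  # import the frozen graph and run predictions as in the script above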