Neargye/hello_tf_c_api

batch inference

Closed this issue · 7 comments

Hi @Neargye, and thank you for all your great examples. I'm trying to achieve batch classification, but with very limited success.

I have a graph that accepts an arbitrary number of images as input (tensor dims = {-1, 32, 32, 3}), and let's say that an image can belong to one of five classes. I will call IM_i (with i in [0,4]) an image belonging to class i.

If I populate the input tensor with data from a single image IM_i, the output tensor is correct: its data[i] is the largest.

I then tried appending the data of 2 or more images to the input tensor's buffer, but the output is not correct. I expected to see data[0-4] for the first image, data[5-9] for the second one, etc., but this holds only for the first image, even though each image is classified correctly when taken in isolation.

But then I noticed something odd. If the input tensor is populated in this order:
IM_2, IM_3, IM_x, IM_y, IM_w (x, y, w in [0,4]), then data[0-4] holds the correct output values for IM_2 and data[20-24] holds the output values for IM_3, which made me think that input data shouldn't simply be appended. However, I cannot find any documentation for the bare C API.

Have you experienced something similar? If so, can you provide an example where batch inference is performed?

Thank you in advance!

For example:
input_dimension -> {?, 42}
output_dimension -> {?, 1}

For a batch you can try
input_dimension -> {100, 42}
output_dimension -> {100, 1}
and then call TF_SessionRun.

@BRTNDR Please look at the examples; I hope they will help you.

It turned out I was casting a pointer to the wrong type. Small code, enormous problems.
Thanks!

Xonxt commented

Sorry for reopening this again.

You might remember that I figured out single-image inference with this code some time ago, and now I also need to do batch inference. It isn't working properly, and your example doesn't help.

Here's my code (shortened and simplified version):

std::vector<std::vector<float>> run( const std::vector<cv::Mat>& images ) 
{
    const int batchSize = static_cast<int>( images.size() );
    const int outputSize = 10; // the model should output a vector of 10 elements per image

    std::vector<TF_Tensor*> input_tensors;
    std::vector<TF_Tensor*> output_tensors;

    // set the input dims
    const std::vector<std::int64_t> input_dims = { batchSize, images.front().rows, images.front().cols, images.front().channels() };

    // input data vector:
    std::vector<float> input_data;

    for ( const auto& image : images ) 
    {
        // convert to float32        
        cv::Mat image32f;
        image.convertTo( image32f, CV_32F );
        
        // insert at the end of the input data vector:
        input_data.insert( input_data.end(), (float*) image32f.data, (float*) image32f.data + image32f.total() * image32f.channels() );    
    }

    // set the input tensor
    input_tensors.push_back( TF::CreateTensor( TF_FLOAT, input_dims, input_data ) ); 

    // set the output tensor  
    const std::vector<std::int64_t> output_dims = { batchSize, outputSize };
    output_tensors.push_back( TF::CreateEmptyTensor( TF_FLOAT, output_dims ) );

    // run session:
    const TF_Code code = TF::RunSession( m_pSession, input_ops, input_tensors, output_ops, output_tensors );

    if ( code == TF_OK ) 
    {
        auto output_data = TF::GetTensorsData<float>( output_tensors );

        TF::DeleteTensors( output_tensors );
        TF::DeleteTensors( input_tensors );    

        return output_data;
    }
    else 
    {
        return std::vector<std::vector<float>>();
    }
}

and then run it and show it in the console:

cv::Mat image = cv::imread( "image.jpg", cv::IMREAD_UNCHANGED );

// copy image 4 times to emulate a batch
std::vector<cv::Mat> input_vector;

input_vector.push_back( image );
input_vector.push_back( image );
input_vector.push_back( image );
input_vector.push_back( image );

// process
std::vector<std::vector<float>> output_data = run( input_vector );

// and show:
for ( std::size_t i = 0; i < output_data.size(); i++ ) {
    for ( std::size_t j = 0; j < output_data[i].size(); j++ ) {
        std::cout << output_data[i][j] << " ";
    }
    std::cout << std::endl;
}

But it only prints out 10 numbers, as if there were only one input image instead of a batch.

I even tried pre-allocating the output tensors not as empty, but at the desired size and filled with zeros. If I print TF_TensorByteSize() before and after, I can see that RunSession simply overwrites it with the single-image size.

Do you perhaps notice any errors in my code that I don't see?

Xonxt commented

UPDATE:
Yeah, ok, sorry: it seems I was using the wrong output layer name. I needed the last layer, but I was passing the second-to-last instead.

It works ok now. You can close the issue back.

Ok, thanks for the update!