batch inference
Closed this issue · 7 comments
Hi @Neargye and thank you for all your great examples. I'm trying to achieve batch classification but with very limited success.
I have a graph that can accept an arbitrary number of images as input (tensor dims = {-1, 32, 32, 3}), and let's say each image belongs to one of five classes. I will call IM_i (with i in [0,4]) an image belonging to class i.
If I populate the input tensor with data from a single image IM_i, the output tensor is correct: its data[i] is the largest value.
I then appended the data of two or more images to the input tensor's buffer, but the output is not correct. I expected data[0-4] to hold the results for the first image, data[5-9] for the second, etc., but this holds only for the first image, even though each image is classified correctly when taken in isolation.
But then I noticed something strange. If the input tensor is populated in this order:
IM_2, IM_3, IM_x, IM_y, IM_w (x, y, w in [0,4]), then data[0-4] holds the right output values for IM_2 and data[20-24] holds the output values for IM_3. This made me think that input data shouldn't simply be appended, but I cannot find any documentation for the bare C API.
Have you experienced something similar? If so, can you provide an example where batch inference is performed?
Thank you in advance!
For example:
input_dimension -> {?, 42}
output_dimension -> {?, 1}
For a batch, you can try
input_dimension -> {100, 42}
output_dimension -> {100, 1}
and do TF_SessionRun.
Example batch inference: https://github.com/Neargye/hello_tf_c_api/blob/master/src/batch_interface.cpp
It turned out I was casting a pointer to the wrong type. Small code, enormous problems.
Thanks!
Sorry for reopening this again.
You might remember that I figured out single-image inference some time ago with this code, and now I also need batch inference. It isn't working properly, though, and your example doesn't help in my case.
Here's my code (shortened and simplified version):
std::vector<std::vector<float>> run( const std::vector<cv::Mat>& images )
{
    const std::int64_t batchSize = static_cast<std::int64_t>( images.size() );
    const std::int64_t outputSize = 10; // the model should output a vector of 10 elements
    std::vector<TF_Tensor*> input_tensors;
    std::vector<TF_Tensor*> output_tensors;
    // set the input dims (NHWC)
    const std::vector<std::int64_t> input_dims = { batchSize, images.front().rows, images.front().cols, images.front().channels() };
    // input data vector:
    std::vector<float> input_data;
    for ( const auto& image : images )
    {
        // convert to float32
        cv::Mat image32f;
        image.convertTo( image32f, CV_32F );
        // append this image's pixels at the end of the input data vector:
        input_data.insert( input_data.end(), (float*) image32f.data, (float*) image32f.data + image32f.total() * image32f.channels() );
    }
    // set the input tensor
    input_tensors.push_back( TF::CreateTensor( TF_FLOAT, input_dims, input_data ) );
    // set the output tensor
    const std::vector<std::int64_t> output_dims = { batchSize, outputSize };
    output_tensors.push_back( TF::CreateEmptyTensor( TF_FLOAT, output_dims ) );
    // run session:
    const TF_Code code = TF::RunSession( m_pSession, input_ops, input_tensors, output_ops, output_tensors );
    std::vector<std::vector<float>> output_data;
    if ( code == TF_OK )
    {
        output_data = TF::GetTensorsData<float>( output_tensors );
    }
    // delete the tensors on both the success and the failure path to avoid a leak
    TF::DeleteTensors( output_tensors );
    TF::DeleteTensors( input_tensors );
    return output_data;
}
and then I run it and print the result to the console:
cv::Mat image = cv::imread( "image.jpg", cv::IMREAD_UNCHANGED );
// copy the image 4 times to emulate a batch
std::vector<cv::Mat> input_vector;
input_vector.push_back( image );
input_vector.push_back( image );
input_vector.push_back( image );
input_vector.push_back( image );
// process
std::vector<std::vector<float>> output_data = run( input_vector );
// and show:
for ( std::size_t i = 0; i < output_data.size(); i++ ) {
    for ( std::size_t j = 0; j < output_data[i].size(); j++ ) {
        std::cout << output_data[i][j] << " ";
    }
    std::cout << std::endl;
}
But it only prints out the 10 numbers, as if there were only one input image instead of a batch.
I even tried pre-allocating the output_tensors not as empty but with the desired size, filled with zeros. If I then print TF_TensorByteSize() before and after, I can see that RunSession just overwrites it with the 1-batch size.
Do you perhaps notice any errors in my code that I don't see?
UPDATE:
Yeah, ok, sorry, it seems I had been using the wrong output layer name. I needed the last layer but was giving the second-to-last instead.
It works fine now. You can close the issue again.
Ok, thanks for the update!