Output values are not changing for different inputs
vmelentev opened this issue · 13 comments
Hi, I am using a MoveNet model from tfhub.dev with a VisionCamera frame processor to apply human pose estimation to a person. It doesn't appear to be tracking my movements, as the outputs in the console are always the same. This happens with every model I try.
Here is the code I am using to resize the frame:
function getArrayFromCache(size) {
  'worklet'
  if (global[CACHE_ID] == null || global[CACHE_ID].length !== size) {
    global[CACHE_ID] = new Uint8Array(size);
  }
  return global[CACHE_ID];
}

function resize(frame, width, height) {
  'worklet'
  const inputWidth = frame.width;
  const inputHeight = frame.height;
  const arrayData = frame.toArrayBuffer();
  const outputSize = width * height * 3; // 3 for RGB
  const outputFrame = getArrayFromCache(outputSize);

  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      // Find closest pixel from the source image
      const srcX = Math.floor((x / width) * inputWidth);
      const srcY = Math.floor((y / height) * inputHeight);

      // Compute the source and destination index
      const srcIndex = (srcY * inputWidth + srcX) * 4; // 4 for BGRA
      const destIndex = (y * width + x) * 3; // 3 for RGB

      // Convert from BGRA to RGB
      outputFrame[destIndex] = arrayData[srcIndex + 2]; // R
      outputFrame[destIndex + 1] = arrayData[srcIndex + 1]; // G
      outputFrame[destIndex + 2] = arrayData[srcIndex]; // B
    }
  }
  return outputFrame;
}
Here is my frame processor function:
const frameProcessor = useFrameProcessor((frame) => {
  'worklet'
  if (model == null) return
  const newFrame = resize(frame, 192, 192)
  const outputs = model.runSync([newFrame])
  const output = outputs[0]
  console.log(output[1])
}, [model])
Here is the output in the console:
LOG 0.46377456188201904
LOG 0.46377456188201904
LOG 0.46377456188201904
LOG 0.46377456188201904
LOG 0.46377456188201904
LOG 0.46377456188201904
For each frame the camera sees, the result is always the same.
Does anyone know how to resolve this issue?
Thank you
Please format your code properly.
I had a similar issue - in my case, the input size didn't match what the model was expecting. I'd also check that the model accepts uint8 input.
You can verify on https://netron.app
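Besides netron, a quick runtime check is also possible. This is just a sketch, assuming the loaded model exposes its tensor metadata via inputs / outputs (check the library's TypeScript types for your version) and a hypothetical asset path:

const model = await loadTensorflowModel(require('./assets/movenet.tflite'))
// Expect something like [{ dataType: 'uint8', shape: [1, 192, 192, 3], ... }]
console.log('inputs:', model.inputs)
console.log('outputs:', model.outputs)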
Hi, the frame input size and type (uint8) are correct. If they weren't, I wouldn't get the console outputs above; instead I would get errors such as 'Invalid input size/type'.
My issue is that the output does not change regardless of the input. If I understand correctly, this model is meant to detect different features of the human body (nose, eyes, elbows, knees, etc.) and output values based on where they appear on the screen, but that doesn't seem to be happening since the output values are always the same.
Does your newFrame contain new data each time?
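One quick way to check that (just a sketch; the checksum helper is only for illustration) is to log a rough checksum of the resized buffer on every frame and see whether it changes:

function checksum(arr) {
  'worklet'
  // Sum a sparse sample of bytes; if this stays identical across frames,
  // the buffer content is not changing.
  let sum = 0
  for (let i = 0; i < arr.length; i += 997) sum += arr[i]
  return sum
}

const frameProcessor = useFrameProcessor((frame) => {
  'worklet'
  if (model == null) return
  const newFrame = resize(frame, 192, 192)
  console.log('input checksum:', checksum(newFrame))
  const outputs = model.runSync([newFrame])
  console.log(outputs[0][1])
}, [model])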
Hi! I seem to have the same problem. The resized image does change, however the output of the tflite model does not.
I get the same when running your /example in this repo with the following output:
LOG Result: 25
LOG Running inference on 640 x 480 yuv Frame
LOG Result: 25
LOG Running inference on 640 x 480 yuv Frame
LOG Result: 25
LOG Running inference on 640 x 480 yuv Frame
LOG Result: 25
LOG Running inference on 640 x 480 yuv Frame
LOG Result: 25
LOG Running inference on 640 x 480 yuv Frame
LOG Result: 25
LOG Running inference on 640 x 480 yuv Frame
LOG Result: 25
...
Well, if the resized image changes but the output values don't, then it might be an issue with your TFLite model? I am not sure this is an issue in this library...
Ok, I can confirm it was an issue with the input size as @willadamskeane suggested. For some reason, it does not output an error on wrong input size (e.g. 151x150 instead of 150x150 px using the vision-camera-resize-plugin).
If this is considered expected behaviour, from my end the issue can be closed.
Hi all, after some experimentation it appears that my code for resizing the frame does not work properly and does not put the frame into the correct format, yet it wasn't throwing an error for some reason. I have resolved this issue by switching to the vision-camera-resize-plugin which @Silvan-M suggested, and it now works. Thank you for your help!
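For reference, the working frame processor now looks roughly like this. This is just a sketch: it assumes the plugin's useResizePlugin hook and a 192x192 MoveNet input size, so adjust both for your setup.

import { useResizePlugin } from 'vision-camera-resize-plugin'

const { resize } = useResizePlugin()

const frameProcessor = useFrameProcessor((frame) => {
  'worklet'
  if (model == null) return
  // Let the plugin resize and convert the frame to the RGB uint8 buffer the model expects
  const resized = resize(frame, {
    scale: { width: 192, height: 192 },
    pixelFormat: 'rgb',
    dataType: 'uint8',
  })
  const outputs = model.runSync([resized])
  console.log(outputs[0][1])
}, [model])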
@Silvan-M @s54BU Do either of you mind sharing your working code? I'm encountering the same behavior where the frame is updating but the results aren't. I've been using this model which should be the same as yours and have already been using vision-camera-resize-plugin.
My code is more or less as follows:
// ... imagine some model loading code here, poseModel is set in state somewhere
const poseModel = await loadTensorflowModel(
  require("../assets/movenet_multipose.tflite")
);

const maxPixels = 512;
// The longer side of the frame is resized to maxPixels, while maintaining the aspect ratio of the original frame.
let width, height;
if (frame.width > frame.height) {
  width = maxPixels;
  height = (frame.height / frame.width) * maxPixels;
} else {
  height = maxPixels;
  width = (frame.width / frame.height) * maxPixels;
}

// Resize the frame for more efficient inference
const resized = resize(frame, {
  scale: {
    width: width,
    height: height
  },
  pixelFormat: "rgb",
  dataType: "uint8"
});

const inference = poseModel.runSync([resized]);
According to the model page, the expected input shape is [1, height, width, 3], but the data returned by resize() and passed into poseModel.runSync() is a flat [1, height * width * 3] buffer.
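A quick sanity check (just a sketch, assuming poseModel.inputs exposes the tensor shape metadata) is to compare the resized buffer length against the model's reported input size; note that for a dynamic-shape model the reported shape may only be a placeholder:

// Compare the flat buffer length with the product of the reported input dimensions.
const expected = poseModel.inputs[0].shape.reduce((a, b) => a * b, 1)
if (resized.length !== expected) {
  console.warn(`input size mismatch: got ${resized.length}, model reports ${expected}`)
}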
@JEF1056, sure, no problem! I basically just used the example from this repo and changed it so I can run it as a standalone application (importing the library instead of working inside it).
Also, I don't use MoveNet; I used EfficientDet. Looking at MoveNet MultiPose, I find it interesting that it doesn't require a specific input size (only a multiple of 32), and when I put it into netron I get 1x1x1x3 as the input shape (see screenshot). Not sure how this works; maybe someone else here has an idea whether it works with this plugin.
You mentioned that the plugin returns [1, height * width * 3]; this shouldn't be a problem, since that's also the case for the example, which uses [1, 320, 320, 3] and seems to work well.
Your code looks good, however I can see one problem: the model requires the width and height to be multiples of 32. In your code this is only guaranteed for the larger side; the other side is likely not a multiple of 32, so make sure both width and height are multiples of 32 (see the rounding sketch below).
My example (adapted from /example): example-tflite.zip
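Here is a minimal sketch of that rounding (the roundToMultipleOf32 helper is hypothetical, just for illustration):

function roundToMultipleOf32(value) {
  'worklet'
  // Round down to the nearest multiple of 32, but never below 32.
  return Math.max(32, Math.floor(value / 32) * 32)
}

const maxPixels = 512 // already a multiple of 32
let width, height
if (frame.width > frame.height) {
  width = maxPixels
  height = roundToMultipleOf32((frame.height / frame.width) * maxPixels)
} else {
  height = maxPixels
  width = roundToMultipleOf32((frame.width / frame.height) * maxPixels)
}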
After taking a look, it seems that the 1x1x1x3 refers to a dynamic shape in TensorFlow Lite (e.g. 1 x null x null x 3), which would require you to resize the input shape first. It seems there was an issue thread here about adding support for this kind of behavior in this library, but no specific API was made available for it.
Unfortunately I don't have the C++ / native code experience to write a cohesive API around TfLiteInterpreterResizeInputTensor specifically for this library. I might take a shot at writing some kind of wrapper with the patch in that PR, any thoughts? @mrousavy (I would be happy to sponsor you to get a small change for this in.)
Hey - yea I can add automatic tensor resizing if you tell me when that method needs to be called. Should be like 4-8 hours of effort max.
Thanks for the quick response! It needs to be called after model loading but before memory allocation and model inference, so likely just before this line: https://github.com/mrousavy/react-native-fast-tflite/blob/main/cpp/TensorflowPlugin.cpp#L171
The best way to expose it as an API would probably be
loadTensorflowModel(source: ModelSource, delegate?: TensorflowModelDelegate, inputShape?: number[])
where, if inputShape is undefined, resize isn't called at all.
Alternatively, the entire memory allocation could be done after the input buffer is created, i.e. just before this line:
https://github.com/mrousavy/react-native-fast-tflite/blob/main/cpp/TensorflowPlugin.cpp#L236
Though I'm not exactly sure how you would infer the input size from a buffer directly (since an image has 3 dimensions).
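To make the proposal concrete, usage could look something like this (purely hypothetical; the inputShape parameter does not exist in the published library yet):

// Hypothetical call with the proposed third parameter.
const poseModel = await loadTensorflowModel(
  require('../assets/movenet_multipose.tflite'),
  undefined,          // delegate (default CPU)
  [1, 256, 256, 3]    // desired input shape; multiples of 32 for this model
)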