geaxgx/depthai_blazepose

[ImageManip(7)] [error] Invalid configuration or input image -skipping frame

Opened this issue · 21 comments

My camera hits this error after running for about 5 hours, and the program then hangs. I have not changed the program. I first suspected that the camera was overheating, but the error actually appears when someone suddenly enters the frame after a long period with nobody in view. Have you ever encountered this situation? I hope you can reply as soon as possible. Thank you.

I have never encountered this situation, but I have never run the program for a very long time either.
Have you encountered the problem several times? If so, have you observed the same behaviour each time (the error appears as soon as someone gets in front of the camera after a long period with nobody)?
I can try to reproduce it. What is your depthai version? What command line do you use to run the program?

Thank you for your timely reply. Yes, this has happened many times, and it does happen after nobody has entered the frame for a long time. I am using depthai 2.17.0.0, and the command from the original program is python3 demo.py -e --lm_m lite. However, I modified the manager script template: I deleted all the parts about XYZ, because my program runs on a Raspberry Pi and there is an error at line 371 of BlazeposeDepthaiEdge.py. I have attached my modified manager script; the pipeline itself has not been changed. If you can help me eliminate the error, I would be very grateful.
template_manager_script.zip

OK, I will let it run on my side to try to reproduce. Which OAK device are you using?

I don't understand why you deleted the parts about XYZ. If you don't use the '-xyz' option, these parts are removed automatically. BTW, if you want to see the script that is actually executed, you can run with the trace option: demo.py -e --lm_m lite -t. The script will be saved in tmp_code.py.

xyz_in_rasberrypi4b_thonny_ide
error_information
When the source code is placed on the Raspberry Pi, the contents shown below end up commented out, which causes the program to hang. This happens every time, provided that I have disabled XYZ. One picture shows this behaviour, and the other shows the error message from the issue title. Thank you.

@shamus333 I guess the python editor on the rpi is confused by the '"""'. Good to know.

Thank you for taking the time to reply yesterday. Let's go back to the original demo program. When I run python3 demo.py -e --lm_m lite, after the CV window appears, I cover the camera with my hand and then quickly remove it. When my head then appears in the picture quickly, the problem shows up, though not on every attempt. Sometimes I have to appear in the picture quickly several times before the error occurs, but it always occurs eventually. I tried modifying the part of the script that deals with the rotation matrix, but it made no real difference. This looks like a pipeline communication bug, and I can't troubleshoot it for now. Even with the trace option turned on and the rotated rectangle (rr) data printed, I don't learn much about the error. Thank you. Below are my error image and debug image. The code has not been changed.
error
debug
d

Thank you. If you have found a way to reproduce the error, that's very useful. I will have some time to test a bit later.
What is the line "[warning] ERROR :-0.3662109375..."? Is it a trace you have added?

BTW, yesterday I let the OAK-D run for more than 5 hours but wasn't able to get the error.

If you are tracing the value of the rotated rectangle rr = RotatedRect(), I guess negative values for the rectangle center could cause the Invalid configuration error.

Thank you for your advice. I changed how the configuration is sent, following your suggestion: if the checked rectangle value is greater than 0, the script calls send_result(0) and then continue. The error still appears, as shown in the picture. Thank you.

I was able to reproduce the problem twice. I still don't know a reliable way to trigger it, but I can confirm that it happens after a successful pose detection (successful here means a confidence level above the threshold) and just after sending the ImageManip config to the ImageManip node.
I am also printing the config now to check whether it has some invalid or absurd values that could cause the error, but that does not seem to be the case. In my previous message, I thought that negative values for the rotated rectangle center could be invalid, but they are not: I have tested with a very simple script, and negative values work without problem.
In one of my failures, the rotated rectangle config was: 0.83251953125, 3.7395832538604736, 6.072499752044678, 10.795555114746094, -2.1773664951324463 (resp. rr.center.x, rr.center.y, rr.size.width, rr.size.height, rr.angle).
I then tried the exact same values in my simple script, and it also worked without any error.
So I need to investigate more.

Thank you for your timely answer. There is something I don't understand and would like to ask about. After the error above occurs, I want to locate where it happens and let the program exit safely, but I can't find the place. As I understand it, the pipeline is built once at initialization and data then streams through it continuously, so I can't use try/catch to locate the error. When I force the program to stop in PyCharm, it stops at res = marshal.loads(self.q_manager_out.get().getData()), so the data flow in the pipeline has presumably been broken. Please tell me where in the code I can close my pipeline. Thank you.

Actually, from the point of view of the 2 scripts, there is no error that can be caught with "try catch".
The script running on the device is waiting on this line:

lm_result = node.io['from_lm_nn'].get()

Because the ImageManip node skips the erroneous frame, this call waits forever.
And the script running on the host is waiting for the message from the device's script on this line:
res = marshal.loads(self.q_manager_out.get().getData())

Now that we can reproduce the error at will (see luxonis/depthai-python#657), I hope we will soon find a solution.
If you are in a hurry, one temporary solution would be to replace, in the device's script, the get() on line 144 with a loop that calls the non-blocking tryGet() and does a send_result(0) if nothing is received within, let's say, 1 second.
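The polling loop suggested above could look roughly like the sketch below. It is only an illustration, not the code used in the repository: get_with_timeout and its parameters are hypothetical names, and it assumes that the time module behaves in the device script as it does on the host. The helper takes any non-blocking getter, so it can be tried on the host without a device.

```python
import time

def get_with_timeout(try_get, timeout_s=1.0, poll_s=0.005):
    """Poll a non-blocking getter until it returns a message or the timeout expires.

    try_get: callable returning a message, or None when nothing is available
             (for example node.io['from_lm_nn'].tryGet in the device script).
    Returns the message, or None if nothing arrived within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        msg = try_get()
        if msg is not None:
            return msg
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_s)  # short pause so the loop does not spin at full speed

# Hypothetical use inside the device script's main loop, replacing the blocking get():
#
#     lm_result = get_with_timeout(node.io['from_lm_nn'].tryGet, timeout_s=1.0)
#     if lm_result is None:
#         send_result(0)   # report "no body detected" so the host is not left waiting
#         continue
```

The same helper could in principle wrap self.q_manager_out.tryGet on the host, so that the host loop can also give up and close the device cleanly instead of blocking in get().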

@shamus333
The Luxonis team has proposed the following fix: luxonis/depthai-python#657 (comment)
In case you want to try it, you need to install the version of depthai with the command given in the linked comment and then apply the following modification to BlazeposeDepthaiEdge.py:

diff --git a/BlazeposeDepthaiEdge.py b/BlazeposeDepthaiEdge.py
index 04569e8..9f7e061 100644
--- a/BlazeposeDepthaiEdge.py
+++ b/BlazeposeDepthaiEdge.py
@@ -203,6 +203,11 @@ class BlazeposeDepthai:
         self.pd_input_length = 224
         self.lm_input_length = 256
 
+        # For bug https://github.com/luxonis/depthai-python/issues/657#issuecomment-1247029073
+        # Temporary fix: enlarge cache available to the ImageManip node
+        # pip install --extra-index-url https://artifacts.luxonis.com/artifactory/luxonis-python-snapshot-local/ depthai==2.17.3.1.dev0+b29822e30d782deb9ae8100817b34aea67fb1257
+        pipeline.setImageManipCmxSizeAdjust(160*1024)
+
         # ColorCamera
         print("Creating Color Camera...")
         cam = pipeline.create(dai.node.ColorCamera) 

@geaxgx Thank you very much for replying to my question at the first moment. I am sorry for replying to you only now. I have also read the answer from the official team in detail. I have already run the program and the problem no longer occurs, but it's only a temporary fix at the moment, according to the official team's reply. I hope this problem can be fixed permanently. Thank you from the bottom of my heart for helping me solve my problem in your busy schedule.

Hello, today I ran the program with the fix you gave, applying the modification, but the error can still be reproduced. I then followed your discussion with the Luxonis team and saw that depthai has been updated to 2.18.0. I tried that version as well and could still reproduce the error very easily, although a new message now appears in the output. I hope you can find the time to look at this. I would be very grateful, thank you.
Screenshot from 2022-10-26 19-18-11

I have just done the test and can confirm that I still get the error with depthai 2.18.0 too. I am a bit surprised, as 2.18.0 was expected to integrate modifications that seemed to solve the problem. As you have noticed, the error message is different. And this MRE (luxonis/depthai-python#657 (comment)), which was failing before, is working now. So there is definitely some improvement. My understanding is that the memory used by the image manip node is now dynamically allocated. Could it be that, because depthai_blazepose is using a lot of nodes (in particular heavy neural network nodes), the memory available for the image manip node is too small in certain conditions? And because I don't have control over the other nodes used by the user application, there is no guarantee that there will be enough memory for the image manip node.
So maybe a better solution to avoid the problem is for me to modify my code so that I don't call the image manip node when I estimate that the image manip config that is sent to the node could cause the error.
To be more precise, I think we both agree that the error occurs when the face is very close to the camera. In that case, the face fills a big part of the image and the bounding box given by the detection network is huge. I could check this size and compare it to a threshold. If the size is above the threshold (meaning the face is too close to the camera), I could simply not use the image manip node and wait for the next frame to process. From the application's point of view, it would be treated exactly as if the body had not been detected.
Thoughts ? What would be the impact for your application ? Would it be acceptable that for frames where the face is too close to the camera, we consider that the body is not detected ?

I am very happy that we had the same idea. I tried modifying the script file the same day to work around the problem, and found a workable threshold through many experiments; see the pictures below for reference. I hope you can update your code so that the threshold can be set from outside the script. For now, the threshold approach at least lets the code run normally without crashing.
if rr.size.width > 4 or rr.size.height > 7:
    send_result(0)
    send_new_frame_to_branch = 1
    ${_TRACE}("Avoid troubleshooting")
    continue
script_1

I have tested the threshold values, and here they are for you @geaxgx

  1. When an error occurs (big face), rr.size.width is generally 8, and rr.size.height is generally 15.
  2. The rr.size.width is generally around 0-1 and the rr.size.height is generally around 0-1 when the whole body skeleton is used.
  3. For the half body skeleton, rr.size.width is generally around 3.4, and rr.size.height is generally around 4.5.6.

As the body gets closer to the camera, rr.size.width and rr.size.height increase faster and faster, and once they reach a certain dangerous value, the error is reported.

At present, I am constantly testing the camera every day to see if there is a problem again. If there is a problem, I will contact you and work with you to solve it. You and I have been replying frequently. I hope you are happy every day.

Thank you for your contribution @shamus333! I appreciate it a lot!
I will test on my side too.
Two quick remarks come to my mind:

  • given the way rr.size.height is calculated from rr.size.width, comparing rr.size.width with a threshold is probably enough;
  • rr.size.width is a normalized value, so there is a chance that the threshold you have found for rr.size.width (below 8) works well for the image size you are working with (1152x648 if you are using the default image size). If you work with a bigger image size (can be set with --internal_frame_height argument), the image manip node will need more memory to do the same job. In that case, possibly the error will happen. I need to check that :-)

I did some tests with a bigger image size (1792x1008) and I can confirm my second remark above: I get the error with smaller rr.size.width values (around 4.5).
Sometimes the error message is the same as above: [error] Not possible to create warp params. Error: WARP_SWCH_ERR_CACHE_TO_SMALL
But sometimes the error is different: [critical] Fatal error. Please report to developers. Log: 'ResourceLocker' '358'
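To make the dependence on image size concrete, here is a minimal sketch of a size-aware guard. It is only an illustration built from the numbers reported in this thread (errors around rr.size.width 8 at 1152x648 and around 4.5 at 1792x1008). The names OBSERVED_FAILURE_WIDTH, SAFETY_MARGIN, width_threshold and should_skip_image_manip are hypothetical, and the 0.5 margin is an assumption chosen only because it gives the threshold of 4 proposed above for the default size. None of this has been validated on a device.

```python
# rr.size.width values at which the thread reports the ImageManip error,
# keyed by internal frame size (width, height). Observations, not guarantees.
OBSERVED_FAILURE_WIDTH = {
    (1152, 648): 8.0,    # default size, reported by @shamus333
    (1792, 1008): 4.5,   # bigger size, reported by @geaxgx
}

# Assumption: stay well below the observed failure width.
SAFETY_MARGIN = 0.5

def width_threshold(frame_size):
    """Return the rr.size.width above which the ImageManip node is skipped."""
    # For frame sizes nobody has measured, fall back to the smallest observed
    # failure width; this is a guess and may still be too permissive.
    failure_width = OBSERVED_FAILURE_WIDTH.get(
        frame_size, min(OBSERVED_FAILURE_WIDTH.values())
    )
    return failure_width * SAFETY_MARGIN

def should_skip_image_manip(rr_width, frame_size):
    """True if the frame should be treated as 'no body detected'."""
    return rr_width > width_threshold(frame_size)
```

In the device script, the result of such a check would decide whether to call send_result(0) and continue, as in the snippet posted earlier, instead of sending the config to the image manip node.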

I am going to reopen the issue with luxonis.