Help wanted: Test Colab notebook for training TFLite models

Question

EdjeElectronics opened this issue 2 years ago · 24 comments

It's been a while since I've worked on this repository, but I'm diving back into it to make some improvements! Today I added a Google Colab notebook that allows you to use Google's servers to train, convert, test, and export a TensorFlow Lite model. It makes training a custom TFLite detection model easy! It uses the TensorFlow Object Detection API, which provides high configurability and is best for training with large datasets.

I'm looking for help with testing the Colab notebook to confirm it works for all users. Have you been looking to train an SSD-MobileNet or EfficientDet model and deploy it with TFLite? If so, can you try stepping through this notebook to see if it works for you and post in this thread about how it turned out? Let me know if you run into any errors or have questions.

Here's a link to the Colab notebook: Train_TFLite2_Object_Detction_Model.ipynb

If you need a dataset to use, I uploaded my "bird, squirrel, raccoon" dataset of 900 images to Dropbox. I included a command to download this dataset into the Colab as one of the options in Step 3.

I'm planning to make a video that shows how to go through the notebook step-by-step, but it will be a few weeks until it's ready. For now, hopefully the instructions inside the Colab doc do a good enough job explaining.

Feedback I'm looking for:

Were you able to successfully train an ssd-mobilenet-v2 model, test it, and see detection results on the test images? (i.e. were you able to make it through Steps 1 - 7 without errors?)
If you ran into errors, what were they?
Was anything in the notebook confusing or unclear? Were there any points where you weren't sure what to do next?
Do you have any suggestions for improving the notebook?

Known issues:

The centernet-mobilenet model still doesn't work with TensorFlow Lite; I'm still trying to get that one figured out.
Accuracy drops significantly when quantizing the ssd-mobilenet-v2 model
efficientdet, centernet-mobilenet, and ssd-mobilenet-fpnlite models can't be quantized
In other words, quantization still isn't really working. But the unquantized TFLite models seem to work well!

Answer 1 · 2022-09-06T12:54:16.000Z

Hey @elvenkim1, thanks for testing it! Did you change the labelmap in the code block a few lines above? It should have "bird, squirrel, raccoon" like this:

bird
squirrel
raccoon

Here's a screenshot of what it should look like.

Answer 2 · 2022-09-06T13:07:32.000Z

Hi EJ, yes I did. I managed to fix the first code block (the one that refers to labelmap.pbtxt) by manually entering the number 3 as num_classes, but I am stuck at the second code block, as shown in the screenshot. Regards, Elven

Answer 3 · 2022-09-06T21:27:37.000Z

@elvenkim1 it looks like you deleted your previous comment which showed the error. Can you show me the error you're getting again?

Thanks!

Answer 4 · 2022-09-07T06:57:13.000Z

I am able to run now and completed all steps without errors.

I tried with my own dataset and bumped into this error. Please advise.

Answer 5 · 2022-09-07T07:19:02.000Z

Also, the label is missing on the image itself.

Answer 6 · 2022-09-07T08:46:48.000Z

The video is not showing the label either. I tested this on a Raspberry Pi.

Also, the command in the notebook, python TFLite_detection_webcam.py --imgdir=fine_tuned_model_lite, should be python TFLite_detection_webcam.py --modeldir=fine_tuned_model_lite

Answer 7 · 2022-09-07T13:19:08.000Z

Thanks again for testing out the Colab! Good catch on changing --imgdir to --modeldir, I will make sure to fix that.

Sorry you aren't seeing any bounding boxes, it may be because the model isn't very accurate. Can you try setting the threshold level to 0.01 using the --thresh=0.01 argument as shown below? Do any boxes show up if you do that?

TFLite_detection_webcam.py --modeldir=fine_tuned_model_lite --thresh=0.01
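To illustrate why lowering the threshold can make boxes appear, here is a minimal sketch of confidence filtering. The detections and scores are made up for illustration, and the strict "greater than" comparison and the 0.5 value are assumptions, not taken from the script in this thread.

```python
# Hypothetical detections: (label, confidence score) pairs, not real model output.
detections = [("bird", 0.04), ("squirrel", 0.02), ("raccoon", 0.008)]


def filter_detections(detections, threshold):
    """Keep only detections whose score is strictly above the threshold."""
    return [(label, score) for label, score in detections if score > threshold]


# With a high threshold (0.5 is an assumed value), a weak model shows no boxes.
print(filter_detections(detections, 0.5))

# With a very low threshold, low-confidence detections pass the filter.
print(filter_detections(detections, 0.01))
```

If boxes only show up at a very low threshold, that usually points to an undertrained or inaccurate model rather than a problem with the script.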

Answer 8 · 2022-09-08T09:25:34.000Z

Hi EJ

Yes, it is working now 👍! You are right that the model is not accurate, as I only trained for 500 steps.

It should be: thresh --> threshold

TFLite_detection_webcam.py --modeldir=fine_tuned_model_lite --threshold=0.01

Do you know how to add graphs of accuracy, precision, etc.? I understand that we need another terminal running alongside the training process, but this is not possible with Colab.

Answer 9 · 2022-09-08T12:44:29.000Z

Awesome! Glad it's working. The TensorBoard window in Section 5 will show the model's training and validation loss as it trains. Unfortunately, it's tough to show a graph of precision/accuracy over time. I will add a section to calculate the model's mAP on the test images.
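For readers unfamiliar with mAP: it is built on matching predicted boxes to ground-truth boxes by intersection over union (IoU). The sketch below shows only that IoU building block, assuming boxes are given as (xmin, ymin, xmax, ymax); it is not the calculation used in the notebook.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp to zero so non-overlapping boxes give zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes: a prediction shifted halfway off the ground truth.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

A prediction is typically counted as correct when its IoU with a ground-truth box of the same class exceeds a chosen cutoff; the cutoff itself varies between evaluation protocols.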

Answer 10 · 2022-09-08T23:04:00.000Z

Awesome, thanks!

Answer 11 · 2022-09-16T20:25:13.000Z

I went through the notebook and am now at the training stage; up to this point it worked beautifully. I used your bird, squirrel, raccoon dataset, as I happen to be interested in detecting the squirrels and birds in my garden :-D
A minor thing I have noticed:
"At this point, whether you used Option 1 or Option 2, you should be able to click the folder icon..." Should also include "Option 3"

I think it would be helpful to post some information on your reference dataset: it would be interesting to compare the obtained score with yours, and maybe to have an example video to try the model on at the end, as confirmation that it worked. Maybe you could also provide the tflite file. (I don't know if it's helpful for anybody else, but since I am interested in exactly these objects, I am very interested :-) )

In sum, this is an awesome project you have there, thank you very much

Answer 12 · 2022-09-27T15:43:48.000Z

At step # "Next, we'll split the images into train, validation, and test sets. Here's what each set is used for:"

The Python script that is executed uses the wrong directory, making the total image count 0. The script looks in /content/images/all, but the path of the images is /content/images/all/images (based on following the steps in the Colab notebook).

Edit: This is resolved. Zip the individual files, not the folder. Sorry!
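One way to catch this before uploading is to check whether the images sit at the top level of the zip or inside a folder. This is a minimal sketch using Python's standard zipfile module; the function name and the list of extensions are hypothetical and not part of the notebook.

```python
import zipfile

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".bmp")


def images_at_top_level(zip_path):
    """Return (top_level, nested) lists of image entries found in the zip."""
    top_level, nested = [], []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith("/") or not name.lower().endswith(IMAGE_EXTS):
                continue  # skip directory entries and non-image files
            # Entries inside a folder contain a "/" in their archive name.
            (nested if "/" in name else top_level).append(name)
    return top_level, nested
```

Calling images_at_top_level("images.zip") and finding every image in the second list would suggest that the folder was zipped rather than the individual files.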

Answer 13 · 2022-09-27T16:18:41.000Z

On Step # 23 (? where are the step numbers anyway?)

"NotFoundError: /content/labelmap.pbtxt; No such file or directory"

This is because the file is named "labelmap.txt", not ".pbtxt"

This can be fixed in Step # 18
label_map_pbtxt_fname = '/content/labelmap.txt'

Edit: This was likely caused by an error in a previous step. Upon trying again, this step works

Answer 14 · 2022-09-27T16:22:16.000Z

After fixing the above, a new error is reported on Step # 23

ParseError: 1:1 : Message type "object_detection.protos.StringIntLabelMap" has no field named "class1".


During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)

/usr/local/lib/python3.7/dist-packages/object_detection/utils/label_map_util.py in load_labelmap(path)
    171       text_format.Merge(label_map_string, label_map)
    172     except text_format.ParseError:
--> 173       label_map.ParseFromString(label_map_string)
    174   _validate_label_map(label_map)
    175   return label_map

TypeError: a bytes-like object is required, not 'str'

Edit: This was likely caused by an error in a previous step. Upon trying again, this step works
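For context, the ParseError at position 1:1 suggests that the file being parsed began with a plain class name ("class1") rather than an item { ... } entry. The sketch below shows the kind of conversion from a list of class names to label map text that the Object Detection API can parse. The function name is hypothetical, and this is an illustration of the format, not the notebook's own code.

```python
def labelmap_to_pbtxt(class_names):
    """Build label map text with one item block per class (ids start at 1)."""
    items = []
    for class_id, name in enumerate(class_names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (class_id, name))
    return "\n".join(items)


# Class names as they would appear, one per line, in labelmap.txt.
classes = ["bird", "squirrel", "raccoon"]
pbtxt = labelmap_to_pbtxt(classes)
print(pbtxt)
```

The number of item blocks should match the num_classes value used in the training configuration.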

Answer 15 · 2022-09-27T17:04:08.000Z

At Step # 9 received this error:

Successfully converted xml to csv.
Successfully converted xml to csv.
Traceback (most recent call last):
  File "create_tfrecord.py", line 120, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/platform/app.py", line 36, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "create_tfrecord.py", line 98, in main
    tf_example = create_tf_example(group, path)
  File "create_tfrecord.py", line 46, in create_tf_example
    encoded_jpg = fid.read()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 114, in read
    self._preread_check()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 77, in _preread_check
    compat.path_to_str(self.__name), 1024 * 512)
tensorflow.python.framework.errors_impl.NotFoundError: /content/images/train/image-01.jpeg; No such file or directory
Traceback (most recent call last):
  File "create_tfrecord.py", line 120, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/platform/app.py", line 36, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "create_tfrecord.py", line 98, in main
    tf_example = create_tf_example(group, path)
  File "create_tfrecord.py", line 46, in create_tf_example
    encoded_jpg = fid.read()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 114, in read
    self._preread_check()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 77, in _preread_check
    compat.path_to_str(self.__name), 1024 * 512)
tensorflow.python.framework.errors_impl.NotFoundError: /content/images/validation/image-01.jpg; No such file or directory

There is no image-01.jpg in the uploaded images.zip folder. There is an image-001.jpg though, but that is located in /train

Edit: This has been resolved

Answer 16 · 2022-09-27T19:25:38.000Z

At Step # 9 received this error:
There is no image-01.jpg in the uploaded images.zip folder. There is an image-001.jpg though, but that is located in /train

Sorry, this issue is because the filenames inside the .xml from LabelImg are different than the actual filenames of the .jpg files.
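A mismatch like this can be detected before creating the TFRecords. The following is a minimal sketch that compares the filename tag inside each annotation XML with the files actually present, assuming the XML files and images are in the same folder; the function name is hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def find_filename_mismatches(folder):
    """List (xml_file, referenced_name) pairs whose referenced image is missing."""
    mismatches = []
    for entry in sorted(os.listdir(folder)):
        if not entry.lower().endswith(".xml"):
            continue
        root = ET.parse(os.path.join(folder, entry)).getroot()
        # LabelImg's Pascal VOC output stores the image name in a <filename> tag.
        referenced = root.findtext("filename")
        if referenced is None or not os.path.isfile(os.path.join(folder, referenced)):
            mismatches.append((entry, referenced))
    return mismatches
```

Running find_filename_mismatches on the folder of labeled images lists every annotation that points at a file that does not exist, such as image-01.jpg when the actual file is image-001.jpg.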

Answer 17 · 2022-09-27T20:04:38.000Z

On Step # 44 (run training), receive this error:

Traceback (most recent call last):
  File "/content/models/research/object_detection/model_main_tf2.py", line 31, in <module>
    from object_detection import model_lib_v2
  File "/usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py", line 31, in <module>
    from object_detection import model_lib
  File "/usr/local/lib/python3.7/dist-packages/object_detection/model_lib.py", line 35, in <module>
    from object_detection.builders import optimizer_builder
  File "/usr/local/lib/python3.7/dist-packages/object_detection/builders/optimizer_builder.py", line 25, in <module>
    from official.modeling.optimization import ema_optimizer
  File "/usr/local/lib/python3.7/dist-packages/official/modeling/optimization/__init__.py", line 23, in <module>
    from official.modeling.optimization.optimizer_factory import OptimizerFactory
  File "/usr/local/lib/python3.7/dist-packages/official/modeling/optimization/optimizer_factory.py", line 36, in <module>
    'adamw_experimental': tf.keras.optimizers.experimental.AdamW,
AttributeError: module 'tensorflow.keras.optimizers' has no attribute 'experimental'

Answer 18 · 2022-09-27T20:07:36.000Z

On Step # 43 (loading tensorboard), a 403 Error is displayed:

Answer 19 · 2022-09-27T23:31:27.000Z

I received the same error as @RobotGrrl

Answer 20 · 2022-09-27T23:49:25.000Z

@RobotGrrl I got past this error by editing /usr/local/lib/python3.7/dist-packages/official/modeling/optimization/optimizer_factory.py and removing line 36,

'adamw_experimental': tf.keras.optimizers.experimental.AdamW,

Answer 21 · 2022-10-03T14:43:28.000Z

@kleinpoe thanks a bunch for the feedback! I made the fix you recommended for the list of options. I'll think about including an example video for people to try the model with, too. By the way, my bird/squirrel/raccoon dataset isn't too great at actually detecting those creatures. Planning to replace it with a different model before releasing the video!

Thank you @RobotGrrl for working through it and showing me the issues you bumped into. I'll see if I can find a way to make it work even if images aren't zipped directly inside the images.zip folder and to make the labelmap creation process more clear. Congrats on all the cool work you are doing with the Atmosphinder, by the way!

Thanks @jugglingboss for identifying this error and finding the solution. The TF team released a new version of the models repository that added the AdamW thing, which only works with TF v2.10. I changed the Colab to modify the setup.py file so it downloads a compatible release of the models repository.

Answer 22 · 2022-10-11T07:47:13.000Z

Hi EJ, I am able to run the Colab now. But when I tried with my own dataset, it says "child index out of range".

Answer 23 · 2022-10-17T09:49:32.000Z

Hello, I trained my own detect.tflite and copied it and labelmap.txt to the Raspberry Pi, but no boxes appear. Google's object detection example runs normally on the same Raspberry Pi.
After I changed --threshold to 0.01, many boxes appear.
I trained for 10000 steps, which I think is enough.
I tried retraining many times, but the problem still exists.
I hope you can look into this problem. It has bothered me for several days.
Thanks

Answer 24 · 2022-10-28T23:06:37.000Z

Thanks for the help and feedback everyone!