Project submission for AWS DeepLens Challenge
Note: I am currently in the process of upgrading this project to use an optimized MXNet model instead of a TensorFlow SSD model. 'Watch' this repo to keep up with any updates to the code.
For this project, I wanted to build an application that could read books to children. In order to achieve this, I designed a workflow which performs the following steps.
- Determine when a page with text is in the camera frame
- Clean up the image using OpenCV
- Perform OCR (Optical Character Recognition)
- Transform text into audio using AWS Polly
- Play back the audio through speakers plugged into DeepLens
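As a rough illustration of the text-to-audio step, here is a minimal sketch, not the code from this repo. It assumes the client is the object returned by `boto3.client("polly")`; the helper name `text_to_mp3`, the default voice, and the `FakePolly` stand-in are hypothetical and only there so the example runs without AWS credentials.

```python
import io


def text_to_mp3(polly_client, text, voice_id="Joanna"):
    """Return MP3 bytes for `text` using an Amazon Polly client.

    `polly_client` is expected to behave like boto3.client("polly").
    It is passed in so the function can be tried without AWS access.
    """
    # Collapse OCR line breaks and repeated spaces into single spaces.
    text = " ".join(text.split())
    if not text:
        return b""  # nothing readable on the page, so skip the API call
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,  # assumption: any available Polly voice works here
    )
    # AudioStream is a streaming body; read() returns the encoded audio.
    return response["AudioStream"].read()


class FakePolly:
    """Offline stand-in for the real client (illustration only).

    It echoes the text back as bytes instead of returning real audio.
    """

    def synthesize_speech(self, Text, OutputFormat, VoiceId):
        return {"AudioStream": io.BytesIO(Text.encode("utf-8"))}


if __name__ == "__main__":
    audio = text_to_mp3(FakePolly(), "Once upon\na time")
    print(len(audio), "bytes returned")
```

On the device, the returned bytes would be written to a file or piped to an audio player. Swapping `FakePolly()` for a real Polly client is the only change needed to get actual speech.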
I originally used TensorFlow to create an object detection model. At the time of this writing, the onboard Intel Model Optimization library does not work for TensorFlow. Once it is fixed, I will be able to optimize this model to run on the GPU on the DeepLens device.
My dataset was made from hundreds of photos of my kids' books as well as a number of library books taken in various lighting conditions, orientations, and distances. I used labelImg to annotate my dataset with bounding boxes so I could train the model to identify Text Blocks on a page.
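For anyone curious what those annotations look like in code, here is a small sketch that reads bounding boxes back out of an annotation file. It assumes the annotations were saved in labelImg's default Pascal VOC XML format. The sample file name, the label `text_block`, and the coordinates are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in Pascal VOC layout (values are invented).
SAMPLE_ANNOTATION = """
<annotation>
  <filename>page_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>text_block</name>
    <bndbox>
      <xmin>52</xmin><ymin>310</ymin><xmax>598</xmax><ymax>450</ymax>
    </bndbox>
  </object>
</annotation>
"""


def read_boxes(xml_text):
    """Return a list of (label, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        # Coordinates are stored as text; convert each one to an int.
        coords = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((obj.findtext("name"), coords))
    return boxes


if __name__ == "__main__":
    for label, box in read_boxes(SAMPLE_ANNOTATION):
        print(label, box)
```

A training script would typically loop over every XML file in the dataset folder and feed these boxes, together with the matching images, to the detection framework.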
I was finally able to figure out how to train my model using MXNet and I will be updating this repo in the coming days to reflect those changes.
This project is built using Greengrass, Python 3.6, MXNet, OpenCV, Tesseract, and AWS Polly.
To run this project on the DeepLens, you will need to run the following commands.
sudo pip3 install --upgrade mxnet
sudo apt-get install ffmpeg
sudo apt-get install tesseract-ocr
In order to get sound to play on the DeepLens, you will need to grant Greengrass permission to use the audio card.
Greengrass requires you to explicitly authorize all the hardware that your code has access to. One way to configure this is through the Group Resources section in the AWS IoT console. Once configured, you deploy these settings to the DeepLens, which results in a JSON file being deployed to the Greengrass directory on the device.
To enable Audio playback through your Lambda, you need to add two resources. The sound card on the DeepLens is located at the path “/dev/snd/”. You need to add both “/dev/snd/pcmC0D0p” and “/dev/snd/controlC0” in order to play sound.
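If you would rather script this than click through the console, the two resources can be described as data. The sketch below builds a resource definition in the shape I understand the Greengrass (V1) resource definition API to expect. The `Id` and `Name` values are hypothetical, and the exact payload should be checked against the Greengrass documentation before use.

```python
import json

# The two device paths named above.
AUDIO_DEVICE_PATHS = ["/dev/snd/pcmC0D0p", "/dev/snd/controlC0"]


def audio_resources(paths=AUDIO_DEVICE_PATHS):
    """Build a local-device resource definition for the given paths."""
    resources = []
    for index, path in enumerate(paths):
        resources.append({
            "Id": f"audio-device-{index}",      # hypothetical identifier
            "Name": f"audio_device_{index}",    # hypothetical display name
            "ResourceDataContainer": {
                "LocalDeviceResourceData": {
                    "SourcePath": path,
                    # Assumption: let Greengrass add the device's Linux
                    # group so the Lambda process can open it.
                    "GroupOwnerSetting": {"AutoAddGroupOwner": True},
                }
            },
        })
    return {"Resources": resources}


if __name__ == "__main__":
    print(json.dumps(audio_resources(), indent=2))
```

Each resource still has to be attached to the Lambda function with read and write access, and the group has to be redeployed before the change reaches the device.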
In order to get the Text Area cleaned up to perform OCR, it needs to go through a number of filters. This graphic shows the steps that ReadToMe goes through with each image before trying to turn the image into text.