Pedestrian detection from real-time video feeds
Goal
Detect pedestrians from real-time video feeds. You can use pre-trained object detectors for this task. This would serve as a reasonable baseline. Later, if needed, you can train an object detector solely to detect pedestrians.
Considerations
- As the detector will be running on real-time video feeds using TensorFlow Lite with Python on a Raspberry Pi 4 Model B, you should focus on its latency without compromising its accuracy. An ideal model would be fast and at the same time won't produce too many false positives. Also, consider whether the detector should run on every single frame of a video feed, or whether we could run it less often (see the frame-skipping sketch after this list).
- As you try out different models for this purpose, please do a thorough study of their latency along with their detection performance on videos. This information will make it easier for us to decide which model we might want to go with.
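For illustration, here is a minimal frame-skipping sketch in Python with OpenCV. The detect() function is a placeholder for whatever TFLite inference wrapper ends up being used, and the stride of 5 is just an assumed starting point to tune on the Pi:

```python
import cv2

def detect(frame):
    """Placeholder for the TFLite pedestrian detector; returns (x1, y1, x2, y2) boxes."""
    return []

DETECT_EVERY_N = 5  # assumed stride; tune empirically on the Pi

cap = cv2.VideoCapture(0)  # Pi Camera exposed as a V4L2 device
frame_idx, last_boxes = 0, []

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Run the detector only on every Nth frame and reuse the previous
    # boxes in between, trading a little box staleness for latency.
    if frame_idx % DETECT_EVERY_N == 0:
        last_boxes = detect(frame)
    for (x1, y1, x2, y2) in last_boxes:
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.imshow("pedestrians", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
    frame_idx += 1

cap.release()
cv2.destroyAllWindows()
```

Reusing the last boxes between detector runs can be a large latency win on the Pi's CPU, since pedestrians rarely move much within a handful of frames.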
Deliverables
- A Colab Notebook to demonstrate the idea.
- A Python script (you can modularize the code into multiple scripts too) for the end-to-end execution, i.e. this script will take a real-time video feed as its input from a Pi Camera, detect pedestrians, and display the results on the screen.
Tools
You are free to use open-source pre-trained models. If you use someone else's, please attribute it. If your code is plagiarized, you will be suspended (applicable only if you are a WoC participant).
For now I have used (some) pretrained models (SSD on MobileNetV2 and its variants) from Hub, which give both false positives and false negatives depending on the image resolution and angle.
I have to try working with the raw boxes output (without the already-applied NMS) and experiment with varying the NMS and minimum score thresholds. It would be nice if I could get a "test image" so that I can settle on particular threshold values.
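Roughly, the sweep could look like this; raw_boxes and raw_scores stand in for the model's pre-NMS outputs (dummy tensors here so the snippet runs on its own):

```python
import tensorflow as tf

# Dummy stand-ins for the detector's raw (pre-NMS) outputs.
raw_boxes = tf.random.uniform((50, 4))   # [N, 4] in [y1, x1, y2, x2] order
raw_scores = tf.random.uniform((50,))    # [N] person-class scores

def filter_detections(boxes, scores, iou_thresh=0.5, score_thresh=0.3):
    """Apply NMS to raw outputs; the two thresholds are the knobs to sweep."""
    keep = tf.image.non_max_suppression(
        boxes, scores,
        max_output_size=100,
        iou_threshold=iou_thresh,
        score_threshold=score_thresh,
    )
    return tf.gather(boxes, keep), tf.gather(scores, keep)

# Sweep a small grid on a test image to settle on reasonable values.
for iou in (0.4, 0.5, 0.6):
    for score in (0.2, 0.3, 0.5):
        boxes, scores = filter_detections(raw_boxes, raw_scores, iou, score)
        print(f"iou={iou} score={score} -> {int(boxes.shape[0])} boxes")
```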
You can refer to this sample video - https://github.com/DeepFusionAI/social-distance-detector/blob/master/pedestrians.mp4.
The file seems to be empty.
@rishiraj could you look into this? The file might have gotten added to git-lfs and isn't reflecting here properly?
@sayakpaul @around-star fixed it!
Going through different resources, I found that SSD models can be converted easily to TensorFlow Lite, and on-device inference is optimized primarily for SSD models. The usual pretrained model with many classes can be retrained with a single output class, i.e. pedestrian.
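For reference, a minimal sketch of that conversion, assuming an SSD SavedModel exported at a hypothetical path:

```python
import tensorflow as tf

# Hypothetical path to an SSD SavedModel exported via the TF Object Detection API.
converter = tf.lite.TFLiteConverter.from_saved_model("exported_ssd/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable default quantization
tflite_model = converter.convert()

with open("ssd_pedestrian.tflite", "wb") as f:
    f.write(tflite_model)
```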
@piyush-cosmo even I think this would be the right choice to go with. We can use the SSD MobileNet model with COCO labels. I think this will be beneficial even when optimizing for the Edge TPU ML accelerator in the future.
@sayakpaul need your opinion and advice.
You can consider MobileDet as well (https://sayak.dev/mobiledet-optimization/). Faster than SSD.
@sayakpaul bhaiya @rishiraj bhaiya, I checked the latency and mAP of MobileDet and it is both faster and more accurate. I verified this from the Detection Model Zoo.
Yeah. So, we might want to compare the performance between the different variants of MobileDet and SSD MobileNet.
I found that MobileNet gives high speed but compromises accuracy. However, using it with SSD improves both performance and accuracy. From the TensorFlow 2 Detection Model Zoo I found 4 variants:
- SSD MobileNet V2 320x320
- SSD MobileNet V1 FPN 640x640
- SSD MobileNet V2 FPNLite 320x320
- SSD MobileNet V2 FPNLite 640x640
@rishiraj @sayakpaul which one to choose?
This decision should come from empirical experiments. I would run all of these models on a sample video and record the FPS they provide (see the timing sketch below). I would also visually inspect their results.
In addition, it's worth noting their mAP and FPS as reported in the model repository and their respective papers.
If you look at the earlier comments, you will also notice a small discussion around MobileDet. So, please consider that as well.
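Something along these lines would do, where run_model is a hypothetical per-model wrapper around preprocessing, interpreter.invoke(), and postprocessing:

```python
import time
import cv2

def measure_fps(video_path, run_model):
    """Run `run_model` on every frame of the video and return the average FPS."""
    cap = cv2.VideoCapture(video_path)
    frames, start = 0, time.time()
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        run_model(frame)  # hypothetical wrapper: preprocess + invoke + postprocess
        frames += 1
    cap.release()
    elapsed = time.time() - start
    return frames / elapsed if elapsed > 0 else 0.0

# e.g. for name, fn in candidate_models.items():
#          print(name, measure_fps("pedestrians.mp4", fn))
```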
@sayakpaul bhaiya @rishiraj bhaiya, here is my Colab. The conversion is done and it can also detect objects in images, but I am facing issues while detecting on videos. It is a minor issue, I guess, but I am unable to figure it out. Will you please help me with this?
Here is the video link on my Drive: pedestrian.mp4
@piyush-cosmo I think you are running into errors while serializing the output frames to a file. It might be happening because of the codec you are passing to the writer class (see the sketch below). You can refer to this notebook to see how I am doing it.
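For example, the writer could be set up like this; the key point is that the FourCC has to match the output container (mp4v is a common working choice for .mp4, MJPG/XVID for .avi):

```python
import cv2

cap = cv2.VideoCapture("pedestrians.mp4")
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if the FPS metadata is missing
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# The FourCC must match the container: "mp4v" for .mp4, "MJPG"/"XVID" for .avi.
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
writer = cv2.VideoWriter("output.mp4", fourcc, fps, (width, height))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # ... draw the detections on `frame` here ...
    writer.write(frame)

cap.release()
writer.release()
```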
I know the notebook's not complete yet but here are some early suggestions:
- CITATION
- Provide the user an easy interface for choosing between the different MobileDet models and observing their performance.
- Include FPS information (it's available in the notebook I linked above).
Here is my Colab showing all my work.
The pretrained model used here is SSD_mobileDet_cpu_coco. I have created my own representative dataset by subsampling 200 person images from the COCO dataset. The FPS is 0.73 on the given video.
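For reference, this is roughly how a representative dataset is wired into the TFLite converter for int8 quantization; the folder name, SavedModel path, and 320x320 input size here are assumptions for the sketch:

```python
import glob
import tensorflow as tf

def representative_dataset():
    # Yield preprocessed images matching the model's input spec; the folder
    # name and the 320x320 size are assumptions for this sketch.
    for path in glob.glob("train_sample/*.jpg")[:200]:
        img = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
        img = tf.image.resize(img, (320, 320))
        img = tf.cast(img, tf.float32) / 255.0
        yield [tf.expand_dims(img, 0)]

converter = tf.lite.TFLiteConverter.from_saved_model("exported_ssd/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
tflite_int8 = converter.convert()
```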
Issue: the output video background is all black, showing only the detection boxes.
I think this could even be helpful, as the people themselves are not visible, ensuring citizens' privacy, as I also mentioned in my proposal.
Here is the Drive link for my train_sampe_dataset_only_person.
Here is the Drive link for my COCO label.
Here is the Drive link for pedestrians.mp4.
@piyush-cosmo amazing work here!
A couple of pointers -
- Since the model you are using, i.e. SSD_mobileDet_cpu_coco, was trained on the entire COCO dataset, it won't make sense to use just images of persons as the representative dataset. Had it been trained with only those images, that would have been the way to go.
- I like the idea of anonymizing the background to all black. Let's give the user an option to choose if they want to use it or not. You could consider using the Colab dropdown for this (a sketch follows this list) and then craft your logic accordingly.
- Make the Colab Notebook end-to-end runnable. You are using Drive files here that might be too specific for your purpose. There are a couple of options here that you could consider:
  - Host your files on your Drive, share them publicly (with read access), and retrieve them with gdown.
  - Host your files under Releases of any public GitHub repository and retrieve them with curl or wget.
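The dropdown could look something like this in Colab, where the #@param annotation renders a form control; the names here are only illustrative:

```python
import numpy as np
import cv2

anonymize_background = "yes"  #@param ["yes", "no"]

def render(frame, boxes):
    """Draw detections on the original frame or on an all-black canvas."""
    canvas = np.zeros_like(frame) if anonymize_background == "yes" else frame.copy()
    for (x1, y1, x2, y2) in boxes:
        cv2.rectangle(canvas, (x1, y1), (x2, y2), (0, 255, 0), 2)
    return canvas
```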
Overall very impressive work.
@sayakpaul bhaiya, thanks a lot for all your reviews. I will work on all the issues you have pointed out.
@piyush-cosmo you've done a really good job. However, in the predictions I see some persistent, large overlapping bounding boxes being falsely predicted as person. I fear this could cause trouble when calculating the distance between actual persons and issuing social-distancing violation alerts. What might be the possible reasons for this, and can you think of ways to overcome it?
@sayakpaul bhaiya @rishiraj bhaiya, here is my Colab showing my updated work.
I like the idea of anonymizing the background to all black. Let's give the user an option to choose if they want to use it or not. You could consider using the Colab drop down for this and then craft your logic accordingly.
I have given the user an option to choose whether they want the black background or not. Now I have code both for the black-background output and for the usual detection on the original video.
Make the Colab Notebook end-to-end runnable. You are using Drive files here that might be too specific for your purpose. There are a couple of options here that you could consider: host your files on your Drive, share them publicly (with read access), and retrieve them with gdown; or host your files under Releases of any public GitHub repository and retrieve them with curl or wget.
I have made my Colab Notebook end-to-end runnable. I have hosted my files under Releases of my GitHub repository and I retrieve them with wget.
However, in the predictions I see some persistent, large overlapping bounding boxes being falsely predicted as person. I fear this could cause trouble when calculating the distance between actual persons and issuing social-distancing violation alerts. What might be the possible reasons for this, and can you think of ways to overcome it?
I have solved this issue by computing the area of each bounding box and using a simple if-else check to ignore the large overlapping boxes, roughly as in the sketch below.
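In essence, the filter does something like the following; the 0.3 ceiling is an illustrative value (the actual threshold is tuned on the video):

```python
def drop_oversized_boxes(boxes, frame_w, frame_h, max_frac=0.3):
    """Discard boxes covering more than `max_frac` of the frame area;
    in this feed those tend to be the spurious merged detections."""
    frame_area = frame_w * frame_h
    kept = []
    for (x1, y1, x2, y2) in boxes:
        area = max(0, x2 - x1) * max(0, y2 - y1)
        if area <= max_frac * frame_area:
            kept.append((x1, y1, x2, y2))
    return kept
```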
Additionally, I have added three variants of the SSD_mobileDet model, including SSD_mobileDet_cpu_coco_int8. The other variants added are:
- SSD_mobileDet_cpu_coco_fp16
- SSD_mobileDet_cpu_coco_dr
Of these three variants, I find SSD_mobileDet_cpu_coco_fp16 to be the fastest, with a best FPS of 9.98 and an elapsed time of only 53.38 seconds. Previously, working with SSD_mobileDet_cpu_coco_int8 was slow: the FPS was 0.75 and the elapsed time was 705.32 seconds. SSD_mobileDet_cpu_coco_dr is similarly slow.
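For anyone reproducing the comparison, per-variant latency can be benchmarked with a sketch like this; the .tflite filenames are illustrative:

```python
import time
import numpy as np
import tensorflow as tf

def bench(tflite_path, runs=50):
    """Average single-inference latency (seconds) for one TFLite variant."""
    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    dummy = np.random.random_sample(inp["shape"]).astype(inp["dtype"])
    interpreter.set_tensor(inp["index"], dummy)
    start = time.time()
    for _ in range(runs):
        interpreter.invoke()
    return (time.time() - start) / runs

for path in ("ssd_mobiledet_fp16.tflite", "ssd_mobiledet_int8.tflite",
             "ssd_mobiledet_dr.tflite"):
    print(path, f"{bench(path) * 1000:.1f} ms/inference")
```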
Now, should I go on with:
- training a model from scratch on only the person subset of the COCO dataset, or
- distance calculation between pedestrians?
Stellar work @piyush-cosmo! You have addressed the comments very thoroughly. I have some very minor formatting-related feedback, and I decided to create a Colab Gist so that you can compare. You can find it here. Note that these are very minor; you can ignore them.
As for the next steps, I think you could consider collaborating with @Sudarshana2000 and @SubhasmitaSw and help them incorporate MobileDet into the entire pipeline.
@rishiraj WDYT?
Lovely work @piyush-cosmo! I have executed the code and it's giving perfectly fine output. You've successfully completed the issue assigned to you.
Yes @sayakpaul, I think he can now collaborate with @Sudarshana2000 & @SubhasmitaSw on detecting the distance. I request both of you to kindly share your progress with @piyush-cosmo.
@sayakpaul bhaiya @rishiraj bhaiya, thanks a lot for all your reviews and feedback. I have compared my notebook against the Colab Gist you created and incorporated that feedback. Next, I will start working with @Sudarshana2000 & @SubhasmitaSw on detecting the distance.