This page presents a demo video of my You Only Look Once (YOLO) model for American Sign Language video classification.
This demo video proposal describes the genesis and vision for this project.
This brief paper accompanies the demo video. The video's audio has significant inherent noise (hiss); apologies. You can also follow along with the transcript and slides set just below the video.
DEPP.MSDS.462.55.Final.Project.YOLO.for.ASL_default.mp4
The transcript and slides below do not show the running shell scripts, the Colab notebooks, or YOLO in action, all of which are demonstrated in the video.
Morning. Thanks for watching.
Steve Depp
The motivation for this project is instant recognition of American Sign Language symbols by an edge device trained in the cloud, leading to a suite of products for the visually and hearing impaired.
Motivation:
- instant recognition of American Sign Language symbols by an edge device
- trained in the cloud of course
—> a suite of products for the visually and hearing impaired
Ingredients for training in the cloud:
- browser for Colab
- Google Drive
- annotated data = 48 hours
- modified YOLO configuration
- 200 samples per class × 27 classes = 5,400 samples
- 2,000 iterations per class × 27 classes = 54,000 iterations
- = 60 hours iterating over the 106-layer model
—> 95% mAP and 90% IoU
Ingredients for testing at the edge:
- Nvidia Jetson Nano et al
- camera
- SD card loaded with Jetpack 4.2.1
- remote desktop to Nano or monitor/keyboard/mouse hooked up
I had a little help along the way, ...
References:
- Bochkovskiy, A., Wang, C. Y., & Liao, H. Y. M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv preprint arXiv:2004.10934.
- Lin, J., Gan, C., & Han, S. (2019). TSM: Temporal Shift Module for Efficient Video Understanding. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 7082-7092.
- Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779-788).
- Visée, R. J., Likitlersuang, J., & Zariffa, J. (2020). An effective and efficient method for detecting hands in egocentric videos for rehabilitation applications. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 28(3), 748-755.
Helpful links:
- Sergio Canu: YOLO V3 – Install and run Yolo on Nvidia Jetson Nano (with GPU)
- Jetson Hacks: Jetson Nano + Raspberry Pi Camera + Jetson Nano – Use More Power!
- JP Redmon
- AlexeyAB
... and the tiny steps included first spending 200 bucks on parts, ...
Intermediate steps: hardware = $201.18
brain
- $94.99
- NVIDIA Jetson Nano Developer Kit (945-13450-0000-100)
- https://www.amazon.com/gp/product/B084DSDDLT/ref=ppx_yo_dt_b_asin_title_o05_s02?ie=UTF8&psc=1
power
- $14.95
- Adafruit 5V 4A (4000mA) switching power supply
- https://www.adafruit.com/product/1466
communication
- $24.39
- Intel Dual Band Wireless-Ac 8265 w/Bluetooth 8265.NGWMG
- https://amzn.to/2UcHszJ
vision
- $27
- nVidia Jetson Camera
- https://store.donkeycar.com/products/nvidia-jetson-camera-for-donkey
antenna
- $4.86
- Molex Antenna Wi-Fi 3.3dBi Gain 2483.5MHz/5850MHz Film
- https://www.arrow.com/en/products/2042811100/molex
memory
- $34.99
- SanDisk Extreme Plus microSDXC UHS-I Card with Adapter, 128GB, SDSQXBZ-128G-ANCMA
- https://www.amazon.com/gp/product/B07HMJV355/ref=ppx_yo_dt_b_asin_title_o01_s00?ie=UTF8&psc=1
... then deploying some applications, ...
OS
- Jetpack 4.2.1 flashed onto the SD card
OS remote install / set up
- Etcher
- brew install screen
- nmap to check available network locations
- VNC connection to Jetson
- app = RealVNC
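As a rough sketch of that headless setup flow (the subnet, serial device path, and IP address below are illustrative placeholders, not taken from the demo):
# flash the Jetpack 4.2.1 image onto the SD card with Etcher, then boot the Nano
brew install screen                  # serial terminal (macOS) for first-boot configuration
screen /dev/tty.usbmodem* 115200     # attach to the Nano's USB serial console; the device path varies
nmap -sn 192.168.1.0/24              # ping-scan the local network to find the Nano's IP address
# then point RealVNC (or ssh) at that IP for a remote desktop session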
... then four steps to customize the darknet framework to suit 27 classes and 5,400 images.
Custom objects - amending config files = 4 steps
27 classes and 5400 samples require these 4 darknet framework modifications:
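The exact edits are demonstrated in the video; as a hedged sketch, the standard darknet customization for this setup looks roughly like the following (paths follow the AlexeyAB darknet layout used elsewhere on this page):
1. cfg/yolo-obj.cfg: copy the base YOLO cfg, then set max_batches=54000 (classes × 2,000), classes=27 in every [yolo] layer, and filters=96 ((classes + 5) × 3) in the [convolutional] layer just above each [yolo] layer
2. build/darknet/x64/data/obj.names: 27 lines, one class name per line (A through Z, plus space)
3. build/darknet/x64/data/obj.data: classes = 27, plus paths to train.txt, obj.names, and the backup/ folder for saved weights
4. build/darknet/x64/data/train.txt: one image path per line for the 5,400 annotated samples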
22,000 clicks and 5,400 mouse drags later I had annotated those images, ...
Custom objects - labels and bounding boxes
- compile labelimg
5400 images from A to Z + space —> bounding boxes + labels

(This video shows the process of annotating 5,400 images with bounding boxes and labels using labelimg.)
FinshingAs_default.mp4
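As a minimal sketch of that annotation workflow (assuming labelImg is built from source per its README; the directory and class-file names here are illustrative):
git clone https://github.com/tzutalin/labelImg.git && cd labelImg
pip3 install pyqt5 lxml && make qt5py3                            # build the Qt resources ("compile labelimg")
python3 labelImg.py ../data/obj ../data/predefined_classes.txt    # draw bounding boxes, save labels in YOLO format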
... so that we could have 4 steps to train in the cloud.
This is Colab actually running on the Nano.
The first step is obtaining the darknet file and then unzipping it.
Next, we need to check that the CUDA version is version 10.
Then, we remove the preinstalled CUDA version (which is actually newer than 10) and install version 10.
When you do that, you need to answer yes, … right here.
Then, you confirm that the CUDA version is version 10.
You compile the darknet function.
Then, you train YOLO by 2 different methods:
Either you can train from scratch, or you can train using saved weights.
While you are training, download weights every 1,000 iterations.
You can also observe various error measures, and you can click on the PNG files to see the learning plots, which update every 100 iterations.
Steps for training custom YOLO
use this Colab notebook or this one from my Google Drive (a rough sketch of the underlying cells follows the monitoring notes below):
step 1: obtain the darknet zip from Steve’s drive
https://drive.google.com/file/d/13k7uWAEFmvjKV-0nXGuc4gFv-knwgInv/view?usp=sharing
step 2: ensure running CUDA 10
- 2a: check CUDA version
- 2b: delete current CUDA version
- 2c: install CUDA v10 (answer Y when asked)
- 2d: confirm CUDA version = 10
step 3: compile darknet function
step 4: train YOLO ASL by 2 methods
- 4a. train from scratch with cfg/yolov4.conv.137
- 4b. repeat steps 1-3 and continue training with cfg/yolo-obj_20000.weights or the latest weights available
monitor:
- download weights from darknet/backup folder every 1000 iterations
- observe mAP, IoU, GIoU, and average loss per bounding box per iteration
- double-click the darknet/chart.png and darknet/chart_yolo-obj.png learning plots, updated every 100 iterations
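For reference, the Colab cells behind steps 1 through 4 look roughly like the sketch below; the gdown download command and the working-directory name are assumptions on my part, and the linked notebooks above are authoritative.
# step 1: fetch and unzip the darknet archive from Google Drive
!gdown https://drive.google.com/uc?id=13k7uWAEFmvjKV-0nXGuc4gFv-knwgInv -O darknet.zip
!unzip -q darknet.zip
# step 2: check the CUDA version; if it is not 10, remove the newer toolkit and install cuda-10-0
!nvcc --version
# step 3: compile the darknet binary (assumes the archive unzips to a darknet/ folder)
%cd darknet
!make
# step 4a: train from scratch against the pre-trained convolutional weights
!./darknet detector train build/darknet/x64/data/obj.data cfg/yolo-obj.cfg cfg/yolov4.conv.137 -dont_show -map
# step 4b: or continue from a saved checkpoint (weights land in darknet/backup every 1,000 iterations)
!./darknet detector train build/darknet/x64/data/obj.data cfg/yolo-obj.cfg cfg/yolo-obj_20000.weights -dont_show -map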
The four steps for testing at the edge are:
downloading and unzipping,
assigning environment variables,
building the model with make,
and running it.
You can either run it on a video, as we did earlier,
or you can run it
... live.
Steps to test/demo custom YOLO for 27 ASL objects on Nvidia Jetson Nano
- download and unzip Steve’s darknet.zip:
https://drive.google.com/file/d/13k7uWAEFmvjKV-0nXGuc4gFv-knwgInv/view?usp=sharing
- assign environment variables (Jetpack 4.2.1 was selected for CUDA 10, so specify that here):
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
- build:
make
- run it:
  - on a video (escape key to exit):
./darknet detector demo build/darknet/x64/data/obj.data cfg/yolo-obj.cfg yolo-obj_16000.weights data/ASLAZ.mov -i 0 -thresh 0.25
  - live (escape key to exit):
./darknet detector demo build/darknet/x64/data/obj.data cfg/yolo-obj.cfg yolo-obj_16000.weights "nvarguscamerasrc auto-exposure=1 ! video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, format=(string)NV12, framerate=(fraction)60/1 ! nvvidconv flip-method=2 ! video/x-raw, width=(int)1280, height=(int)720, format=(string)BGRx ! videoconvert ! video/x-raw, format=(string)BGR ! appsink -e"
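Before the live demo, a quick sanity check on a single still image can also be handy; a minimal sketch, where data/test_A.jpg is a hypothetical test frame rather than a file from the original project:
./darknet detector test build/darknet/x64/data/obj.data cfg/yolo-obj.cfg yolo-obj_16000.weights data/test_A.jpg -thresh 0.25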
While this is getting ready to run,
I’ll just mention that there are a few future steps that I would consider:
better data set,
better tuned model,
possibly a different application: ...
Future / next steps:
- A better data set
  - An Effective and Efficient Method for Detecting Hands in Egocentric Videos for Rehabilitation Applications
- Video recognition of arms
  - Temporal Shift Module for Efficient Video Understanding
- A better tuned model
  - YOLOv4: Optimal Speed and Accuracy of Object Detection
- A different application
  - Extreme inbreeding likely spells doom for Isle Royale wolves
This is a book, a thesis, by a fellow by the name of David Mech, about wolves.
a different application:
The Wolves of Isle Royale, L. David Mech, U.S. National Park Service, Washington, DC, 1966.

Possibly one thing we could do is replace
this low fuel plane
and cold wintry days hovering over wolves
with drones and a model...
the 1966 version of today's drones and wildlife classification

... like the one deployed here.
It’s almost as easy as ABC.
Thank you for watching.