YOLO for ASL

This page presents a demo video of my You Only Look Once (YOLO) model for American Sign Language (ASL) video classification.

This demo video proposal describes the genesis and vision for this project.

This brief paper accompanies the demo video. The video's audio has significant inherent noise (hiss) - apologies. You can also follow along with the transcript and slides set just below the video.

DEPP.MSDS.462.55.Final.Project.YOLO.for.ASL_default.mp4

The following transcript and slides do not show the running shell scripts, Colab notebooks, or YOLO in action that are demonstrated in the video.

Morning. Thanks for watching.

MSDS 462-55

Final Project Demo Video = YOLO for ASL

Steve Depp


The motivation for this project is instant recognition of American Sign Language symbols by an edge device trained in the cloud, leading to a suite of products for the visually and hearing impaired.

image

Final project DV - YOLO for ASL

Motivation:

  • instant recognition of American Sign Language symbols by an edge device
  • trained in the cloud of course
    —> a suite of products for the visually and hearing impaired

Ingredients for training in the cloud:

  • browser for COLAB
  • Google Drive
  • annotated data = 48 hours
  • modified YOLO configuration
    • 200 samples for each of 27 classes = 5,400 samples
    • 2,000 iterations per class = 54,000 iterations
    • = 60 hours of training the 106-layer model
      —> 95% mAP and 90% IoU
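
For anyone checking the arithmetic behind those bullets, here is a tiny shell sketch of where the numbers come from (the 2,000-iterations-per-class figure follows the usual darknet guideline; the variable names are purely illustrative):

     # illustrative arithmetic only - not part of the training pipeline
     classes=27
     samples_per_class=200
     echo "samples:     $(( classes * samples_per_class ))"   # 5400
     echo "max_batches: $(( classes * 2000 ))"                # 54000 iterations
     echo "filters:     $(( (classes + 5) * 3 ))"             # 96 per [yolo] head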

Ingredients for testing at the edge:

  • Nvidia Jetson Nano et al
  • camera
  • SD card loaded with Jetpack 4.2.1
  • remote desktop to Nano or monitor/keyboard/mouse hooked up

I had a little help along the way, ...

YOLO for ASL

References:

  1. Bochkovskiy, A., Wang, C. Y., & Liao, H. Y. M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv preprint arXiv:2004.10934.
  2. Lin, J., Gan, C., & Han, S. (2019). TSM: Temporal Shift Module for Efficient Video Understanding. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 7082-7092.
  3. Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779-788).
  4. Visée, R. J., Likitlersuang, J., & Zariffa, J. (2020). An effective and efficient method for detecting hands in egocentric videos for rehabilitation applications. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 28(3), 748-755.

Helpful links:

... and the tiny steps included first spending 200 bucks on parts, ...

YOLO for ASL

Intermediate steps: hardware = $201.18

  • brain
  • power
  • communication
  • vision
  • antenna
  • memory

... then deploying some applications, ...

YOLO for ASL

OS

OS remote install / set up

... then four steps to customize the darknet framework to suit 27 classes and 5,400 images.

YOLO for ASL

Custom objects - amending config files = 4 steps

27 classes and 5,400 samples require these 4 darknet framework modifications (a shell sketch follows the list):

1. Create file yolo-obj.cfg with the same content as yolov4-custom.cfg (or copy yolov4-custom.cfg to yolo-obj.cfg), then:

  • set network size width=416 height=416, or any other multiple of 32
  • change the line classes=80 to your number of objects in each of the 3 [yolo] layers
  • change filters=255 to filters=(classes + 5)×3 in the 3 [convolutional] layers immediately before each [yolo] layer

2. Create file obj.names in the directory build/darknet/x64/data/, with one object name per line

3. Create file obj.data in the directory build/darknet/x64/data/, containing (where classes = number of objects)

4. Put the image files (.jpg) of your objects in the directory build/darknet/x64/data/obj/
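
Here is a minimal shell sketch of those four edits for this project's 27 classes, assuming you start from the darknet repo root; the class ordering and the train/valid file names are illustrative rather than exactly what my config used:

     # step 1: copy the template config and set the 27-class values
     cp cfg/yolov4-custom.cfg cfg/yolo-obj.cfg
     sed -i 's/^width=.*/width=416/'   cfg/yolo-obj.cfg
     sed -i 's/^height=.*/height=416/' cfg/yolo-obj.cfg
     sed -i 's/classes=80/classes=27/' cfg/yolo-obj.cfg      # all 3 [yolo] layers
     sed -i 's/filters=255/filters=96/' cfg/yolo-obj.cfg     # (27 + 5) x 3 = 96

     # step 2: obj.names = one class name per line, A..Z plus space
     printf '%s\n' {A..Z} space > build/darknet/x64/data/obj.names

     # step 3: obj.data = class count, image lists, names file, backup folder
     {
       echo "classes = 27"
       echo "train   = build/darknet/x64/data/train.txt"
       echo "valid   = build/darknet/x64/data/valid.txt"
       echo "names   = build/darknet/x64/data/obj.names"
       echo "backup  = backup/"
     } > build/darknet/x64/data/obj.data

     # step 4: the .jpg images (and their .txt label files) go here
     mkdir -p build/darknet/x64/data/obj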

22,000 clicks and 5,400 mouse drags later I had annotated those images, ...

YOLO for ASL

Custom objects - labels and bounding boxes

5,400 images from A to Z + space —> bounding boxes + labels
(labelImg screenshot: annotating A1042.jpg)

(This video shows the process of annotating 5,400 images with bounding boxes and labels using labelImg; a sample label-file format follows below.)

FinshingAs_default.mp4
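
labelImg in YOLO mode writes one .txt label file per image, with one line per box: class_id x_center y_center width height, all relative to the image size. A sketch of what the label paired with an "A" image looks like (the coordinate values are made up for illustration, and class 0 assumes "A" is first in obj.names):

     # hypothetical label file sitting next to A1042.jpg
     # format: class_id x_center y_center width height (normalized to 0-1)
     cat build/darknet/x64/data/obj/A1042.txt
     # 0 0.512500 0.487269 0.281250 0.426852    <- class 0 ("A"), one box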

... so that we could have 4 steps to train in the cloud.
This is COLAB actually running on the Nano.
The first step is obtaining the darknet file and then unzipping it.
Next, we need to check that the CUDA version is version 10.
Then, we delete the existing CUDA version, which is actually newer, and install version 10.
When you do that, you need to answer yes, … right here.
Then, you confirm that the CUDA version is version 10.
You compile the darknet function.
Then, you train YOLO by 2 different methods:
Either you can train from scratch, or you can train using saved weights.
While you are training, download weights every 1,000 iterations.
You can also observe various error measures and you can click on PNG files to see the learning plots every 1000 iterations, or every 100 iterations.

image

YOLO for ASL

Steps for training custom YOLO

use this Colab notebook or this one from my Google Drive:

step 1: obtain the darknet zip from Steve’s drive

https://drive.google.com/file/d/13k7uWAEFmvjKV-0nXGuc4gFv-knwgInv/view?usp=sharing

step 2: ensure running CUDA 10

  • 2a: check CUDA version
  • 2b: delete current CUDA version
  • 2c: install CUDA v10 (answer Y when asked)
  • 2d: confirm CUDA version = 10

step 3: compile darknet function

step 4: train YOLO ASL by 2 methods

  • 4a. train from scratch with cfg/yolov4.conv.137 
  • 4b. repeat steps 1-3 and continue training with cfg/yolo-obj_20000.weights or the latest weight set available (command sketch below)

monitor:

  • download weights from darknet/backup folder every 1000 iterations
  • observe mAP, IoU, GIoU, and average loss per bounding box per iteration
  • double-click the darknet/chart.png and darknet/chart_yolo-obj.png learning plots, updated every 100 iterations
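
Condensed into shell commands, those Colab cells look roughly like this. It is a sketch only: gdown is just one way to pull the zip from Drive, the exact apt package names for the CUDA swap may differ from what the notebook runs, and the -dont_show/-map flags are optional extras:

     # step 1: obtain and unzip darknet from the Drive link above
     pip install -q gdown
     gdown "https://drive.google.com/uc?id=13k7uWAEFmvjKV-0nXGuc4gFv-knwgInv" -O darknet.zip
     unzip -q darknet.zip && cd darknet

     # step 2: make sure CUDA 10 is the active toolkit
     nvcc --version                              # 2a: check current version
     apt-get -y remove --purge 'cuda*'           # 2b: delete the newer preinstalled version
     apt-get -y install cuda-10-0                # 2c: install v10 (answer Y when asked)
     /usr/local/cuda-10.0/bin/nvcc --version     # 2d: confirm version = 10

     # step 3: compile darknet
     make

     # step 4a: train from scratch against the pretrained backbone weights
     ./darknet detector train build/darknet/x64/data/obj.data cfg/yolo-obj.cfg cfg/yolov4.conv.137 -dont_show -map

     # step 4b: or continue from the latest saved weights (downloaded from darknet/backup)
     ./darknet detector train build/darknet/x64/data/obj.data cfg/yolo-obj.cfg cfg/yolo-obj_20000.weights -dont_show -map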

The four steps for testing at the edge are:
downloading and unzipping,
assigning environment variables,
building and making the model,
and running it.
You can either run it on the video as we did earlier.
Or, you can run it
... live.

image

YOLO for ASL

Steps to test/demo custom YOLO for 27 ASL objects on Nvidia Jetson Nano

  1. download and unzip Steve's darknet.zip:

    https://drive.google.com/file/d/13k7uWAEFmvjKV-0nXGuc4gFv-knwgInv/view?usp=sharing

  2. assign environment variables; Jetpack 4.2.1 was selected for CUDA 10, so specify that here:

     export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}  
    
     export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}   
    
  3. build:

     make   
    
  4. run it

  • run it on a video; escape key to exit this:

    ./darknet detector demo build/darknet/x64/data/obj.data cfg/yolo-obj.cfg yolo-obj_16000.weights data/ASLAZ.mov -i 0 -thresh 0.25
    
  • run it live; escape key to exit this:

    ./darknet detector demo build/darknet/x64/data/obj.data cfg/yolo-obj.cfg yolo-obj_16000.weights "nvarguscamerasrc auto-exposure=1 ! video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, format=(string)NV12, framerate=(fraction)60/1 ! nvvidconv flip-method=2 ! video/x-raw, width=(int)1280, height=(int)720, format=(string)BGRx ! videoconvert ! video/x-raw, format=(string)BGR ! appsink -e"
    

While this is getting ready to run,
I’ll just mention that there are a few future steps that I would consider:
better data set,
better tuned model,
possibly a different application: ...

YOLO for ASL

Future / next steps:

This is a book, a thesis, by a fellow by the name of David Mech, about wolves.

YOLO for wildlife classification

a different application:
The Wolves of Isle Royale, L. David Mech, U.S. National Park Service, Washington, DC, 1966.

Possibly one thing we could do is replace
this low fuel plane
and cold wintry days hovering over wolves
with drones and a model...

YOLO for better imaging

the 1966 version of today's drones and wildlife classification

... like the one deployed here.
It’s almost as easy as ABC.
Thank you for watching.

image