An integrated computer vision system for human recognition and behavior analysis from an RGB camera. The project was realized as part of a master's studies internship.
This system was created using state-of-the-art computer vision components based on the following repositories:
- object detection - PyTorch YOLOv3,
- face detection - Dlib's CNN-based face detection model,
- face recognition - Dlib's face recognition model,
- object tracking - Dlib's correlation tracker,
- facial emotion recognition - Facial-Expression-Recognition.Pytorch,
- human pose estimation - lightweight-human-pose-estimation.pytorch,
- skeleton-based action recognition - 2s-AGCN (MS-AAGCN).
The skeleton-based action recognition component was modified and expanded into a practically usable component. A complete pipeline for extracting human pose skeletons from RGB video was created:
The skeleton-based action recognition component was trained on a custom 2D skeleton dataset. Using the same Lightweight OpenPose approach as in the action recognition pipeline, 2D skeletons were extracted from the RGB videos of the original NTU-RGB+D dataset. The first 60 classes (A001-A060) were taken into account.
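For illustration, 2D skeleton sequences of this kind are typically packed into a (C, T, V, M) tensor before being fed to 2s-AGCN. The following is a minimal sketch of that packing, assuming the 18-keypoint layout produced by Lightweight OpenPose; the function name, dimension defaults and input format are illustrative assumptions, not this repository's actual code.

```python
import numpy as np

def pack_skeleton_sequence(frames_keypoints, num_frames=300, num_joints=18, max_people=2):
    """Pack per-frame 2D keypoints into the (C, T, V, M) layout commonly used
    by 2s-AGCN: C = (x, y, confidence), T = frames, V = joints, M = people.

    frames_keypoints: list over frames; each entry is a list of per-person
    arrays shaped (num_joints, 3). Illustrative sketch only.
    """
    data = np.zeros((3, num_frames, num_joints, max_people), dtype=np.float32)
    for t, people in enumerate(frames_keypoints[:num_frames]):
        for m, keypoints in enumerate(people[:max_people]):
            data[:, t, :, m] = np.asarray(keypoints, dtype=np.float32).T  # (3, V)
    return data
```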
All the components were integrated into one system as presented below:
For more details about the components and the system functionality, please check the internship presentation and abstract.
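As a rough orientation only, a per-frame integration of these components could be organized along the following lines; the wrapper callables passed into the loop are hypothetical placeholders, not the actual module interfaces of this project.

```python
import cv2

def run_pipeline(detect_persons, track, recognize_faces, estimate_poses,
                 classify_actions, classify_emotions, camera_index=0):
    """Chain the components frame by frame. The callables stand in for the
    YOLOv3 detector, Dlib tracker/recognizer, Lightweight OpenPose, 2s-AGCN
    and the facial emotion classifier; this is an illustrative sketch only."""
    capture = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            person_boxes = detect_persons(frame)            # YOLOv3 person detection
            tracks = track(frame, person_boxes)             # Dlib correlation tracking
            identities = recognize_faces(frame, tracks)     # Dlib face detection + recognition
            skeletons = estimate_poses(frame, tracks)       # Lightweight OpenPose
            actions = classify_actions(skeletons)           # 2s-AGCN on buffered skeletons
            emotions = classify_emotions(frame, identities) # facial emotion recognition
            # ...draw results on the frame, write the output video and activity log...
            cv2.imshow("output", frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
    finally:
        capture.release()
        cv2.destroyAllWindows()
```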
An RGB camera is required.
The following packages need to be installed, with the newest versions recommended (a quick environment check is sketched after this list):
- pytorch (torch)
- torchvision
- h5py
- scikit-learn (sklearn)
- pycocotools
- opencv-python
- numpy
- matplotlib
- terminaltables
- pillow (PIL)
- tqdm
- dlib (with GPU support)
- argparse
- imutils
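A quick sanity check that PyTorch and dlib actually see the GPU (the CNN-based face detector in particular is very slow without CUDA) can be done along these lines; only standard attributes of the listed packages are used.

```python
import cv2
import dlib
import torch

# Verify that PyTorch sees a CUDA device and that dlib was built with CUDA support.
print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("dlib:", dlib.__version__, "| built with CUDA:", dlib.DLIB_USE_CUDA)
print("OpenCV:", cv2.__version__)
```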
As described in the PyTorch YOLOv3 installation guidelines, download the pre-trained weights and the COCO dataset:
```
$ cd pytorch_yolo_adapted/weights/
$ bash download_weights.sh
$ cd pytorch_yolo_adapted/data/
$ bash get_coco_dataset.sh
```
Following the guidelines of Facial-Expression-Recognition.Pytorch, create a new folder facial_recognition_pytorch_adapted/FER2013_VGG19/, download the pre-trained model from link1 or link2 (key: g2d3) and place it in the FER2013_VGG19/ folder.
Further, download Dlib's human face detector model from here as well as Dlib's shape predictor from here, unpack both files and place them inside facial_recognition_pytorch_adapted/.
As described in lightweight-human-pose-estimation.pytorch installation guidelines, download the pre-trained model from here and place it in lightweight_openpose_adapted/models/.
Finally, download Dlib's human face detector model from here, as well as Dlib's shape predictor and Dlib's face recognition ResNet model from here; unpack all files and place them inside data/face_recognition/.
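To verify that the downloaded models ended up in the expected folders, a small check such as the one below may help. The dlib file names are the standard ones from dlib.net (and the ones used by face_descriptor_generator.py), and yolov3.weights is the usual name fetched by download_weights.sh; the remaining checkpoint names are not spelled out above, so only their folders are checked.

```python
from pathlib import Path

# Expected locations according to the steps above. File names other than the
# dlib and YOLOv3 ones are omitted because they depend on the downloaded checkpoints.
expected = {
    "pytorch_yolo_adapted/weights": ["yolov3.weights"],
    "facial_recognition_pytorch_adapted/FER2013_VGG19": [],
    "facial_recognition_pytorch_adapted": ["mmod_human_face_detector.dat"],
    "lightweight_openpose_adapted/models": [],
    "data/face_recognition": [
        "mmod_human_face_detector.dat",
        "shape_predictor_5_face_landmarks.dat",
        "dlib_face_recognition_resnet_model_v1.dat",
    ],
}

for folder, files in expected.items():
    print(f"{folder}: {'found' if Path(folder).is_dir() else 'MISSING'}")
    for name in files:
        status = "found" if (Path(folder) / name).is_file() else "MISSING"
        print(f"  {name}: {status}")
```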
In order to generate more facial descriptors, run:
```
$ face_descriptor_generator.py shape_predictor_5_face_landmarks.dat dlib_face_recognition_resnet_model_v1.dat /path/to/the/input/image "name of the person"
```
Move the generated descriptor to face_descriptors/. Note that the input image should contain a photo of only one person. If multiple faces are detected, only the first one will be processed and saved to the file.
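For context, matching a live face against the stored descriptors with dlib typically reduces to a Euclidean distance comparison against a threshold. The sketch below uses dlib's standard Python API but, for brevity, the HOG-based frontal face detector instead of the CNN model used by the system; the descriptor storage format and the 0.6 threshold are common conventions, not necessarily this project's exact settings.

```python
import dlib
import numpy as np

# Standard dlib models (the same files that face_descriptor_generator.py uses).
detector = dlib.get_frontal_face_detector()
shape_predictor = dlib.shape_predictor("shape_predictor_5_face_landmarks.dat")
face_encoder = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")

def identify(image, known_descriptors, known_names, threshold=0.6):
    """Return the best matching name for the first detected face, or None."""
    faces = detector(image, 1)
    if not faces or not known_names:
        return None
    shape = shape_predictor(image, faces[0])
    descriptor = np.array(face_encoder.compute_face_descriptor(image, shape))
    distances = np.linalg.norm(np.asarray(known_descriptors) - descriptor, axis=1)
    best = int(np.argmin(distances))
    return known_names[best] if distances[best] < threshold else None
```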
Launch a command prompt or terminal, go to the project's main folder (HumanRecognitionBehaviorAnalysis/) and run:
```
python main.py
```
The camera input window will appear and the processing will start.
In order to stop the processing, set the focus on the output video window (e.g. by clicking on it) and press 'Q' on the keyboard.
The generated output files will include the following (a sketch of the timestamped naming scheme follows this list):
- the processed video sequence (no sound), saved in the format: "output_processed_video_[year]_[month]_[day]_[hour]_[minute]_[second].avi",
- the activity log text file (produced when enough frames were successfully registered for action recognition), listing the potentially identified person, the performed action and the corresponding facial emotion, saved in the format: "activity_logs_[year]_[month]_[day]_[hour]_[minute]_[second].txt".
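The timestamped names above can be reproduced with standard datetime formatting. The sketch below illustrates the naming scheme together with a hypothetical activity log entry; the codec, resolution, frame rate and log line layout are assumptions.

```python
from datetime import datetime
import cv2

stamp = datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
video_name = f"output_processed_video_{stamp}.avi"
log_name = f"activity_logs_{stamp}.txt"

# XVID-encoded AVI writer for the processed frames (codec/resolution are assumptions).
writer = cv2.VideoWriter(video_name, cv2.VideoWriter_fourcc(*"XVID"), 20.0, (640, 480))
# ...processed frames would be written here with writer.write(frame)...
writer.release()

# One hypothetical activity log entry: identified person, action, facial emotion.
with open(log_name, "a") as log:
    log.write(f"{stamp}\tperson=unknown\taction=standing up\temotion=neutral\n")
```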