Primary language: Jupyter Notebook. License: MIT.

PedRecNet

This repository contains the code for the PedRecNet (Paper: https://arxiv.org/pdf/2204.11548.pdf) as well as EHPI3D. It is the successor of our EHPI2D work (https://github.com/noboevbo/ehpi_action_recognition). The PedRecNet is a multi-purpose network that provides the following functions:

  • Human BB Detection (via YoloV4).
  • Human Tracking
  • 2D Human Pose Estimation
  • 3D Human Pose Estimation
  • Human Body Orientation (currently only Phi) Estimation
  • Human Head Orientation (currently only Phi) Estimation
  • "Pedestrian recognizes the camera" estimation
  • Human Action Recognition (via EHPI3D)

Note: This work is currently unpublished. It is part of my PhD dissertation and we are currently in the process of preparing a paper (maybe more than one). Note also that, for now, I am no longer active in research, so this code is provided as is.

PedRecNet: Demo01 - Pedestrian crossing the street + Hitchhike

PedRecNet Demo 02: Multiple Pedestrians

Citation

Please cite the following paper if this code is helpful in your research.

D. Burgermeister and C. Curio, “PedRecNet: Multi-task deep neural network for full 3D human pose and orientation estimation,” in 2022 IEEE Intelligent Vehicles Symposium (IV), 2022.

Installation

Requirements

  • Python 3.9 (venv suggested)
  • working CUDA / CUDNN

Installation steps

  • Clone this repository
  • cd PedRec
  • pip install -r requirements.txt
  • Download the pretrained models if you want to run the PedRecNet
  • Download the required datasets, dataframes and maybe some of the checkpoints (see the Datasets section)

Required Data

Pretrained models

Not required to run the network, but needed for some experiments / trainings:

Datasets

Download the datasets and place the additional .pkls in the appropriate folders. Update the paths in experiment_path_helper.py and execute one of the experiments in training/. You might need some intermediate weights if you do not start with experiment_pedrec_2d! You can find them at https://dennisnotes.com/files/pedrec/single_results/filename.pth.

Demo files

Installation tips

I currently recommend using a pip environment instead of Anaconda. I tried the (recommended) Anaconda environment for PyTorch several times, but on my system(s) its performance is far worse than the pip environment: about 9 FPS on videos with a single human with Anaconda, compared to 25 FPS with pip. The gap shrinks as more people appear in a video; with 7+ people the two environments perform almost equally. If you have an idea what the problem could be, please let me know. Things tested:

  • CUDA / CUDNN are working, enabled, and recognized by PyTorch in both environments
  • Pillow-SIMD installed
  • Usage of opencv-contrib-python-headless instead of the Conda version.
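To compare two environments, a small check like the one below can show what PyTorch reports about CUDA / CUDNN in each of them. This is only a sketch and not part of the repository; the function name cuda_report is made up for illustration, and the import is guarded so the script also runs where PyTorch is not installed.

```python
def cuda_report() -> dict:
    """Collect what PyTorch reports about CUDA / cuDNN in the current environment."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "cudnn_available": torch.backends.cudnn.is_available(),
        "cudnn_enabled": torch.backends.cudnn.enabled,
        "cudnn_version": torch.backends.cudnn.version(),
    }
    if report["cuda_available"]:
        # Name of the first visible GPU.
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

Running it once in the pip environment and once in the Anaconda environment makes differences in the reported versions easy to spot.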

Demo / Run

Check out the demo_actionrec_dev.py file. It contains examples of how to run the application on videos, image dirs, images and a webcam via the "input providers". Example (requires the downloaded demo videos):

python pedrec/demo_actionrec_dev.py

Generate own training data

Check out the pandas dataframes (e.g. rt_conti_01_train_FIN.pkl from the SIM-C01 dataset, or the pkls from the H36M dataset). If you provide a dataset with the same structure, you can use the pedrec dataset class directly. You can find some scripts I used to generate the dataframes in tools/datasets/... but I have not tested them in a while. The same applies to EHPI3D action recognition data: check out the dataframes in the rt_conti_01_train_FIN.pkl file! You might want to check out the notebook dataset_rtsim_conti01_ehpi as well. You can find the result files (e.g. C01F_train_pred_df_experiment_pedrec_p2d3d_c_o_h36m_sim_mebow_0_allframes.pkl) at https://dennisnotes.com/files/pedrec/result_dfs/filename.
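To see which structure a dataframe pickle has before building your own, a quick inspection along these lines may help. It is a minimal sketch using pandas; the helper name summarize_dataframe_pickle and the example path are assumptions, not part of this repository. Only load pickles from sources you trust, since unpickling can execute arbitrary code.

```python
import pandas as pd


def summarize_dataframe_pickle(path):
    """Load a pickled pandas DataFrame and return its basic structure."""
    df = pd.read_pickle(path)
    return {
        "rows": len(df),
        "columns": list(df.columns),
        # Column name -> dtype as a string, to compare against your own data.
        "dtypes": {name: str(dtype) for name, dtype in df.dtypes.items()},
    }


# Example call (hypothetical local path to a downloaded dataframe):
# summary = summarize_dataframe_pickle("data/rt_conti_01_train_FIN.pkl")
# print(summary["rows"], summary["columns"])
```

Comparing the column list and dtypes of your own dataframe against this summary is a simple way to check that it has the same structure.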

Notebooks

I've just pasted a few of my notebooks into the notebooks folder. They are not cleaned up and may contain absolute paths etc., but they may help some readers understand certain concepts / validation results.

Appendix

Note: probably outdated information! Need to recheck this part.

Numpy "Datatypes"

Note: these are not really datatypes; the values are stored in plain numpy arrays for performance reasons. There are helper methods providing more user-friendly access to those values (e.g. joint_helper(_3d), bb_helper). These "datatypes" are the ones used internally in the PedRecNet application; the types used elsewhere (e.g. in datasets) may differ.

"Datatype" name | Shape
bb_2d | center_x, center_y, width, height, confidence, class_idx
joint_2d | x, y, confidence
joint_3d | x, y, z, confidence
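As an illustration of how such array layouts can be read, here is a small sketch that indexes bb_2d and joint_2d arrays by position. The index constants and both function names are hypothetical and only follow the field order in the table above; the repository's own helpers (joint_helper, bb_helper) may work differently.

```python
import numpy as np

# Hypothetical index constants, following the field order in the table above.
BB_CENTER_X, BB_CENTER_Y, BB_WIDTH, BB_HEIGHT, BB_CONF, BB_CLASS = range(6)
JOINT_X, JOINT_Y, JOINT_CONF = range(3)


def bb_to_corners(bb_2d):
    """Convert a center-format box to (x_min, y_min, x_max, y_max)."""
    half_w = bb_2d[BB_WIDTH] / 2.0
    half_h = bb_2d[BB_HEIGHT] / 2.0
    return np.array(
        [
            bb_2d[BB_CENTER_X] - half_w,
            bb_2d[BB_CENTER_Y] - half_h,
            bb_2d[BB_CENTER_X] + half_w,
            bb_2d[BB_CENTER_Y] + half_h,
        ],
        dtype=np.float32,
    )


def confident_joints(joints_2d, min_conf=0.5):
    """Return the (x, y) rows of all joints with confidence >= min_conf."""
    keep = joints_2d[:, JOINT_CONF] >= min_conf
    return joints_2d[keep][:, [JOINT_X, JOINT_Y]]


bb = np.array([100.0, 50.0, 40.0, 20.0, 0.9, 0.0], dtype=np.float32)
joints = np.array([[1.0, 2.0, 0.9], [3.0, 4.0, 0.1]], dtype=np.float32)
print(bb_to_corners(bb))         # corner coordinates of the box
print(confident_joints(joints))  # only the first joint passes the threshold
```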

Expected shapes of PedRecNet HDF5 dataset files

note: n = dataset length

dataset name | Shape | DType | Description
img_paths | (n) | str | img path, relative to the dataset root
joints2d | (n,17,4) | float32 | 17 = joints, 4 = x, y, confidence, visibility (coordinates in pixels, starting from top left of the image)
skeleton_3d_hip_normalized | (n,17,5) | float32 | 17 = joints, 5 = x, y, z, confidence, visibility (coordinates in mm)
env_position | (n,3) | float32 | 3 = x, y, z (mm)
body_orientation | (n,4) | float32 | 4 = theta, phi, confidence, visibility
head_orientation | (n,4) | float32 | 4 = theta, phi, confidence, visibility
bbs | (n,6) | float32 | 6 = center_x, center_y, width, height, confidence, class_idx
scene_idx_range | (n,2) | uint32 | 2 = scene_idx_start, scene_idx_stop; the index range in the hdf5 file containing data from the same scene
actions | (n) | uint32 | List = dynamic sized list of action ids, e.g. [[1, 2], [3, 4, 5]]
movements | (n) | uint32 | ids, see constants for ID <-> NAME mapping
movement_speeds | (n) | uint32 | ids, see constants for ID <-> NAME mapping
genders | (n) | uint32 | ids, see constants for ID <-> NAME mapping
skin_colors | (n) | uint32 | ids, see constants for ID <-> NAME mapping
sizes | (n) | uint32 | ids, see constants for ID <-> NAME mapping
weights | (n) | uint32 | ids, see constants for ID <-> NAME mapping
ages | (n) | uint32 | ids, see constants for ID <-> NAME mapping
frame_nr_locals | (n) | uint32 | frame number of the current scene
frame_nr_global | (n) | uint32 | frame number of the complete record
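The expected shapes can be checked mechanically before training. The sketch below is not part of the repository: it takes a plain mapping from dataset name to shape tuple and compares it against the table above. The names EXPECTED_TRAILING_SHAPES and find_shape_errors are made up for this example, and since this appendix may be outdated, the expected values should be treated as assumptions too.

```python
# Per-sample shapes copied from the table above (leading dimension n omitted).
EXPECTED_TRAILING_SHAPES = {
    "img_paths": (),
    "joints2d": (17, 4),
    "skeleton_3d_hip_normalized": (17, 5),
    "env_position": (3,),
    "body_orientation": (4,),
    "head_orientation": (4,),
    "bbs": (6,),
    "scene_idx_range": (2,),
    "actions": (),
    "movements": (),
    "movement_speeds": (),
    "genders": (),
    "skin_colors": (),
    "sizes": (),
    "weights": (),
    "ages": (),
    "frame_nr_locals": (),
    "frame_nr_global": (),
}


def find_shape_errors(shapes):
    """Compare {dataset name: shape tuple} against the expected layout.

    Returns a list of human-readable error strings (empty if all is fine).
    """
    errors = []
    lengths = set()
    for name, expected in EXPECTED_TRAILING_SHAPES.items():
        if name not in shapes:
            errors.append(f"{name}: missing")
            continue
        shape = tuple(shapes[name])
        if len(shape) == 0 or shape[1:] != expected:
            errors.append(f"{name}: expected (n, *{expected}), got {shape}")
            continue
        lengths.add(shape[0])
    if len(lengths) > 1:
        errors.append(f"inconsistent dataset lengths: {sorted(lengths)}")
    return errors


example = {name: (10,) + tail for name, tail in EXPECTED_TRAILING_SHAPES.items()}
print(find_shape_errors(example))  # prints an empty list for a consistent file
```

With h5py, the input mapping could be built from an open file as {name: ds.shape for name, ds in f.items()}, assuming every top-level entry is a dataset.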

Original dataset notes

Some notes on the original datasets. Important: these notes do NOT apply to internal PedRec usage. The original datasets are converted to PedRec datasets before use, so these notes can usually be ignored.

Human3.6M

BB Structure

They use a binary mask containing 1s in the bounding box area.
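Such a mask can be turned into a center-format box (center_x, center_y, width, height) roughly as follows. This is a sketch, not the conversion code used in this repository; the function name mask_to_bb is hypothetical, and it assumes that pixel (x, y) covers the area from x to x + 1 and from y to y + 1.

```python
import numpy as np


def mask_to_bb(mask):
    """Convert a binary mask (1s inside the box) to (center_x, center_y, width, height)."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask contains no foreground pixels")
    x_min, x_max = xs.min(), xs.max()
    y_min, y_max = ys.min(), ys.max()
    # +1 because both the first and the last foreground pixel belong to the box.
    width = float(x_max - x_min + 1)
    height = float(y_max - y_min + 1)
    center_x = float(x_min) + width / 2.0
    center_y = float(y_min) + height / 2.0
    return np.array([center_x, center_y, width, height], dtype=np.float32)


mask = np.zeros((10, 12), dtype=np.uint8)
mask[2:6, 3:9] = 1  # rows 2..5, columns 3..8
print(mask_to_bb(mask))
```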

Joint Order

  • 0 = 'Hips'
  • 1 = 'RightUpLeg'
  • 2 = 'RightLeg'
  • 3 = 'RightFoot'
  • 4 = 'RightToeBase'
  • 5 = 'Site' - ????
  • 6 = 'LeftUpLeg'
  • 7 = 'LeftLeg'
  • 8 = 'LeftFoot'
  • 9 = 'LeftToeBase'
  • 10 = 'Site' - ????
  • 11 = 'Spine'
  • 12 = 'Spine1'
  • 13 = 'Neck'
  • 14 = 'Head'
  • 15 = 'Site'
  • 16 = 'LShoulder'
  • 17 = 'LeftArm'
  • 18 = 'LeftForeArm'
  • 19 = 'LeftHand'
  • 20 = 'LeftHandThumb'
  • 21 = 'Site'
  • 22 = 'L_Wrist_End'
  • 23 = 'Site'
  • 24 = 'RightShoulder'
  • 25 = 'RightArm'
  • 26 = 'RightForeArm'
  • 27 = 'RightHand'
  • 28 = 'RightHandThumb'
  • 29 = 'Site'
  • 30 = 'R_Wrist_End'
  • 31 = 'Site'

Attributions

Icons

  • Skeleton by Wolf Böse from the Noun Project
  • Head by Naveen from the Noun Project
  • body by Makarenko Andrey from the Noun Project
  • Eye by Simon Sim from the Noun Project
  • jogging by Adrien Coquet from the Noun Project
  • Walk by Adrien Coquet from the Noun Project
  • stand by Gan Khoon Lay from the Noun Project
  • sit by Adrien Coquet from the Noun Project

Contact

  • Dennis Burgermeister, Cognitive Systems Research Group, Reutlingen University (no longer active)
  • Cristóbal Curio, Cognitive Systems Research Group, Reutlingen University

Acknowledgment

This project was funded by Continental AG.