mks0601/3DMPPE_POSENET_RELEASE

Inference queries

abhigoku10 opened this issue · 5 comments

@wangzheallen @mks0601 Hi, thanks for the wonderful code base. This is exactly what I was looking for, but I have a few queries:

  1. DetectNet is used for obtaining bounding boxes, PoseNet for pose estimation, and RootNet for depth localization. Can we replace DetectNet with YOLOv5 and PoseNet with MoveNet (i.e., another pose estimation model) and still get results, provided they detect properly?
  2. What is the inference time on a single image at a given resolution?
  3. In your demo.py file, the box list and root list are hard-coded for a single image. Can we have a pipeline that, given a video, processes each frame and obtains the results dynamically?

Thanks for the queries.

  1. Sure.
  2. It depends on the machine. On an RTX 2080 Ti, it runs at over 50 fps.
  3. Sure, but I only provided a hard-coded version due to lack of time :( Sorry for the inconvenience.
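
For question 3, a per-frame loop could look roughly like the sketch below. It is only an illustration of the detector -> RootNet -> PoseNet ordering discussed above: `process_frames`, `detect_boxes`, `estimate_root_depth`, and `estimate_pose` are hypothetical names, not functions from this repository, and the box format is an assumption.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x_min, y_min, width, height)
Pose = List[Tuple[float, float, float]]  # one (x, y, z) triple per joint


def process_frames(
    frames: Iterable[object],
    detect_boxes: Callable[[object], Sequence[Box]],
    estimate_root_depth: Callable[[object, Box], float],
    estimate_pose: Callable[[object, Box], Pose],
) -> Iterator[List[Tuple[Box, float, Pose]]]:
    """Yield, for every frame, one (box, root_depth, pose) tuple per detected person.

    The three callables are placeholders (hypothetical) for a person detector,
    RootNet, and PoseNet respectively.
    """
    for frame in frames:
        results = []
        for box in detect_boxes(frame):
            # Boxes and root depths are computed per frame instead of hard-coded.
            root_depth = estimate_root_depth(frame, box)
            pose = estimate_pose(frame, box)
            results.append((box, root_depth, pose))
        yield results
```

The `frames` iterable could be fed from any video reader, for example a loop around OpenCV's `cv2.VideoCapture.read()`. A frame with no detections simply yields an empty list.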

@mks0601 thanks for the response. I have a few follow-up questions:
1. In the paper you mention: "Third, a root-relative 3D single-person pose estimation network (PoseNet) estimates the root-relative 3D pose for each detected human". Does this mean that PoseNet is trained with additional data from RootNet? Is my understanding correct?
2. 50 fps at what image resolution?
3. No issues, I am just trying to understand the code.

  1. PoseNet is trained with GT root-relative 3D poses. It does not require RootNet outputs.
  2. 256x256
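
To illustrate what "root-relative" means here, the following minimal sketch subtracts the root joint from every joint of a ground-truth pose. It assumes joints are given in camera coordinates in millimetres and that the root joint sits at index 0; both are assumptions for the example, and the repository's actual target encoding may differ. `to_root_relative` is a hypothetical helper name.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) in camera space, assumed millimetres


def to_root_relative(joints_cam: List[Joint], root_index: int = 0) -> List[Joint]:
    """Express every joint relative to the root joint (root becomes the origin)."""
    rx, ry, rz = joints_cam[root_index]
    return [(x - rx, y - ry, z - rz) for x, y, z in joints_cam]


# Example: the root maps to the origin, other joints keep only their offset.
pose = [(100.0, 200.0, 3000.0), (150.0, 180.0, 3100.0)]
relative = to_root_relative(pose)
```

Because such a target is computed from the ground truth alone, no RootNet prediction is needed at training time.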

@mks0601 thanks for the response

  1. I still do not fully understand this. PoseNet is trained on the Human3.6M dataset, which includes depth data. Does the model output depth (z) information? If so, what is the purpose of RootNet in the pipeline?
  2. Thanks.

  1. RootNet is only used during the test stage.
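
One way to picture the division of labour at test time: a root-relative pose only says how far each joint is from the root, so an absolute root depth is needed to place the person in camera space. The sketch below assumes a pinhole camera, per-joint image coordinates `(u, v)` in pixels with a root-relative depth in millimetres, and a root depth supplied by a RootNet-like model. `to_absolute_camera_coords` and its parameters are hypothetical, not the repository's API.

```python
from typing import List, Tuple


def to_absolute_camera_coords(
    joints_uv_zrel: List[Tuple[float, float, float]],
    root_depth_mm: float,
    focal: Tuple[float, float],
    principal_point: Tuple[float, float],
) -> List[Tuple[float, float, float]]:
    """Back-project (u, v, root-relative z) joints to absolute camera coordinates."""
    fx, fy = focal
    cx, cy = principal_point
    absolute = []
    for u, v, z_rel in joints_uv_zrel:
        z = z_rel + root_depth_mm      # absolute depth = relative depth + root depth
        x = (u - cx) / fx * z          # pinhole back-projection
        y = (v - cy) / fy * z
        absolute.append((x, y, z))
    return absolute
```

For example, with focal length `(1000.0, 1000.0)`, principal point `(960.0, 540.0)`, and a root depth of 3000 mm, a root joint located at the principal point lands at `(0, 0, 3000)` in camera space.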