Inference queries
abhigoku10 opened this issue · 5 comments
abhigoku10 commented
@wangzheallen @mks0601 Hi, thanks for the wonderful code base. This is what I was looking for, but I have a few queries:
- DetectNet is used for obtaining bounding boxes, PoseNet for pose estimation, and RootNet for depth localization. Can we replace DetectNet with YOLOv5 and PoseNet with another pose estimation model such as MoveNet, and still get results as long as they detect properly?
- What is the inference time on a single image for a given resolution?
- In your demo.py file, the box list and root list are hard-coded for a single image. Can we have a pipeline that, given a video, processes each frame and obtains the results dynamically?
Thanks.
mks0601 commented
- Sure
- Depends on the machine. On an RTX 2080 Ti, it runs at over 50 fps.
- Sure, but I only provided a hard-coded version due to lack of time :( Sorry for the inconvenience.
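A per-frame version of the hard-coded demo could look roughly like the sketch below. It assumes three hypothetical callables, `detect`, `estimate_root_depth`, and `estimate_pose`, standing in for DetectNet (or a replacement such as YOLOv5), RootNet, and PoseNet. None of these names or signatures come from the repository; they only illustrate how the box list and root list could be computed per frame instead of being fixed.

```python
from typing import Any, Callable, Dict, Iterable, List, Sequence, Tuple

# Assumed bounding-box layout: (x_min, y_min, width, height).
BBox = Tuple[float, float, float, float]


def run_pipeline(
    frames: Iterable[Any],
    detect: Callable[[Any], Sequence[BBox]],
    estimate_root_depth: Callable[[Any, BBox], float],
    estimate_pose: Callable[[Any, BBox], Any],
) -> List[List[Dict[str, Any]]]:
    """Run detection, root-depth and pose estimation on every frame.

    All three callables are hypothetical placeholders for the real models.
    Returns one list per frame, holding one dict per detected person.
    """
    results = []
    for frame in frames:
        people = []
        # The box list is produced per frame instead of being hard-coded.
        for bbox in detect(frame):
            people.append(
                {
                    "bbox": bbox,
                    # Absolute depth of the root joint (RootNet's role).
                    "root_depth": estimate_root_depth(frame, bbox),
                    # Root-relative 3D pose (PoseNet's role).
                    "pose": estimate_pose(frame, bbox),
                }
            )
        results.append(people)
    return results
```

Because `frames` is any iterable, it could be fed from a video reader loop (for example, frames read one by one with OpenCV's `cv2.VideoCapture`), and a different detector can be swapped in as long as it returns boxes in the layout the other two stages expect.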
abhigoku10 commented
@mks0601 Thanks for the response. I have some follow-up questions:
1. In the paper you mention: "Third, a root-relative 3D single-person pose estimation network (PoseNet) estimates the root-relative 3D pose for each detected human". Does this mean that PoseNet is trained with additional data from RootNet? Is my understanding correct?
2. 50 fps at what image resolution?
3. No issues; I am just trying to understand the code.
mks0601 commented
- PoseNet is trained with GT root-relative 3D pose. It does not require RootNet outputs.
- 256x256
abhigoku10 commented
@mks0601 Thanks for the response.
- I am still not able to understand this. PoseNet is trained on the Human3.6M dataset, which has depth data; does the model output depth (z) information? If so, what is the use of having RootNet in the pipeline?
- Thanks
mks0601 commented
- RootNet is only used during the test stage.
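To make the division of labor concrete: PoseNet's depth output is relative to the root joint, so on its own it does not say how far the person is from the camera. At test time the absolute root depth from RootNet is added to each joint's relative depth, and the result is back-projected with the camera intrinsics. The sketch below assumes a pinhole camera and joints given as pixel coordinates plus a root-relative depth in millimeters; the function name and argument layout are illustrative, not the repository's API.

```python
def to_camera_space(joints, root_depth, focal, princpt):
    """Convert root-relative joints to absolute camera-space coordinates.

    joints:     iterable of (x_px, y_px, z_rel), where z_rel is the depth
                relative to the root joint (assumed to be in mm).
    root_depth: absolute depth of the root joint (RootNet's role), in mm.
    focal:      (fx, fy) focal lengths in pixels.
    princpt:    (cx, cy) principal point in pixels.
    """
    absolute = []
    for x_px, y_px, z_rel in joints:
        # Absolute depth = root depth + root-relative depth.
        z = z_rel + root_depth
        # Pinhole back-projection from pixel to camera coordinates.
        x = (x_px - princpt[0]) / focal[0] * z
        y = (y_px - princpt[1]) / focal[1] * z
        absolute.append((x, y, z))
    return absolute
```

With the same root-relative pose, changing `root_depth` moves the whole skeleton closer to or farther from the camera, which is the information PoseNet alone does not provide.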