Camera Trajectory Estimation
Closed this issue · 2 comments
The paper mentions using TRAM to estimate camera trajectories, but the results I obtained with TRAM differ from the camera trajectories you provide. May I ask how you processed these?
Taking 1769632-hd_1906_1080_30fps.mp4 as an example:
What you provide:
15.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 1.000000 -0.000000
16.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 1.000000 -0.000000
17.000000 -0.000000 -0.000000 -0.000000 -0.000000 -0.000000 1.000000 -0.000000
18.000000 0.016145 0.004314 -0.018424 -0.000078 0.000138 1.000000 0.000025
19.000000 0.016145 0.004314 -0.018424 -0.000078 0.000138 1.000000 0.000025
20.000000 0.016145 0.004314 -0.018424 -0.000078 0.000138 1.000000 0.000025
21.000000 0.016145 0.004314 -0.018424 -0.000078 0.000138 1.000000 0.000025
22.000000 0.041278 0.023090 -0.040322 -0.000129 0.000174 1.000000 0.000087
23.000000 0.041278 0.023090 -0.040322 -0.000129 0.000174 1.000000 0.000087
24.000000 0.053233 0.036312 -0.070936 -0.000163 0.000173 1.000000 0.000098
25.000000 0.053233 0.036312 -0.070936 -0.000163 0.000173 1.000000 0.000098
26.000000 0.053233 0.036312 -0.070936 -0.000163 0.000173 1.000000 0.000098
Converting the rotation matrices in the TRAM output to quaternions with SciPy gives:
15.000000 0.033756 0.217126 -0.051660 -0.000088 -0.000047 -0.000144 1.000000
16.000000 0.009931 0.136129 0.037638 -0.000053 -0.000036 -0.000033 1.000000
17.000000 0.019529 0.119213 0.038395 -0.000081 0.000001 -0.000015 1.000000
18.000000 0.136388 0.297898 -0.222285 0.000021 -0.000107 0.000124 1.000000
19.000000 0.143447 0.294565 -0.233900 -0.000031 -0.000067 0.000109 1.000000
20.000000 0.097444 0.330990 -0.146267 -0.000132 -0.000011 -0.000065 1.000000
21.000000 0.360561 0.622831 -0.650300 -0.000080 -0.000139 -0.000091 1.000000
22.000000 0.321393 0.650588 -0.583660 0.000054 -0.000156 0.000127 1.000000
23.000000 0.324948 0.650876 -0.585861 0.000003 -0.000123 0.000097 1.000000
24.000000 0.328238 0.666158 -0.586017 -0.000008 -0.000096 0.000124 1.000000
25.000000 0.341013 0.660448 -0.619285 0.000031 -0.000121 0.000164 1.000000
26.000000 0.394795 0.709896 -0.715407 -0.000056 -0.000124 0.000020 1.000000
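For reference, a minimal sketch of the conversion step described above, using `scipy.spatial.transform.Rotation` (the identity matrix here is a placeholder, not actual TRAM output). Note that SciPy returns quaternions in scalar-last (x, y, z, w) order by default, which is worth checking against the convention used in the provided trajectory files before comparing values column by column:

```python
import numpy as np
from scipy.spatial.transform import Rotation

# Placeholder for a 3x3 world-to-camera rotation matrix from TRAM output.
R_mat = np.eye(3)

# SciPy's as_quat() returns (x, y, z, w), i.e. scalar-last.
quat = Rotation.from_matrix(R_mat).as_quat()
print(quat)  # identity rotation -> [0. 0. 0. 1.]
```

If the provided trajectories store the scalar component in a different position, the two sets of quaternions will look different even when the rotations agree.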
We modified TRAM's script to enable large-scale estimation, instead of using its original single-video demo script. The output is re-formatted to align with the original DROID-SLAM/DPVO format. We only adopt the principle of TRAM, not the exact scripts from TRAM's GitHub repo. More details will be released along with the code.
Besides, the current version of the camera parameters comes from DPVO with YOLO bounding boxes as masks to accelerate camera estimation, not the original DROID-SLAM with SAM masks used in TRAM, due to the latter's extremely high computational cost. We will try to provide TRAM's original results in the future, but without a specific release date.
BTW, if you have extensive computational resources, you could try it yourself. A rough estimate of original TRAM's computation on our dataset, up to the point where the camera parameters are ready (i.e., discarding the VIMO part for human pose), is 3k to 4k GPU hours.
Thank you for your reply. We look forward to the code release. 😊