lood339/two_point_calib

The code runs normally, but the error between the estimated value and the real value is particularly large.

liaoyu1992 opened this issue · 10 comments

Here are the actual values and the estimated values:
[screenshot]
The error between the estimated values and the real values is particularly large.
I checked and debugged the code. In the function static bool readPTZFeatureLocationAndDescriptors in btdtr_ptz_util.cpp, keypoint_data, descriptor_data, and ptz_data are all Eigen::MatrixXf. Because the precision is not enough, the error after reading the decimal places is very large. I guess this is one of the reasons; are there any other suggestions?
Thank you very much.

The result is worse than expected. Here are the main steps of the program:

  1. Extract SIFT features and save them to a .mat file (prepare_train_data_ptz.m). Make sure the features come from the background, not from players.
  2. Train a random forest and predict the pan/tilt (btdtr_ptz_test_soccer.cpp). At this step, you can check the precision of the predicted pan/tilt/zoom values for each SIFT feature. There should be enough inliers.
  3. Estimate the pan-tilt-zoom value of the camera.

I suspect there is something wrong in steps 1 and 2.
Eigen::MatrixXf is not a problem because 'float' is accurate enough for SIFT.
Hope that helps.

When extracting SIFT features and saving them to a .mat file (prepare_train_data_ptz.m):
As shown in the figure, there are a lot of feature points in the auditorium, but very few on the stadium field. Is this correct?
[screenshots]
Also, how big a dataset is best for training? A rich variety of shots, or a single shot? What are the best values for the training parameters in BTDTRTreeParameter?

From this image, the feature points are good enough, because the random forest and RANSAC will remove outliers (the ones in the auditorium).
If you use the World Cup dataset, you can match images using their pan angles, because images with similar pan angles have large overlap, so more good feature matches are kept as training examples.
The training parameters are here:
sampled_frame_num 15
pp_x 640
pp_y 360
tree_num 5
max_tree_depth 20
max_balanced_depth 4
max_sample_num 1000
min_leaf_node 1
min_split_node 1
candidate_dim_num 6
candidate_threshold_num 10
min_split_node_std_dev 0.1
verbose 0
verbose_leaf 0

The training set is 50% of the images, randomly sampled from two games (BRA vs. MEX and BRA vs. NED). The other 50% is used as testing data.
Hope that helps.

According to your advice, I modified the dataset and parameters, and the results are good enough; only one is worse than expected. Is this a correct result?
[screenshot]
For a new, uncalibrated picture, how do I predict its PTZ?
Thank you!

I assume the table shows the ground-truth PTZ and the estimated PTZ. It looks good.
For a new image, it should be from the same stadium and the same PTZ camera as the training set.

  1. Extract SIFT features and save them as a .mat file.
  2. Use btdtr_ptz_test_soccer.cpp to predict the PTZ.

You can also modify the source code to make this easier.
If the new image is from another stadium or another camera, this method does not work.

When extracting SIFT features, what do I need to pay attention to?
I extracted SIFT features, saved them as a .mat file, and then used btdtr_ptz_test_soccer.cpp to predict the PTZ.
Here are the results; all of them are wrong:
[screenshot]
are there any other suggestions?

The algorithm randomly selects two point features to get an initial PTZ camera. It generates many hypotheses, and from the output it looks like all of them failed. The error is because the point feature matching has too many outliers. You can draw these points on the image to verify their locations, for example, that they lie on static background objects.

I tested the full match video of World Cup 2014: Brazil vs. Netherlands.
I found that pictures showing more of the auditorium, like this:
[screenshot]
yield more feature points and good results.
Pictures showing more of the football field, like this:
[screenshot]
yield fewer feature points and worse results.

This is expected, because the second image has fewer point features. If you want the camera parameters for the whole video, an alternative is to use PTZ SLAM: https://github.com/lood339/Pan-tilt-zoom-SLAM .
That is an unfinished project and I am still working on it.

OK, thank you!