ubc-vision/COTR

Sharing raw data of ETH3D and KITTI


Hi everyone:

I'd like to share the raw output from COTR for the ETH3D and KITTI datasets.

ETH3D eval: https://drive.google.com/file/d/1pfAuHRK7FvB6Hc9Rru-beH6F-2lpZAk6/view?usp=sharing

KITTI: https://drive.google.com/file/d/1SiN5UbqautqosUCInQN2WhyxbRcbWt8b/view?usp=sharing

The files are named {src_id}->{tgt_id}.npy, and each one stores a dictionary with three keys: "raw_corr", "drifting_forward", and "drifting_backward".
"raw_corr" contains the raw sparse correspondences in XYXY format, and "drifting_forward" and "drifting_backward" are the masks used to filter out drifted predictions.

Some tables:

With random keypoints, we found cycle consistency quite useful for obtaining accurate matches. We reran the experiments with FasterSparseEngine on ETH3D, and the performance dropped slightly.
[table screenshot]

Hi Wei,
Thank you for this!
Would it be possible to provide the evaluation code for the same?
I was wondering what would be the parameters for the forward call of the model.
When providing both the concatenated image pair and the queries as parameters, what should the queries be during evaluation on HPatches or ETH3D?
I am using the evaluation protocol followed by GLU-Net as advised in the paper.

  1. Evaluation code: Evaluation for ETH3D and KITTI should be straightforward, since there are one-to-one GT correspondences between image pairs.
    For HPatches, you need to correctly handle the scale-ratio difference, i.e. GLU-Net takes in two images at the same resolution, but the 5 images in one HPatches sequence may have different aspect ratios.
  2. Query sampling: I believe we sampled the pixels with "valid correspondences/depth" in the query images.
  3. Hyper params: we used stretch mode for ETH3D and KITTI, and also set zoom_ins=np.logspace(np.log10(0.5), np.log10(0.0625), num=4)
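Putting these together, the multiscale call looks roughly like the sketch below (the exact signature of cotr_corr_multiscale and the way stretch mode is configured should be checked against the repo; queries stands for the valid query pixels from point 2):

import numpy as np

# Zoom-in schedule from point 3: four scales from 0.5 down to 0.0625, log-spaced,
# i.e. roughly [0.5, 0.25, 0.125, 0.0625].
zoom_ins = np.logspace(np.log10(0.5), np.log10(0.0625), num=4)

# Assumed call pattern, mirroring cotr_corr_base shown later in this thread plus the
# zoom-in schedule; check the repo for the exact argument names and order.
corrs = cotr_corr_multiscale(model, img_src, img_tgt, zoom_ins, queries_a=queries)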

Hi Wei,
I was able to use the npy files that you have posted to evaluate the model. Thanks a lot for that!
Quick question: What is the scaling factor that you use to obtain the correspondences from the network? (The outputs are in the range 0->1)

Hi Vikram,
The raw output of the network is in [0, 1], and the input image tuple is at 256x256 resolution. Because the two images are concatenated side by side, [0, 1] in y maps to [0, height] in the target image, and [0.5, 1] in x maps to [0, width] in the target image.
The inference engine should take care of this conversion and output pixel coordinates.
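For reference, that conversion is roughly the following (a sketch, not the engine's actual code; raw_xy is a hypothetical (N, 2) array of raw outputs for queries into the target image, and h_tgt, w_tgt are the target image height and width):

import numpy as np

# raw_xy values live in the concatenated-image frame: y spans [0, 1], while the target
# image occupies x in [0.5, 1] (the right half of the concatenation).
x_px = (raw_xy[:, 0] - 0.5) * 2.0 * w_tgt   # [0.5, 1] -> [0, w_tgt]
y_px = raw_xy[:, 1] * h_tgt                 # [0, 1]   -> [0, h_tgt]
pred_px = np.stack([x_px, y_px], axis=1)    # pixel coordinates in the target image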

Thanks Wei! I was able to get the predicted correspondences.
The output from cotr_corr_multiscale has shape (max_corrs, 4); so, as I understand it, out of the available ground-truth correspondences you choose 1000 random query points and then filter out the best ones from those, right?
So an output shape of (max_corrs, 4) means the first two columns are the ground-truth query points and the last two are the predicted correspondences?

Hi Wei, another follow-up question:
During evaluation I also tried the cotr_corr_base function, and the AEPE values obtained between the targets and the network outputs are above 10,000.
The call is: cotr_corr_base(network, source_img.numpy(), target_img.numpy(), queries), where queries are loaded from the .npy you provided for ETH3D.
AEPE is then calculated between the last two columns of the .npy file (the targets) and the last two columns of the estimates obtained from the network.
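Concretely, the AEPE I compute is along these lines (a sketch; gt and pred stand for the (N, 4) ground-truth and predicted arrays in XYXY order):

import numpy as np

# Compare target coordinates (last two columns) of GT and predictions.
epe = np.linalg.norm(gt[:, 2:] - pred[:, 2:], axis=1)   # per-correspondence end-point error
aepe = epe.mean()                                        # average end-point error (AEPE)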
Please let me know if the approach followed is right.

Thanks!

Hi Vikram,
Yes, your understanding about cotr_corr_multiscale is correct.
cotr_corr_base uses 256x256 resolution images to produce the initial correspondences for the image pair, so maybe there are too many outliers? cotr_corr_base doesn't use the zoom-in strategy. Or could you share some correspondence visualizations from cotr_corr_base?

Hi Wei,
You are correct; since cotr_corr_base uses 256x256, some correspondences fall outside the image space as well:
[correspondence visualization: demo_wbs_our_model]

How do I rectify this issue?
Also, are the AEPE and PCK numbers reported in the paper calculated using cotr_corr_multiscale or cotr_corr_base?
Is there a possibility of releasing the code for evaluation as well?
Thanks!

Hmmmm, I can see that on the left the correspondences run from top to bottom, but there are also a lot of horizontal correspondences that don't make sense to me...

It seems that cotr_corr_base already takes care of the coordinate conversion?
Here is a run from my side:
[two correspondence visualization screenshots]

code:

import numpy as np

# Build a 10x10 grid of query points over the source image (x along width, y along height).
x, y = np.meshgrid(
    np.linspace(0, img_a.shape[1] - 1, num=10),
    np.linspace(0, img_a.shape[0] - 1, num=10),
)
queries_a = np.concatenate([np.expand_dims(x, 2), np.expand_dims(y, 2)], axis=2).reshape(-1, 2)
corrs = cotr_corr_base(model, img_a, img_b, queries_a)

We used cotr_corr_multiscale for the numbers reported in the paper.

Hi, great work and excellent results!
It seems that you released the KITTI results for the noc split? So does this mean that the KITTI results in your article are also based on the noc split?
In addition, can you tell me whether the interpolation results in your article are obtained by taking grid points and then triangulating them, or by taking random points and then triangulating them? Is there anything to be said about sampling?
Also, your article mentions match filtering; what is the principle behind the filtering? I ran the KITTI experiments directly from your open-source raw data, and it seems difficult to reproduce the results in your article.
We are following up on your work, so we would like to replicate the results more completely.

Hi, thanks for your interest in our work! I don't have access to most of the experiment data anymore, so I have to rely on my memory.

  1. occ/noc: Yes, I think we used the noc split, because COTR by design cannot handle occluded correspondences. Edit: I just found some data in a Google Sheet:
    [screenshot of results table] For the RAFT row, I may have evaluated using the mask we provided.

  2. Sampling: No, we don't use a grid; we use random points. The sampling is completely random, no other tricks.

  3. Principle of filtering: we want the correspondence to be consistent across different zoom-ins.
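As an illustration only (not the exact implementation), a cycle-consistency style drift check can be written like this: match forward from the source queries, match backward from the predicted target points, and reject correspondences whose round trip lands too far from where it started.

import numpy as np

def cycle_consistency_mask(fwd_corrs, bwd_corrs, thresh=2.0):
    # fwd_corrs: (N, 4) source->target matches in XYXY.
    # bwd_corrs: (N, 4) matches obtained by querying the predicted target points back
    #            towards the source image, also in XYXY.
    # Keep correspondences whose round trip returns within `thresh` pixels of the
    # original source point; the threshold value here is arbitrary.
    round_trip_err = np.linalg.norm(fwd_corrs[:, :2] - bwd_corrs[:, 2:], axis=1)
    return round_trip_err < thresh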

Another work you may find interesting is ECO-TR: https://dltan7.github.io/ecotr/

Thank you very much!
You provided all the information I need!