Using CED detector for 3D-to-2D mapping correspondences
ttsesm opened this issue · 5 comments
Hi @hanzheteng,
Thanks for sharing your work; I have a question, though. Is it possible to use CED to extract keypoints and map correspondences between a 3D point cloud and a corresponding 2D image? In principle, I am interested in applying a 3D-to-2D mapping.
Thanks.
Yes, possibly. The keypoint detector is only the very first step; what follows are feature descriptors and some method to establish the correspondences, and those are problem-specific.
Depending on your use case and what other prior knowledge you have, you can design a descriptor for correspondence matching in both domains.
For example, if your 3D point cloud is generated from 2D images plus depth images, you can always project the keypoints back onto the 2D images. If you can provide more info/context, I can offer some thoughts.
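For instance, here is a minimal sketch of that back-projection, assuming a standard pinhole model with known intrinsics K and camera pose (R, t); all names are placeholders for illustration:

```python
# Minimal sketch: projecting 3D keypoints back onto a 2D image, assuming a
# pinhole camera with known intrinsics K and pose (R, t). Placeholder values only.
import numpy as np

def project_to_image(points_3d, K, R, t):
    """Project Nx3 world-frame keypoints into pixel coordinates."""
    cam = (R @ points_3d.T + t.reshape(3, 1)).T   # world frame -> camera frame
    cam = cam[cam[:, 2] > 0]                      # keep points in front of the camera
    uv = (K @ cam.T).T                            # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]                 # perspective divide -> (u, v)

# Example with a hypothetical 640x480 camera and random points in front of it
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
pixels = project_to_image(np.random.rand(100, 3) + [0.0, 0.0, 2.0], K, R, t)
```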
Interesting, thanks for the response.
Well, my use case is the following: I have a point cloud, and for each point I have the x, y, z coordinates as well as the direction vectors, i.e. rays, of 27 different rays starting from that point. For each ray I also have RGB values, so in principle each 3D point has 27 different colors associated with it, one per ray. Then I also have a 2D RGB image which corresponds to the point cloud, but this image was not used to create the point cloud; it was rendered from a NeRF model trained on similar images, in case you are familiar with research on neural radiance fields.
My concept is similar to the image below:
Now my idea was to create a feature descriptor for each 3D point (including its x, y, z coordinates and colors) and for each of my image pixel values, and then try to match them, so that from the color association I get keypoints together with the corresponding rays whose colors were matched.
Does this make sense, and would it be possible to create something like that with CED? In principle, getting some point matching would be useful in any case; if I could refine it by considering extra information such as the ray directions, I think that would be ideal.
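To make the color-association part a bit more concrete, here is a rough sketch of what I have in mind, assuming the point cloud comes as an (N, 3) array of coordinates plus an (N, 27, 3) array of per-ray RGB values and the image as an (H, W, 3) RGB array (all names are hypothetical placeholders); it only does a nearest-neighbor lookup in RGB space and ignores the ray-direction coherence for now:

```python
# Rough sketch: associate image pixels with (point, ray) pairs by nearest
# neighbor in RGB space. Assumes colors are normalized to [0, 1].
import numpy as np
from scipy.spatial import cKDTree

def match_pixels_to_rays(ray_rgb, image_rgb, max_dist=0.05):
    """For each pixel, find the nearest (point, ray) pair in RGB space."""
    n_points, n_rays, _ = ray_rgb.shape
    flat_colors = ray_rgb.reshape(-1, 3)          # (N * 27, 3) ray colors
    tree = cKDTree(flat_colors)
    pixels = image_rgb.reshape(-1, 3)             # (H * W, 3) pixel colors
    dist, idx = tree.query(pixels, k=1)
    point_idx, ray_idx = np.divmod(idx, n_rays)   # recover (point, ray) indices
    valid = dist < max_dist                       # reject poor color matches
    return point_idx[valid], ray_idx[valid], np.flatnonzero(valid)

# Placeholder data just to show the call
ray_rgb = np.random.rand(1000, 27, 3)
image_rgb = np.random.rand(120, 160, 3)
points, rays, pixel_ids = match_pixels_to_rays(ray_rgb, image_rgb)
```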
Let me know what you think about it.
Update
Below you can also see the color distribution in 3D space of both the 2D image pixel values and the point cloud colors:
In case it is of any help: in principle I could try to apply CED to both of these two "point clouds", but I am not sure whether that would make any sense.
An interesting perspective. Well, I think the design of this feature descriptor is highly dependent on the relation/association between the 2D image and the corresponding 3D point cloud you have. I know the rough idea of NeRF but am not sure about the exact network model you used to generate the point clouds, etc. The descriptor design could really be inspired by those network details.
My understanding of the problem formulation you currently have:
Given a bunch of images and point clouds generated from a variety of these images (perhaps one point cloud generated from multiple images taken from different perspectives), you want to find out which image corresponds to which point cloud. Though these images are used to generate the point clouds, their correspondence to each point cloud is unknown, right? Alternatively, you may have used the images in the training set to generate these point clouds, and now want to localize each image in the testing set against each point cloud.
Well, in this case, what the CED detector can help you with is just generating a bunch of keypoints in the point cloud, and you may need to apply some other detector such as ORB to extract keypoints on the images. What follows could be applying feature descriptors on both ends (you can use PFHRGB for the CED keypoints, and ORB has its own descriptor) and figuring out a way to associate these two descriptors. This could be a simple Hamming distance measurement. Alternatively, this could be a learning task for a neural network: if enough data is provided, this association can be learned.
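For the 2D side, a minimal OpenCV sketch could look like the following (the filenames are placeholders; the 3D side, i.e. CED keypoints plus PFHRGB descriptors, would be computed in PCL, and bridging the two descriptor spaces is exactly the open design question). The image-to-image matching here only illustrates the ORB + Hamming mechanics:

```python
# Sketch of the 2D side: ORB keypoints and binary descriptors, matched with
# Hamming distance. Filenames are placeholders for illustration only.
import cv2

img_query = cv2.imread("rendered_view.png", cv2.IMREAD_GRAYSCALE)
img_ref = cv2.imread("reference_view.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp_q, des_q = orb.detectAndCompute(img_query, None)
kp_r, des_r = orb.detectAndCompute(img_ref, None)

# Brute-force Hamming matching with cross-check to keep only mutual-best matches
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_q, des_r), key=lambda m: m.distance)
```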
Sorry, I didn't quite get these two figures. Are the x, y, z axes for color space (normalized to 0-1) or for the real 3D world?
> Alternatively, you may have used the images in the training set to generate these point clouds, and now want to localize each image in the testing set against each point cloud.
Yes, you understood correctly; this is what I want to achieve. In practice I have one point cloud built from multiple images, and this point cloud is associated with as many color values as I have rays (as visualized in the first image, which can also be interpreted as multiple views or images). Ideally, rays with a coherent direction should point towards the same image. But the only information I have is the xyz coordinates per point, the RGB color per direction (ray), and the target RGB colors of my input image. So I need to map a 3D point with the correct RGB color value to a 2D point with approximately the corresponding color, where the rays are coherent and point more or less in the camera direction. In this way, I can recover the camera pose based on PnP.
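For that last step, something like the following OpenCV sketch is what I have in mind, assuming the 3D-2D correspondences and the intrinsics K are already available (the arrays below are just placeholders):

```python
# Sketch of the pose-recovery step: PnP with RANSAC over 3D-2D correspondences.
# The correspondences and intrinsics here are random placeholders.
import cv2
import numpy as np

object_points = np.random.rand(50, 3).astype(np.float32)                 # matched 3D points
image_points = (np.random.rand(50, 2) * [640, 480]).astype(np.float32)   # matched 2D pixels
K = np.array([[500, 0, 320], [0, 500, 240], [0, 0, 1]], dtype=np.float32)

ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    object_points, image_points, K, None,
    reprojectionError=3.0, flags=cv2.SOLVEPNP_EPNP)

if ok:
    R, _ = cv2.Rodrigues(rvec)       # rotation matrix of the recovered pose
    camera_center = -R.T @ tvec      # camera position in world coordinates
```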
> Well, in this case, what the CED detector can help you with is just generating a bunch of keypoints in the point cloud, and you may need to apply some other detector such as ORB to extract keypoints on the images. What follows could be applying feature descriptors on both ends (you can use PFHRGB for the CED keypoints, and ORB has its own descriptor) and figuring out a way to associate these two descriptors. This could be a simple Hamming distance measurement. Alternatively, this could be a learning task for a neural network: if enough data is provided, this association can be learned.
Yup, this is one of the ideas I have, and I am trying to understand whether it can work or not.
> Sorry, I didn't quite get these two figures. Are the x, y, z axes for color space (normalized to 0-1) or for the real 3D world?
The last two images show the color distribution of the 3D points from the point cloud (Rays RGB) and the color distribution of my pixel values (Image RGB), plotted in 3D space. It is as simple as taking the RGB values and treating them as XYZ coordinates, i.e. R -> X, G -> Y, B -> Z, and then coloring each of these 3D points with its corresponding RGB value.
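In code, that transformation is just something along these lines (the colors array here is a random placeholder for either the flattened per-ray colors or the flattened image pixels, normalized to [0, 1]):

```python
# Sketch: treat each RGB triple as an (x, y, z) coordinate and scatter-plot it,
# colored by itself. The colors array is a placeholder for real RGB data.
import numpy as np
import matplotlib.pyplot as plt

colors = np.random.rand(5000, 3)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(colors[:, 0], colors[:, 1], colors[:, 2], c=colors, s=1)
ax.set_xlabel("R"); ax.set_ylabel("G"); ax.set_zlabel("B")
plt.show()
```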
For the last two figures, yes, now I get it. They are essentially visualizations of the RGB color space of the points in the images and in the point clouds, respectively. The figure on the right side has more points and more color variety because it contains colors from multiple rays. I was confused by "3D space" because I usually take it to mean 3D coordinates in the real world.
Your idea and tentative approach sound good to me. Whether it will work out, well, this is a research problem, so it is hard to say. (It wouldn't be worth researching if it were easy.) Best of luck.