kentechx/pointnext

Question regarding X and XYZ

Closed this issue · 10 comments

Hello,

What does X correspond to if we already input the XYZ coordinates? It is not entirely clear what X is supposed to contain.

Thank you.

X holds the per-point features, which can consist of XYZ, normals, colors, etc., whereas the XYZ coordinates are used for sampling.
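
For instance, here is a minimal sketch of the two inputs (assuming the channel-first (B, C, N) layout used in the examples later in this thread):

import torch

B, N = 2, 1024
xyz = torch.randn(B, 3, N)                 # coordinates, used for sampling and grouping
rgb = torch.rand(B, 3, N)                  # per-point colors
normals = torch.randn(B, 3, N)             # per-point normals
x = torch.cat([xyz, rgb, normals], dim=1)  # (B, 9, N) per-point features fed to the network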

OK, thank you so much!

I assume I have to make sure the data, for example ModelNet40, is uniformly downsampled to 1024 points before I input it to the model; otherwise I will get an error.

Question: I assume the neural network will also perform its own FPS to downsample the data? So the data gets downsampled once before I feed it ModelNet40 and again inside the network? It seems a bit redundant to do that.

How can I feed it ModelNet40 and let the model downsample the data itself?

Thank you.

Btw, I am new to coding, so I am trying to learn a lot, of course. :) It seems that I had to adapt the code a bit (tensor.data<T>() had to be replaced with tensor.data_ptr<T>(), or something like that) and also create a new file that wraps the functions with PYBIND11_MODULE; otherwise it would complain a lot.

#include <pybind11/pybind11.h>
#include <torch/extension.h>

// Declarations of functions from different source files
extern void ball_query_wrapper_fast(int64_t b, int64_t n, int64_t m, double radius, int64_t nsample, at::Tensor new_xyz_tensor, at::Tensor xyz_tensor, at::Tensor idx_tensor);

extern void group_points_grad_wrapper_fast(int64_t b, int64_t c, int64_t n, int64_t npoints, int64_t nsample, at::Tensor grad_out_tensor, at::Tensor idx_tensor, at::Tensor grad_points_tensor);
extern void group_points_wrapper_fast(int64_t b, int64_t c, int64_t n, int64_t npoints, int64_t nsample, at::Tensor points_tensor, at::Tensor idx_tensor, at::Tensor out_tensor);

extern void three_nn_wrapper_fast(int64_t b, int64_t n, int64_t m, at::Tensor unknown_tensor, at::Tensor known_tensor, at::Tensor dist2_tensor, at::Tensor idx_tensor);
extern void three_interpolate_wrapper_fast(int64_t b, int64_t c, int64_t m, int64_t n, at::Tensor points_tensor, at::Tensor idx_tensor, at::Tensor weight_tensor, at::Tensor out_tensor);
extern void three_interpolate_grad_wrapper_fast(int64_t b, int64_t c, int64_t n, int64_t m, at::Tensor grad_out_tensor, at::Tensor idx_tensor, at::Tensor weight_tensor, at::Tensor grad_points_tensor);

extern void gather_points_wrapper_fast(int64_t b, int64_t c, int64_t n, int64_t npoints, at::Tensor points_tensor, at::Tensor idx_tensor, at::Tensor out_tensor);
extern void gather_points_grad_wrapper_fast(int64_t b, int64_t c, int64_t n, int64_t npoints, at::Tensor grad_out_tensor, at::Tensor idx_tensor, at::Tensor grad_points_tensor);
extern void furthest_point_sampling_wrapper(int64_t b, int64_t n, int64_t m, at::Tensor points_tensor, at::Tensor temp_tensor, at::Tensor idx_tensor);

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def("ball_query_wrapper_fast", &ball_query_wrapper_fast, "Fast ball query wrapper");
    m.def("group_points_grad_wrapper_fast", &group_points_grad_wrapper_fast, "Fast group points gradient wrapper");
    m.def("group_points_wrapper_fast", &group_points_wrapper_fast, "Fast group points wrapper");

    m.def("three_nn_wrapper_fast", &three_nn_wrapper_fast, "Three nearest neighbors wrapper fast");
    m.def("three_interpolate_wrapper_fast", &three_interpolate_wrapper_fast, "Three interpolate wrapper fast");
    m.def("three_interpolate_grad_wrapper_fast", &three_interpolate_grad_wrapper_fast, "Three interpolate gradient wrapper fast");

    m.def("gather_points_wrapper_fast", &gather_points_wrapper_fast, "Gather points wrapper fast");
    m.def("gather_points_grad_wrapper_fast", &gather_points_grad_wrapper_fast, "Gather points gradient wrapper fast");
    m.def("furthest_point_sampling_wrapper", &furthest_point_sampling_wrapper, "Furthest point sampling wrapper");
}

I am using the latest PyTorch and CUDA 12.1 for this.

Downsampling ModelNet40 data to 1024 points should be part of the data preparation process, a step not covered in this repository. The network will downsample the input data four times in one run, with a default stride of 4. For example, input with 1024 points will be successively downsampled to 256, 64, 16, and 4 points.
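
For reference, here is a minimal data-preparation sketch that subsamples one cloud to 1024 points (random subsampling as a simple stand-in for uniform sampling; the helper below is an assumption, not code from this repository):

import torch

def subsample(points, n=1024):
    # points: (N, C) tensor for one cloud; pick n point indices without replacement
    idx = torch.randperm(points.shape[0])[:n]
    return points[idx]

cloud = torch.randn(2048, 6)   # e.g. XYZ + normals from one ModelNet40 file
cloud_1024 = subsample(cloud)  # (1024, 6), ready to batch and feed to the model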

Regarding the compiling issue, there's no need for additional files. Have you installed the package using pip install pointnext? Alternatively, you can build the CUDA ops by executing pip install -e . after cloning the repository.

Hello kentechx,

That is correct. That is what I did: pip install pointnext, but it complained about tensor.data being deprecated, so I had to replace tensor.data with tensor.data_ptr.

When I then changed it to the expected tensor.data_ptr, it complained about a wrapper error. I had to create the wrapper file, and then it worked just fine. I am using Windows. In case you need the files for review, I can attach them, of course. :)

I haven't tested it on Windows yet. And sure, you can make a PR with the files. Thanks!

Ok, will do it soon.

Second question: if I have, for example, several .xyz files that contain XYZ, RGB, and normals, do I split up the content and feed it as follows?

XYZ - XYZ coordinates
X - RGB + normals (I guess I do not feed XYZ here too?)

For labeling, would I also be feeding categories into X?

Thank you!

You should also feed xyz into x; otherwise, the network will not consume xyz as an input feature. Here's an example.

import torch
from pointnext import PointNext, pointnext_s

encoder = pointnext_s(in_dim=6)
model = PointNext(40, encoder=encoder).cuda()

xyz = torch.randn(2, 3, 1024).cuda()
rgb = torch.rand(2, 3, 1024).cuda()
x = torch.cat([xyz, rgb], dim=1)
out = model(x, xyz)  # shape (2, 40, 4); see the correction below

Don't feed categories into x when training. Categories are only needed in part segmentation, where you pass the category as the 3rd parameter, as shown in the code example in the README.
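
For illustration, the call pattern might look like the sketch below; only "category as the 3rd parameter" comes from the README, while seg_model, the number of categories, and the tensor shapes are assumptions:

import torch

B, N = 2, 1024
xyz = torch.randn(B, 3, N)
x = torch.cat([xyz, torch.rand(B, 3, N)], dim=1)  # XYZ + RGB as input features
category = torch.randint(0, 16, (B,))             # one object-category id per cloud (assumed shape)

# out = seg_model(x, xyz, category)  # hypothetical: category passed as the 3rd parameter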

Hey,

First of all, thanks for the great implementation! Hopping on the back of this question: in the above example, I guess 2 represents the batch size and 40 the number of classes to be predicted, so the output would be [2, 40, 4].

It is unclear to me how this can then be used for classification with 40 classes. Does one need to apply a Linear layer to get [2, 40], or am I missing something?

Sorry for the layman's question; I'm new to this area of graph neural networks.

Oh, it's a mistake. For classification, PointNext(40, encoder=encoder) only returns the backbone (for downsampling the input point cloud to a lower resolution and extracting features). You will need to write the head yourself. Here's an example:

import torch
import torch.nn as nn
from pointnext import PointNext, pointnext_s


class Model(nn.Module):

    def __init__(self, in_dim=6, out_dim=40, dropout=0.5):
        super().__init__()

        encoder = pointnext_s(in_dim=in_dim)
        # note: the first argument is the backbone's output feature dimension (1024), not the point count
        self.backbone = PointNext(1024, encoder=encoder)

        self.norm = nn.BatchNorm1d(1024)
        self.act = nn.ReLU()
        self.cls_head = nn.Sequential(
            nn.Linear(1024, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(512, 256),
            nn.BatchNorm1d(256),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(256, out_dim),
        )

    def forward(self, x, xyz):
        # backbone features at the coarsest resolution: (B, 1024, M)
        out = self.norm(self.backbone(x, xyz))
        # global average pooling over the remaining points: (B, 1024)
        out = out.mean(dim=-1)
        out = self.act(out)
        out = self.cls_head(out)  # logits: (B, out_dim)
        return out


model = Model(in_dim=6, out_dim=40).cuda()
xyz = torch.randn(2, 3, 1024).cuda()
rgb = torch.rand(2, 3, 1024).cuda()
x = torch.cat([xyz, rgb], dim=1)
out = model(x, xyz)
print(out.shape)  # (2, 40)
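
Continuing that example, here is a minimal training-step sketch (the optimizer choice, learning rate, and labels below are placeholders, not tuned values):

import torch.nn.functional as F

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
labels = torch.randint(0, 40, (2,)).cuda()  # placeholder class ids

logits = model(x, xyz)                      # (2, 40)
loss = F.cross_entropy(logits, labels)

optimizer.zero_grad()
loss.backward()
optimizer.step()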

I see. I had a similar implementation, but without the normalization and the mean, and the model was constantly predicting the same class while hitting a NaN loss.

Now it works like magic and makes much more sense.
Thank you so much for taking the time to respond!