[CV_3D] PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

jeonggg119 opened this issue 2 years ago

Paper Review

1. Introduction

PointNet: learns a spatial encoding of each point → aggregates all point features into one global point-cloud feature via max-pooling (no local features)
PointNet++: processes a set of points sampled in a metric space in a hierarchical fashion
partitions the set of points into overlapping local regions
→ extracts local features capturing fine geometric structures from small neighborhoods
→ groups these local features into larger units and processes them to produce higher-level features

[ Two design issues of PointNet++ ]

1. How to generate overlapping partitioning of point set

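Each partition: a neighborhood ball in the underlying Euclidean space, defined by a centroid location and a scale
- Centroid location: chosen by FPS (Farthest Point Sampling)
- Scale: features from multiple scales are combined for both robustness and detail capture (learned with random input dropout during training)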

2. How to abstract sets of points or local features through a local feature learner (=PointNet)

PointNet: processes an unordered set of points for semantic feature extraction & is robust to input data corruption
PointNet++ : applying PointNet recursively on a nested partitioning of input set

2. Problem Statement

$X = (M, d)$: discrete metric space whose metric is inherited from a Euclidean space $\mathbb{R}^n$
- $M \subseteq \mathbb{R}^n$: set of points (the density of $M$ may be non-uniform)
- $d$: distance metric
$f$: set function to learn, i.e. a classification or segmentation function
- Input : $X$ (along with additional features for each point)
- Output : information of semantic interest regarding $X$
- classification function : to assign a label to $X$
- segmentation function : to assign a per point label to each member of $M$

3. Method

3.1 Review of PointNet : A Universal Continuous Set Function Approximator

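Point cloud: a sparse set of points → an efficient representation, but any operation on it must be permutation-invariant
PointNet: a single max-pooling extracts a global feature of the point cloud, but local context is lost (lower segmentation performance)
- $f(x_1, \dots, x_n) = \gamma\left(\mathrm{MAX}_{i=1,\dots,n}\{h(x_i)\}\right)$ with MLPs $\gamma$ and $h$: a permutation-invariant set function that can arbitrarily approximate any continuous set function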
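To see why the max-pooled form is permutation-invariant, here is a minimal pure-Python sketch of $f = \gamma(\mathrm{MAX}\{h(x_i)\})$. The toy weights, the single linear+ReLU layer used for $h$, and the plain sum used for $\gamma$ are hypothetical stand-ins for the learned MLPs.

```python
import itertools

# Hypothetical toy weights standing in for the learned shared MLP h (3 -> 4 channels)
H_WEIGHTS = [
    [1.0, -0.5, 0.2],
    [0.3, 0.8, -1.0],
    [-0.7, 0.1, 0.9],
    [0.5, 0.5, 0.5],
]


def h(point):
    """Per-point feature: one linear layer + ReLU, shared by every point."""
    return [max(0.0, sum(w * x for w, x in zip(row, point))) for row in H_WEIGHTS]


def gamma(pooled):
    """Hypothetical stand-in for the MLP gamma: sum of the pooled channels."""
    return sum(pooled)


def set_function(points):
    """f(x_1..x_n) = gamma(MAX_i h(x_i)) with a channel-wise max over the points."""
    per_point = [h(p) for p in points]
    pooled = [max(channel) for channel in zip(*per_point)]
    return gamma(pooled)


points = [(0.1, 0.2, 0.3), (-0.4, 0.5, 0.0), (0.9, -0.1, 0.2)]
outputs = {set_function(list(order)) for order in itertools.permutations(points)}
print(len(outputs))  # 1: every ordering of the points gives the same value
```

The max is taken per channel across points, so reordering the input only reorders the values being compared and never changes the result.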

3.2 Hierarchical Point Set Feature Learning (Set Abstraction)

PointNet++: hierarchical grouping of points and progressive abstraction of larger and larger local regions
Set abstraction level (made of 3 layers): turns the point cloud into a smaller point set that still carries the overall semantic information, extracting local features along the way
Input: $N \times (d + C)$ matrix: $N$ points with $d$-dim coordinates and $C$-dim point features
Output: $N' \times (d + C')$ matrix: $N'$ subsampled points with $d$-dim coordinates and new $C'$-dim feature vectors summarizing local context

In the paper, $d = 3$, i.e. $(x, y, z)$

[ 3 layers ]

❶ Sampling layer

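Sampling layer: selects a subset of points $\{x_{i_1}, x_{i_2}, \dots, x_{i_m}\}$ from the input points $\{x_1, x_2, \dots, x_n\}$
→ chooses $N'$ of the $N$ input points as centroids (representative points that serve as the centers of local regions)
Farthest Point Sampling (FPS)
- Iterative: each new centroid $x_{i_j}$ is the point farthest (in metric, here Euclidean, distance) from the already selected set $\{x_{i_1}, \dots, x_{i_{j-1}}\}$
- Better coverage of the entire point set than random sampling, for the same number of centroids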
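The FPS rule can be sketched in plain Python as below. This is an O(N·N') illustration, not the repo's CUDA kernel; starting from index 0 is an assumption (the choice of the first centroid is arbitrary).

```python
import math


def farthest_point_sample(points, n_centroids, start=0):
    """Iterative farthest point sampling; returns the indices of the centroids.

    Each new centroid is the point whose distance to its nearest already
    selected centroid is largest. `start` (assumed 0) picks the first one.
    """
    selected = [start]
    # min_dist[i]: distance from point i to the closest centroid chosen so far
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < n_centroids:
        next_idx = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(next_idx)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[next_idx]))
    return selected


line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(farthest_point_sample(line, 3))  # [0, 4, 2]
```

Random sampling of 3 points could easily pick 0, 1 and 2 and leave the far cluster uncovered; FPS picks both ends first and then the middle of the denser cluster.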

❷ Grouping layer

Grouping layer: finds the neighbor points of each centroid and groups them into one local-region point set
- Input: a point set of size $N \times (d + C)$ & the coordinates of the centroids, $N' \times d$
- Output: groups of point sets of size $N' \times K \times (d + C)$, where $K$ is the number of neighbor points of a centroid
  $K$ varies from group to group, but the following PointNet layer turns each local region into one fixed-length feature vector
Ways to define neighbor points (by metric distance)
- 1. kNN: the $K$ points nearest to the centroid (fixed number of neighbor points)
- 2. Ball query: all points within radius $r$ of the centroid, capped at an upper limit of $K$ points in the implementation (fixed region scale) → local region features generalize better across space
The paper uses ball query.

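# Dependencies (TensorFlow 1.x): the sampling/grouping functions are custom CUDA ops compiled under tf_ops/ in the official PointNet++ repository
import tensorflow as tf
from tf_sampling import farthest_point_sample, gather_point
from tf_grouping import query_ball_point, group_point, knn_point

def sample_and_group(npoint, radius, nsample, xyz, points, knn=False, use_xyz=True):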
    new_xyz = gather_point(xyz, farthest_point_sample(npoint, xyz)) # (batch_size, npoint, 3)
    if knn:
        _,idx = knn_point(nsample, xyz, new_xyz)
    else:
        idx, pts_cnt = query_ball_point(radius, nsample, xyz, new_xyz)
    grouped_xyz = group_point(xyz, idx) # (batch_size, npoint, nsample, 3)
    grouped_xyz -= tf.tile(tf.expand_dims(new_xyz, 2), [1,1,nsample,1]) # translation normalization
    if points is not None:
        grouped_points = group_point(points, idx) # (batch_size, npoint, nsample, channel)
        if use_xyz:
            new_points = tf.concat([grouped_xyz, grouped_points], axis=-1) # (batch_size, npoint, nsample, 3+channel)
        else:
            new_points = grouped_points
    else:
        new_points = grouped_xyz

    return new_xyz, new_points, idx, grouped_xyz
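The TF function above relies on the repo's compiled ops. The difference between its two neighborhood options can be seen with a small pure-Python sketch; padding an under-filled ball by repeating the first index found is assumed to mirror the reference `query_ball_point` op, so that every group has exactly `nsample` indices.

```python
import math


def knn_query(points, centroid, k):
    """Indices of the k points nearest to the centroid (fixed neighbor count)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], centroid))
    return order[:k]


def ball_query(points, centroid, radius, nsample):
    """Indices of up to nsample points strictly within `radius` (fixed region scale).

    Assumption: if fewer than nsample points are found, the first index found is
    repeated so the group still has a fixed size.
    """
    idx = [i for i, p in enumerate(points) if math.dist(p, centroid) < radius][:nsample]
    if idx:
        idx += [idx[0]] * (nsample - len(idx))
    return idx


pts = [(0.0, 0.0), (0.05, 0.0), (0.5, 0.0), (0.06, 0.0), (2.0, 0.0)]
print(knn_query(pts, (2.0, 0.0), 3))        # [4, 2, 3]: reaches points ~1.9 away
print(ball_query(pts, (2.0, 0.0), 0.1, 4))  # [4, 4, 4, 4]: stays inside r = 0.1
```

In a sparse region kNN stretches the neighborhood until it has K points, while ball query keeps the region scale fixed, which is the property the paper cites for preferring it.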

❸ PointNet layer

PointNet layer: encodes the point pattern of each local region → extracts one local feature vector per region
- Input : $N'$ local regions of points with data size $N'$ x $K$ x ( $d$ + $C$ )
- Output : $N'$ x ( $d$ + $C'$ )
Mini-PointNet = basic building block for local pattern learning

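# Requires tf_util (the repo's TF layer helpers) and sample_and_group_all (defined alongside sample_and_group in the repo, not shown here)
def pointnet_sa_module(xyz, points, npoint, radius, nsample, mlp, mlp2, group_all, is_training, bn_decay, scope, bn=True, pooling='max', knn=False, use_xyz=True, use_nchw=False):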
    data_format = 'NCHW' if use_nchw else 'NHWC'
    with tf.variable_scope(scope) as sc:
        # Sample and Grouping
        if group_all:
            nsample = xyz.get_shape()[1].value
            new_xyz, new_points, idx, grouped_xyz = sample_and_group_all(xyz, points, use_xyz)
        else:
            new_xyz, new_points, idx, grouped_xyz = sample_and_group(npoint, radius, nsample, xyz, points, knn, use_xyz)

        # Point Feature Embedding
        if use_nchw: new_points = tf.transpose(new_points, [0,3,1,2])
        for i, num_out_channel in enumerate(mlp):
            new_points = tf_util.conv2d(new_points, num_out_channel, [1,1],
                                        padding='VALID', stride=[1,1],
                                        bn=bn, is_training=is_training,
                                        scope='conv%d'%(i), bn_decay=bn_decay,
                                        data_format=data_format) 
        if use_nchw: new_points = tf.transpose(new_points, [0,2,3,1])

        # Pooling in Local Regions
        if pooling=='max':
            new_points = tf.reduce_max(new_points, axis=[2], keep_dims=True, name='maxpool')
        elif pooling=='avg':
            new_points = tf.reduce_mean(new_points, axis=[2], keep_dims=True, name='avgpool')
        elif pooling=='weighted_avg':
            with tf.variable_scope('weighted_avg'):
                dists = tf.norm(grouped_xyz,axis=-1,ord=2,keep_dims=True)
                exp_dists = tf.exp(-dists * 5)
                weights = exp_dists/tf.reduce_sum(exp_dists,axis=2,keep_dims=True) # (batch_size, npoint, nsample, 1)
                new_points *= weights # (batch_size, npoint, nsample, mlp[-1])
                new_points = tf.reduce_sum(new_points, axis=2, keep_dims=True)
        elif pooling=='max_and_avg':
            max_points = tf.reduce_max(new_points, axis=[2], keep_dims=True, name='maxpool')
            avg_points = tf.reduce_mean(new_points, axis=[2], keep_dims=True, name='avgpool')
            new_points = tf.concat([avg_points, max_points], axis=-1)

        new_points = tf.squeeze(new_points, [2]) # (batch_size, npoint, mlp[-1]), or 2*mlp[-1] channels for 'max_and_avg'; mlp2 is unused in this excerpt
        return new_xyz, new_points, idx
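Stripped of TensorFlow, the core of the PointNet layer for one local region is a shared per-point function applied to centroid-relative coordinates, followed by a max-pool over the neighbors. The sketch below uses a single linear+ReLU layer with hypothetical toy weights in place of the learned `mlp` stack.

```python
def mini_pointnet(grouped_xyz, centroid, weights):
    """Encode one local region (any number K of neighbors) into one vector.

    weights: hypothetical toy weights of a single shared linear layer + ReLU.
    """
    # Translation normalization: coordinates relative to the centroid
    local = [[x - c for x, c in zip(p, centroid)] for p in grouped_xyz]
    # The same layer is applied to every neighbor (like the 1x1 conv2d above)
    feats = [[max(0.0, sum(w * x for w, x in zip(row, p))) for row in weights]
             for p in local]
    # Channel-wise max over neighbors: output length is len(weights) for any K
    return [max(channel) for channel in zip(*feats)]


W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, -1.0, 0.0]]
region_a = mini_pointnet([(0, 0, 0), (0.1, 0.2, 0)], (0, 0, 0), W)
region_b = mini_pointnet([(5, 5, 5), (5.1, 5.2, 5), (4.9, 4.8, 5)], (5, 5, 5), W)
print(len(region_a), len(region_b))  # 3 3
```

Regions with 2 and 3 neighbors both produce 3-dim vectors, and because coordinates are taken relative to the centroid, the same local shape placed anywhere in space gives (up to rounding) the same vector.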

3.3 Robust Feature Learning under Non-Uniform Sampling Density

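Goal: address the difficulty of learning features on point sets with non-uniform density (from sparse to dense regions)
(1) Train on point clouds sampled at various densities
(2) Density adaptive layers: extract feature vectors from the point cloud at several scales and combine them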
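In the paper, training at various densities is done by random input dropout: for each training point set a ratio θ is drawn uniformly from [0, p] (p = 0.95 in the paper), and each point is dropped with probability θ. A minimal sketch; the keep-at-least-one-point guard is an assumption added here.

```python
import random


def random_input_dropout(points, max_ratio=0.95, rng=random):
    """Thin out one training point set to simulate a random sampling density.

    theta ~ U[0, max_ratio] is drawn once per point set; each point is then
    dropped independently with probability theta.
    """
    theta = rng.uniform(0.0, max_ratio)
    kept = [p for p in points if rng.random() >= theta]
    # Assumption: never return an empty set, so later sampling still works
    return kept if kept else [points[0]]


cloud = [(float(i), 0.0, 0.0) for i in range(1000)]
rng = random.Random(42)
for _ in range(3):
    print(len(random_input_dropout(cloud, rng=rng)))  # varies from set to set
```

Because θ itself is random, the network sees both nearly complete and very sparse versions of the same shape during training.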

[ 2 Types of Density Adaptive layers ]

1. Multi-scale grouping (MSG)

Applies grouping several times at different scales → several point sets of different scales around each centroid
The feature vectors extracted from each of these point sets are concatenated into a multi-scale feature vector
During training, random input dropout (down-sampling) is applied to the input point sets → the network sees various sparsity and varying uniformity
Drawback: a local PointNet must run on large-scale neighborhoods for every centroid → computationally expensive and slow
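A minimal MSG sketch for one centroid: group at each radius, encode each group, concatenate. The per-scale encoder is a hypothetical fixed function (max absolute offset per axis plus point count) standing in for the separate learned PointNet each scale has; the radii are those of the first MSG layer in the ModelNet40 classification code of section 4.1.

```python
import math


def ball_group(points, centroid, radius):
    """Neighbors within `radius` of the centroid, in centroid-relative coordinates."""
    return [tuple(x - c for x, c in zip(p, centroid))
            for p in points if math.dist(p, centroid) < radius]


def encode(local_points):
    """Hypothetical per-scale encoder: max |offset| per axis + number of points."""
    if not local_points:
        return [0.0, 0.0, 0.0, 0.0]
    return [max(abs(p[i]) for p in local_points) for i in range(3)] + [float(len(local_points))]


def msg_feature(points, centroid, radii):
    """Multi-scale grouping: one group per radius, features concatenated."""
    feature = []
    for r in radii:
        feature += encode(ball_group(points, centroid, r))
    return feature


pts = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (0.15, 0.0, 0.0), (0.0, 0.35, 0.0)]
print(msg_feature(pts, (0.0, 0.0, 0.0), [0.1, 0.2, 0.4]))
# [0.05, 0.0, 0.0, 2.0, 0.15, 0.0, 0.0, 3.0, 0.15, 0.35, 0.0, 4.0]
```

Every centroid pays for one grouping and one encoder pass per radius, including the largest one, which is the cost the drawback above refers to.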

2. Multi-resolution grouping (MRG) ★ (to understand)

Addresses MSG's computational cost; the second density adaptive layer proposed in PointNet++
Features of a region at level $L_i$: concatenation of two vectors obtained at different scales
Left vector: summarizes the features of each sub-region at the lower level $L_{i-1}$ (via the set abstraction level)
Right vector: obtained by applying a single PointNet directly to all raw points in the local region
Advantage: no feature extraction over large-scale neighborhoods at the lowest levels → more efficient
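The two-vector construction can be sketched as below. Both halves are hypothetical stand-ins: the left vector simply max-pools the sub-region features already computed at level $L_{i-1}$ (the paper summarizes them with a set abstraction level), and the right vector uses a fixed max-|coordinate| encoder in place of a learned PointNet.

```python
def summarize_subregions(subregion_features):
    """Left vector: channel-wise max over the level L_{i-1} sub-region features."""
    return [max(channel) for channel in zip(*subregion_features)]


def pointnet_on_raw(raw_points):
    """Right vector: hypothetical PointNet stand-in run on the region's raw points."""
    return [max(abs(p[i]) for p in raw_points) for i in range(3)]


def mrg_feature(subregion_features, raw_points):
    """MRG: concatenate the lower-level summary and the raw-point vector."""
    return summarize_subregions(subregion_features) + pointnet_on_raw(raw_points)


sub_feats = [[0.2, 0.9], [0.5, 0.1]]        # reused from level L_{i-1}
raw = [(0.1, -0.4, 0.0), (0.3, 0.2, 0.05)]  # raw points of the region
print(mrg_feature(sub_feats, raw))  # [0.5, 0.9, 0.3, 0.4, 0.05]
```

The left half is cheap because it reuses features already computed at the lower level. According to the paper, when a region is sparse the left vector is less reliable than the right one, and when it is dense the left vector supplies finer detail; the learned layers after the concatenation decide how to weight them.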

3.4 Point Feature Propagation for Set Segmentation

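The sampling layers of set abstraction reduce the number of points → for segmentation, features must be propagated back to all original points
(1) Up-sampling: the features of the $N_l$ points of level $l$ are interpolated at the coordinates of the $N_{l-1}$ points of the previous level, using an inverse-distance-weighted average over the $k$ nearest neighbors: $f^{(j)}(x) = \frac{\sum_{i=1}^{k} w_i(x) f_i^{(j)}}{\sum_{i=1}^{k} w_i(x)}$ with $w_i(x) = \frac{1}{d(x, x_i)^p}$ (the paper uses $p = 2$, $k = 3$)
(2) Skip connection: the interpolated features are concatenated with the point features saved from the matching set abstraction level (before down-sampling) to recover lost detail, then passed through a "unit PointNet" (a shared per-point MLP, similar to a 1x1 convolution)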
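Steps (1) and (2) can be sketched in plain Python as follows, with $k = 3$ and $p = 2$ as defaults. The small `eps` that guards against division by zero when a target point coincides with a source point is an assumption of this sketch.

```python
import math


def interpolate_features(target_xyz, source_xyz, source_feats, k=3, p=2, eps=1e-8):
    """Interpolate features of the sparse N_l points at the denser N_{l-1} points.

    Each target point gets the weighted average of the features of its k
    nearest source points, with weights w_i = 1 / (d_i^p + eps).
    """
    dim = len(source_feats[0])
    out = []
    for t in target_xyz:
        nearest = sorted((math.dist(t, s), j) for j, s in enumerate(source_xyz))[:k]
        weights = [1.0 / (d ** p + eps) for d, _ in nearest]
        total = sum(weights)
        out.append([
            sum(w * source_feats[j][c] for w, (_, j) in zip(weights, nearest)) / total
            for c in range(dim)
        ])
    return out


def skip_concat(interpolated, skip_feats):
    """Skip link: append the features saved from the matching set abstraction level."""
    return [a + b for a, b in zip(interpolated, skip_feats)]


sparse_xyz = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]  # N_l points
sparse_feats = [[1.0], [3.0], [100.0]]             # their features
dense_xyz = [(1.0, 0.0)]                           # one of the N_{l-1} points
up = interpolate_features(dense_xyz, sparse_xyz, sparse_feats, k=2)
print(up)  # approximately [[2.0]]: two equidistant neighbors, equal weights
print(skip_concat(up, [[0.5, 0.7]]))  # approximately [[2.0, 0.5, 0.7]]
```

With $p = 2$ the far point at x = 10 would contribute only about 1/81 of the weight of a neighbor at distance 1 if it were among the $k$ nearest, so features stay local.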

4. Experiments

[ Dataset ]

4.1 Point Set Classification in Euclidean Metric Space

MNIST (2D Object)
- Input: each 2D image is converted into a 2D point cloud of digit pixel locations (512 points by default)
- Result: lower error rate than PointNet on digit classification, competitive with CNN-based models
ModelNet40 (3D rigid objects)
- Input: 3D point clouds sampled from the surfaces of CAD mesh models (1024 points by default)
- Face normals used as additional point features, with $N$ = 5000 points, to boost performance
- All points are normalized to zero mean and to lie within a unit ball ($r = 1$)
- Model: 3-level hierarchical network + 3 fully connected layers
- Result: outperforms MVCNN (previous state of the art) on 3D shape classification
- Ablation study of the density adaptive layers: the multi-scale models (MSG, MRG) are robust to the number of input points (i.e. sampling density)

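# Assumes TensorFlow 1.x with the repo's tf_util and pointnet_util modules on the import path
import tensorflow as tf
import tf_util
from pointnet_util import pointnet_sa_module, pointnet_sa_module_msg

def get_model(point_cloud, is_training, bn_decay=None):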
    """ Classification PointNet, input is BxNx3, output Bx40 """
    batch_size = point_cloud.get_shape()[0].value
    num_point = point_cloud.get_shape()[1].value
    end_points = {}

    l0_xyz = point_cloud
    l0_points = None

    # Set abstraction layers
    l1_xyz, l1_points = pointnet_sa_module_msg(l0_xyz, l0_points, 512, [0.1,0.2,0.4], [16,32,128], [[32,32,64], [64,64,128], [64,96,128]], is_training, bn_decay, scope='layer1', use_nchw=True)
    l2_xyz, l2_points = pointnet_sa_module_msg(l1_xyz, l1_points, 128, [0.2,0.4,0.8], [32,64,128], [[64,64,128], [128,128,256], [128,128,256]], is_training, bn_decay, scope='layer2')
    l3_xyz, l3_points, _ = pointnet_sa_module(l2_xyz, l2_points, npoint=None, radius=None, nsample=None, mlp=[256,512,1024], mlp2=None, group_all=True, is_training=is_training, bn_decay=bn_decay, scope='layer3')

    # Fully connected layers
    net = tf.reshape(l3_points, [batch_size, -1])
    net = tf_util.fully_connected(net, 512, bn=True, is_training=is_training, scope='fc1', bn_decay=bn_decay)
    net = tf_util.dropout(net, keep_prob=0.4, is_training=is_training, scope='dp1')
    net = tf_util.fully_connected(net, 256, bn=True, is_training=is_training, scope='fc2', bn_decay=bn_decay)
    net = tf_util.dropout(net, keep_prob=0.4, is_training=is_training, scope='dp2')
    net = tf_util.fully_connected(net, 40, activation_fn=None, scope='fc3')

    return net, end_points
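The ModelNet40 preprocessing bullet (zero mean, points within a unit ball) can be sketched as: subtract the centroid, then divide by the largest distance from the origin. Dividing by the maximum norm is the usual way to reach r = 1; treat it as an assumption about the exact preprocessing.

```python
import math


def normalize_to_unit_ball(points):
    """Shift a 3D point cloud to zero mean and scale it into the unit ball."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [[p[i] - centroid[i] for i in range(3)] for p in points]
    # The largest distance from the origin after centering becomes 1
    scale = max(math.sqrt(sum(c * c for c in p)) for p in centered)
    if scale == 0:
        return centered  # all points identical: nothing to scale
    return [[c / scale for c in p] for p in centered]


cloud = [(0, 0, 0), (4, 0, 0), (2, 2, 0), (2, -2, 0)]
print(normalize_to_unit_ball(cloud))
# [[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
```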

4.2 Point Set Segmentation for Semantic Scene Labeling

ScanNet (3D Scene)
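- Each point has a semantic label indicating which object category it belongs to
- Result: better segmentation performance → learning local features hierarchically matters for understanding scenes at various scales
- Ablation study: using density adaptive layers and training on scans reduced to non-uniform sampling density → MRG performs better than SSG (single-scale grouping) across sampling densities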

4.3 Point Set Classification in Non-Euclidean Metric Space

SHREC15 (3D non-rigid Object)
- SHREC15 dataset : 2D surfaces embedded in 3D space
- Goal : To show generalizability of PointNet++ to non-Euclidean space
- Requirement: knowledge of the intrinsic structure of the shapes
- [Fig.7] (a), (c): different poses, same category
- Geodesic distances along the surfaces induce a metric space
  Geodesic distance: length of the shortest path along the surface, approximated on a mesh by the shortest path between vertices in its graph
- PointNet++: builds the metric space induced by geodesic distance → extracts intrinsic point features (WKS, HKS, multi-scale Gaussian curvature) → uses these features as input → samples and groups points in that metric space
- Result: captures multi-scale intrinsic structure that is not affected by the specific pose → better performance
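On a mesh, the shortest-path definition of geodesic distance can be sketched with Dijkstra's algorithm over the edge graph, weighting each edge by its Euclidean length. This graph approximation is an assumption for illustration; the paper's exact geodesic computation may differ.

```python
import heapq
import math


def geodesic_distances(vertices, edges, source):
    """Shortest-path (Dijkstra) distances from `source` over the mesh edge graph.

    vertices: list of 3D coordinates; edges: (i, j) pairs of mesh edges.
    Each edge is weighted by its Euclidean length.
    """
    graph = {i: [] for i in range(len(vertices))}
    for i, j in edges:
        w = math.dist(vertices[i], vertices[j])
        graph[i].append((j, w))
        graph[j].append((i, w))
    dist = [math.inf] * len(vertices)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


# A "U"-shaped strip: the endpoints are close in 3D but far along the surface
verts = [(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 0, 0)]
edges = [(0, 1), (1, 2), (2, 3)]
print(geodesic_distances(verts, edges, 0))  # [0.0, 1.0, 2.0, 3.0]
print(math.dist(verts[0], verts[3]))        # 1.0 (Euclidean)
```

The gap between 3.0 and 1.0 is why Euclidean grouping can merge parts that are far apart on the surface, while grouping by geodesic distance follows the shape's intrinsic structure regardless of pose.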

4.4 Feature Visualization

Visualization of what has been learned by the first-level kernels of the hierarchical network

Conclusion

Future works

How to accelerate the inference speed of the MSG and MRG layers by sharing more computation within each local region
Finding applications in higher-dimensional metric spaces, where CNN-based methods would be computationally infeasible