jeonggg119/DL_paper

[CV_3D] PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

Paper Review

1. Introduction

  • PointNet : learns a spatial encoding of each point → aggregates all point features into a global point cloud signature by max-pooling (no local features are captured)
  • PointNet++ : processing a set of points sampled in metric space in a hierarchical fashion
    partitioning a set of points into overlapping local regions
    → extracting local features capturing fine geometric structures from small neighborhoods
    → grouping local features into larger unit and processing to produce higher level features

[ Two issues of the design of PointNet++ ]

1. How to generate overlapping partitioning of point set

  • Each partition : a neighborhood ball in Euclidean space
    • Centroid location : selected by FPS (Farthest Point Sampling)
    • Scale : multiple scales are combined for both robustness and detail capture (trained with random input dropout)

2. How to abstract sets of points or local features through a local feature learner (=PointNet)

  • PointNet : processing an unordered set of points for semantic feature extraction & robust to input data corruption
  • PointNet++ : applying PointNet recursively on a nested partitioning of input set

2. Problem Statement

  • $X = (M, d)$ : discrete metric space whose metric is inherited from Euclidean space $\mathbb{R}^n$
    • $M$ : set of points (density of $M$ is not uniform)
    • $d$ : distance metric
  • $f$ : set functions = classification or segmentation function
    • Input : $X$ (along with additional features for each point)
    • Output : information of semantic interest regarding $X$
    • classification function : to assign a label to $X$
    • segmentation function : to assign a per point label to each member of $M$

3. Method

3.1 Review of PointNet : A Universal Continuous Set Function Approximator

  • Point cloud : a sparse set of points => efficient, but operations on it must be permutation-invariant
  • PointNet : a single max pooling extracts the global feature of the point cloud, but local context is lost (segmentation performance ↓)
    • $f$ : permutation-invariant set function → can arbitrarily approximate any continuous set function
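
The permutation invariance noted above can be checked with a minimal sketch, assuming the PointNet form $f(x_1, ..., x_n) = \gamma(\max_i h(x_i))$. The functions `h` and `gamma` below are hypothetical hand-written stand-ins for the learned MLPs, not the paper's networks.

```python
import random

def h(point):
    """Hypothetical per-point feature map (stand-in for a learned shared MLP)."""
    x, y, z = point
    return [x + y, y * z, x - z, x * x]

def gamma(feature):
    """Hypothetical function applied to the pooled feature (stand-in for an MLP)."""
    return sum(feature)

def set_function(points):
    """f(x_1..x_n) = gamma(max_i h(x_i)), with the max taken per channel."""
    per_point = [h(p) for p in points]
    pooled = [max(channel) for channel in zip(*per_point)]
    return gamma(pooled)

points = [(0.1, 0.2, 0.3), (1.0, -1.0, 0.5), (-0.4, 0.9, 0.0)]
shuffled = points[:]
random.Random(0).shuffle(shuffled)
# The channel-wise max ignores ordering, so both calls give the same value.
assert set_function(points) == set_function(shuffled)
```

Because the only interaction between points is the channel-wise max, any reordering of the input leaves the result unchanged; the same max also discards everything about how neighboring points relate to each other, which is the loss of local context noted above.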

3.2 Hierarchical Point Set Feature Learning (Set Abstraction)

  • PointNet++ : hierarchical grouping of points and progressively abstracting larger local regions
  • Set abstraction level (3 layers) : converts the input into a compressed point cloud that retains the overall semantic information → extracts local features of the point cloud
  • Input : $N$ x ( $d$ + $C$ ) matrix ..... $N$ points with $d$-dim coordinates + $C$-dim point feature
  • Output : $N'$ x ( $d$ + $C'$ ) matrix ..... $N'$ subsampled points with $d$-dim coordinates + new $C'$-dim feature vectors

In the paper, $d$ = 3 → (x, y, z)

[ 3 layers ]

❶ Sampling layer

  • Sampling layer : selects a subset of points from the input points $\{x_1, x_2, ..., x_n\}$
    ..... chooses $N'$ centroids out of the $N$ input points (representative points that serve as centers of local regions)
  • Farthest Point Sampling (FPS)
    • Each new centroid = the point farthest (in metric, here Euclidean, distance) from the centroids already selected
    • Better coverage of the entire point set than random sampling, given the same number of centroids
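
The iterative selection in FPS can be sketched as follows. This is a minimal pure-Python version for illustration, not the batched GPU operator (`farthest_point_sample`) used in the implementation; the function name, the `start` index and the toy points are assumptions.

```python
import math

def farthest_point_sampling(points, n_samples, start=0):
    """Return indices of n_samples points chosen by iterative FPS."""
    selected = [start]
    # min_dist[i] = distance from point i to its nearest already-selected centroid
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < n_samples:
        # Pick the point that is farthest from the current set of centroids.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected

# Two points are nearly duplicates: (0, 0) and (0.1, 0).
points = [(0, 0), (0.1, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
centroid_idx = farthest_point_sampling(points, 3)
```

Starting from index 0, the second centroid is the opposite corner (1, 1), and the near-duplicate (0.1, 0) is not picked, which is the coverage property that random sampling does not guarantee.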

❷ Grouping layer

  • Grouping layer : finds the neighbor points of each centroid → groups them into one local region point set
    • Input : a point set = $N$ x ( $d$ + $C$ ) & coordinates of a set of centroids = $N'$ x $d$
    • Output : local groups of point sets = $N'$ x $K$ x ( $d$ + $C$ ) ..... $K$ : # of neighbor points of each centroid
      $K$ varies across groups → the PointNet layer still extracts one fixed-length local region feature vector per group
  • Metric distances to define neighbor points
      1. KNN : the $K$ points nearest to the centroid (fixed number of neighbor points)
      2. Ball query : all points within radius $r$ of the centroid, up to an upper limit of $K$ (fixed region scale) → more generalizable

    In the paper, ball query is used
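
The difference between the two neighborhood definitions can be sketched as below. This is a simplified single-centroid illustration; the function names and toy data are assumptions, and `max_k` plays the role of the upper limit on group size.

```python
import math

def knn_group(points, centroid, k):
    """Indices of the k points nearest to the centroid (fixed count)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], centroid))
    return order[:k]

def ball_query_group(points, centroid, radius, max_k):
    """Indices of up to max_k points within `radius` of the centroid (fixed scale)."""
    inside = [i for i, p in enumerate(points) if math.dist(p, centroid) <= radius]
    return inside[:max_k]

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (2.0, 2.0), (3.0, 0.0)]
centroid = (0.0, 0.0)
knn_idx = knn_group(points, centroid, k=4)                     # always 4 points, however far away
ball_idx = ball_query_group(points, centroid, 0.5, max_k=4)    # only points inside the ball
```

kNN reaches out to the distant point (2, 2) to fill its quota, while ball query keeps the region scale fixed and returns fewer points in sparse areas; that fixed scale is what makes the local features more generalizable across space.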

def sample_and_group(npoint, radius, nsample, xyz, points, knn=False, use_xyz=True):
    new_xyz = gather_point(xyz, farthest_point_sample(npoint, xyz)) # (batch_size, npoint, 3)
    if knn:
        _,idx = knn_point(nsample, xyz, new_xyz)
    else:
        idx, pts_cnt = query_ball_point(radius, nsample, xyz, new_xyz)
    grouped_xyz = group_point(xyz, idx) # (batch_size, npoint, nsample, 3)
    grouped_xyz -= tf.tile(tf.expand_dims(new_xyz, 2), [1,1,nsample,1]) # translation normalization
    if points is not None:
        grouped_points = group_point(points, idx) # (batch_size, npoint, nsample, channel)
        if use_xyz:
            new_points = tf.concat([grouped_xyz, grouped_points], axis=-1) # (batch_size, npoint, nsample, 3+channel)
        else:
            new_points = grouped_points
    else:
        new_points = grouped_xyz

    return new_xyz, new_points, idx, grouped_xyz

❸ PointNet layer

  • PointNet layer : encodes the point pattern of each local region → extracts one local feature vector per region
    • Input : $N'$ local regions of points with data size $N'$ x $K$ x ( $d$ + $C$ )
    • Output : $N'$ x ( $d$ + $C'$ )
  • Mini-PointNet = basic building block for local pattern learning

def pointnet_sa_module(xyz, points, npoint, radius, nsample, mlp, mlp2, group_all, is_training, bn_decay, scope, bn=True, pooling='max', knn=False, use_xyz=True, use_nchw=False):
    data_format = 'NCHW' if use_nchw else 'NHWC'
    with tf.variable_scope(scope) as sc:
        # Sample and Grouping
        if group_all:
            nsample = xyz.get_shape()[1].value
            new_xyz, new_points, idx, grouped_xyz = sample_and_group_all(xyz, points, use_xyz)
        else:
            new_xyz, new_points, idx, grouped_xyz = sample_and_group(npoint, radius, nsample, xyz, points, knn, use_xyz)

        # Point Feature Embedding
        if use_nchw: new_points = tf.transpose(new_points, [0,3,1,2])
        for i, num_out_channel in enumerate(mlp):
            new_points = tf_util.conv2d(new_points, num_out_channel, [1,1],
                                        padding='VALID', stride=[1,1],
                                        bn=bn, is_training=is_training,
                                        scope='conv%d'%(i), bn_decay=bn_decay,
                                        data_format=data_format) 
        if use_nchw: new_points = tf.transpose(new_points, [0,2,3,1])

        # Pooling in Local Regions
        if pooling=='max':
            new_points = tf.reduce_max(new_points, axis=[2], keep_dims=True, name='maxpool')
        elif pooling=='avg':
            new_points = tf.reduce_mean(new_points, axis=[2], keep_dims=True, name='avgpool')
        elif pooling=='weighted_avg':
            with tf.variable_scope('weighted_avg'):
                dists = tf.norm(grouped_xyz,axis=-1,ord=2,keep_dims=True)
                exp_dists = tf.exp(-dists * 5)
                weights = exp_dists/tf.reduce_sum(exp_dists,axis=2,keep_dims=True) # (batch_size, npoint, nsample, 1)
                new_points *= weights # (batch_size, npoint, nsample, mlp[-1])
                new_points = tf.reduce_sum(new_points, axis=2, keep_dims=True)
        elif pooling=='max_and_avg':
            max_points = tf.reduce_max(new_points, axis=[2], keep_dims=True, name='maxpool')
            avg_points = tf.reduce_mean(new_points, axis=[2], keep_dims=True, name='avgpool')
            new_points = tf.concat([avg_points, max_points], axis=-1)

        new_points = tf.squeeze(new_points, [2]) # (batch_size, npoint, mlp[-1]); 2*mlp[-1] for 'max_and_avg'
        return new_xyz, new_points, idx
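
The `weighted_avg` pooling branch in the module above weights each neighbor by `exp(-5 * distance)` and normalizes the weights over the group. A minimal single-group sketch in pure Python, where the function name and toy values are assumptions for illustration:

```python
import math

def weighted_avg_pool(grouped_xyz, grouped_feats, scale=5.0):
    """Pool neighbor features with normalized weights exp(-scale * distance)."""
    # grouped_xyz is centroid-relative, so the norm is the distance to the centroid.
    dists = [math.sqrt(sum(v * v for v in p)) for p in grouped_xyz]
    exp_d = [math.exp(-scale * d) for d in dists]
    total = sum(exp_d)
    weights = [e / total for e in exp_d]
    n_ch = len(grouped_feats[0])
    pooled = [sum(w * f[c] for w, f in zip(weights, grouped_feats)) for c in range(n_ch)]
    return pooled, weights

grouped_xyz = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.4, 0.0)]
grouped_feats = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
pooled, weights = weighted_avg_pool(grouped_xyz, grouped_feats)
```

Neighbors closer to the centroid receive larger weights, so this variant is a soft alternative to max pooling that still yields one fixed-length vector per local region.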

3.3 Robust Feature Learning under Non-Uniform Sampling Density

  • Goal : address the difficulty of feature learning on point sets with non-uniform density (sparse ~ dense)
  • (1) Train on point clouds sampled at various densities
  • (2) Density adaptive layer : extracts feature vectors from the point cloud at multiple scales and combines them

[ 2 Types of Density Adaptive layers ]

1. Multi-scale grouping (MSG)

  • Grouping is applied several times at different scales → multiple point sets of different scales are produced for each centroid
  • The feature vectors extracted from each point set are concatenated into a multi-scale feature vector
  • Each point set is trained with random input dropout (down-sampling) → exposes the network to various sparsity and varying uniformity
  • Drawback : a local PointNet must be run at large-scale neighborhoods for every centroid → computationally expensive, inefficient, time-consuming
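
The MSG idea for a single centroid can be sketched as below: group at several radii, pool each group, and concatenate. `pooled_feature` is a hypothetical hand-written stand-in for the learned local PointNet, and the radii and output sizes are illustrative assumptions.

```python
import math

def pooled_feature(group, centroid, out_dim):
    """Hypothetical stand-in for a local PointNet: per-point features, then max pooling."""
    feats = []
    for p in group:
        local = [a - b for a, b in zip(p, centroid)]   # centroid-relative coordinates
        r = math.sqrt(sum(v * v for v in local))
        feats.append([r * (c + 1) for c in range(out_dim)])
    if not feats:                                      # empty neighborhood
        return [0.0] * out_dim
    return [max(channel) for channel in zip(*feats)]

def msg_feature(points, centroid, radii, out_dims):
    """Concatenate one pooled feature per scale (radius) for a single centroid."""
    feature = []
    for radius, out_dim in zip(radii, out_dims):
        group = [p for p in points if math.dist(p, centroid) <= radius]
        feature += pooled_feature(group, centroid, out_dim)
    return feature

points = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (0.0, 0.15, 0.0), (0.0, 0.0, 0.3)]
feat = msg_feature(points, (0.0, 0.0, 0.0), radii=[0.1, 0.2, 0.4], out_dims=[2, 3, 4])
# The output width is the sum of the per-scale widths: 2 + 3 + 4 = 9 channels.
```

The cost noted above is visible in the loop: every centroid runs one local feature extractor per scale, including the largest radius.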

2. Multi-resolution grouping (MRG) ★ key to understand

  • Compensates for the drawback of MSG; an alternative density adaptive layer proposed in PointNet++
  • $L_i$ level features : two feature vectors at different scales are concatenated into a multi-scale feature vector
  • Left vector : obtained by summarizing the features of each sub-region from the lower level $L_{i-1}$
  • Right vector : obtained by running a PointNet directly on all raw points in the local region of $L_i$
  • Advantage : no feature extraction in large-scale neighborhoods at the lowest levels → more efficient

3.4 Point Feature Propagation for Set Segmentation

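  • The sampling layer of set abstraction reduces the size of the point cloud → the original size must be restored for the segmentation task
  • (1) Up-sampling : features of the $N_l$ points are interpolated at the coordinates of the $N_{l-1}$ points, weighted by inverse distance (1/distance)
  • (2) Skip connection : the feature vectors from before down-sampling are concatenated → supplements the lost information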
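
The two propagation steps can be sketched as an inverse-distance weighted average over the $k$ nearest sparse points, $f(x) = \sum_i w_i(x) f_i / \sum_i w_i(x)$ with $w_i(x) = 1 / d(x, x_i)^p$, followed by concatenation. The defaults `k=3` and `p=2`, the `eps` guard against division by zero, and the function names are assumptions of this sketch.

```python
import math

def interpolate_features(dense_xyz, sparse_xyz, sparse_feats, k=3, p=2, eps=1e-8):
    """Propagate features from sparse points to dense points by inverse-distance weighting."""
    n_ch = len(sparse_feats[0])
    out = []
    for q in dense_xyz:
        # k nearest sparse points as (distance, index) pairs
        nearest = sorted((math.dist(q, s), i) for i, s in enumerate(sparse_xyz))[:k]
        weights = [1.0 / (d ** p + eps) for d, _ in nearest]
        total = sum(weights)
        out.append([
            sum(w * sparse_feats[i][c] for w, (_, i) in zip(weights, nearest)) / total
            for c in range(n_ch)
        ])
    return out

def concat_skip(interpolated, skip_feats):
    """Skip connection: concatenate interpolated and pre-down-sampling features per point."""
    return [a + b for a, b in zip(interpolated, skip_feats)]

sparse_xyz = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
sparse_feats = [[1.0], [2.0], [3.0]]
dense_xyz = [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0), (0.5, 0.0)]
skip_feats = [[0.1], [0.2], [0.3], [0.4]]
up = interpolate_features(dense_xyz, sparse_xyz, sparse_feats)
fused = concat_skip(up, skip_feats)
```

A dense point that coincides with a sparse point essentially copies its feature, while a point equidistant from all three sparse points receives their mean; the concatenated result has one row per dense point.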

4. Experiments

[ Dataset ]

4.1 Point Set Classification in Euclidean Metric Space

  • MNIST (2D Object)
    • Input : digit images are converted from 2D image coordinates to a 2D point cloud of digit pixel locations (default 512 points)
    • Result : lower error rate than PointNet on the digit classification task, and performance comparable to CNN-based models
  • ModelNet40 (3D rigid Object)
    • Input : 3D point cloud obtained by sampling the surface of the CAD model 3D meshes (default 1024 points)
    • Face normals are used as additional point features ( $N$ = 5000 ) to boost performance
    • All points are normalized to be zero-mean and within a unit (r=1) ball
    • Model : 3-level hierarchical network + 3 FC layers
    • Result : outperforms MVCNN (the previous SOTA model) on the 3D shape classification task
    • Ablation study of the density adaptive layer : models trained with multi-scale grouping (MSG, MRG) are robust to the number of points (or density)

def get_model(point_cloud, is_training, bn_decay=None):
    """ Classification PointNet, input is BxNx3, output Bx40 """
    batch_size = point_cloud.get_shape()[0].value
    num_point = point_cloud.get_shape()[1].value
    end_points = {}

    l0_xyz = point_cloud
    l0_points = None

    # Set abstraction layers
    l1_xyz, l1_points = pointnet_sa_module_msg(l0_xyz, l0_points, 512, [0.1,0.2,0.4], [16,32,128], [[32,32,64], [64,64,128], [64,96,128]], is_training, bn_decay, scope='layer1', use_nchw=True)
    l2_xyz, l2_points = pointnet_sa_module_msg(l1_xyz, l1_points, 128, [0.2,0.4,0.8], [32,64,128], [[64,64,128], [128,128,256], [128,128,256]], is_training, bn_decay, scope='layer2')
    l3_xyz, l3_points, _ = pointnet_sa_module(l2_xyz, l2_points, npoint=None, radius=None, nsample=None, mlp=[256,512,1024], mlp2=None, group_all=True, is_training=is_training, bn_decay=bn_decay, scope='layer3')

    # Fully connected layers
    net = tf.reshape(l3_points, [batch_size, -1])
    net = tf_util.fully_connected(net, 512, bn=True, is_training=is_training, scope='fc1', bn_decay=bn_decay)
    net = tf_util.dropout(net, keep_prob=0.4, is_training=is_training, scope='dp1')
    net = tf_util.fully_connected(net, 256, bn=True, is_training=is_training, scope='fc2', bn_decay=bn_decay)
    net = tf_util.dropout(net, keep_prob=0.4, is_training=is_training, scope='dp2')
    net = tf_util.fully_connected(net, 40, activation_fn=None, scope='fc3')

    return net, end_points
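
The input normalization described for ModelNet40 (zero mean, inside a unit ball) can be sketched as below. This is a minimal pure-Python version for a single cloud; the function name and toy data are assumptions, and the actual data pipeline is not shown in these notes.

```python
import math

def normalize_to_unit_ball(points):
    """Center the points at the origin and scale them to fit in a radius-1 ball."""
    n = len(points)
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    centered = [[p[d] - mean[d] for d in range(dim)] for p in points]
    max_norm = max(math.sqrt(sum(v * v for v in p)) for p in centered)
    if max_norm == 0.0:            # all points coincide; nothing to scale
        return centered
    return [[v / max_norm for v in p] for p in centered]

cloud = [(2.0, 0.0, 0.0), (4.0, 0.0, 0.0), (3.0, 3.0, 0.0), (3.0, -3.0, 0.0)]
normalized = normalize_to_unit_ball(cloud)
```

After this step the grouping radii (for example 0.1 to 0.8 in the model above) are expressed relative to a cloud of radius 1, so the same radii apply to every shape.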

4.2 Point Set Segmentation for Semantic Scene Labeling

  • ScanNet (3D Scene)
    • Each point carries a segmentation label indicating which object it belongs to
    • Result : segmentation performance ↑ => learning local features through the hierarchical structure is important for understanding scenes at various scales
    • Ablation study : evaluated with the sampling density reduced to be non-uniform, using the density adaptive layers → MRG outperforms SSG (single-scale grouping) across densities

4.3 Point Set Classification in Non-Euclidean Metric Space

  • SHREC15 (3D non-rigid Object)
    • SHREC15 dataset : 2D surfaces embedded in 3D space
    • Goal : to show the generalizability of PointNet++ to non-Euclidean space
    • Requirement : knowledge of the 'intrinsic structure'
    • [Fig.7] (a), (c) : different in pose but the same category
    • Geodesic distances along the surfaces induce a metric space

      Geodesic distance : the length of the shortest path between two vertices along the surface (computed on the mesh graph)

    • PointNet++ : constructs the metric space induced by geodesic distance → extracts intrinsic point features (WKS, HKS, multi-scale Gaussian curvature) → uses these features as input → samples and groups points
    • Result : captures multi-scale intrinsic structure that is not influenced by the specific pose => effective, performance ↑
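
The geodesic distance that induces the metric can be approximated on a mesh as a shortest path over its edges. Below is a minimal sketch with Dijkstra's algorithm, assuming a hypothetical vertex and edge list with Euclidean edge lengths; how the geodesic distances were actually computed for SHREC15 is not stated in these notes.

```python
import heapq
import math

def geodesic_distances(vertices, edges, source):
    """Shortest-path distances from `source` along mesh edges (Dijkstra)."""
    adjacency = {i: [] for i in range(len(vertices))}
    for a, b in edges:
        w = math.dist(vertices[a], vertices[b])
        adjacency[a].append((b, w))
        adjacency[b].append((a, w))
    dist = [math.inf] * len(vertices)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale queue entry
        for v, w in adjacency[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

# A bent strip: vertices 0 and 3 are close in Euclidean space but far along the surface.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
edges = [(0, 1), (1, 2), (2, 3)]
geo = geodesic_distances(vertices, edges, source=0)
euclid = math.dist(vertices[0], vertices[3])
```

In this toy strip the Euclidean distance between the two ends is 1 while the path along the surface has length 3. Bending the strip further changes the former but not the latter, which is why a geodesic neighborhood is insensitive to pose.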

4.4 Feature Visualization

  • Visualization of what has been learned by the first-level kernels of the hierarchical network

Conclusion

Future works

  • Accelerate the inference speed of the network, especially for MSG and MRG layers, by sharing more computation in each local region
  • Find applications in higher-dimensional metric spaces where CNN-based methods would be computationally infeasible