[CV_3D] PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
1. Introduction
- PointNet : learns a spatial encoding of each point → aggregates all point features into a global point cloud signature via max-pooling (local features are not captured)
- PointNet++ : processing a set of points sampled in metric space in a hierarchical fashion
partitioning a set of points into overlapping local regions
→ extracting local features capturing fine geometric structures from small neighborhoods
→ grouping local features into larger unit and processing to produce higher level features
[ Two issues of the design of PointNet++ ]
1. How to generate overlapping partitioning of point set
- Each partition : a neighborhood ball in Euclidean space
- Centroid location : selected with FPS (Farthest Point Sampling)
- Scale : features from multiple scales are combined for both robustness and detail capture (trained with random input dropout)
2. How to abstract sets of points or local features through a local feature learner (=PointNet)
- PointNet : processing an unordered set of points for semantic feature extraction & robust to input data corruption
- PointNet++ : applying PointNet recursively on a nested partitioning of input set
2. Problem Statement
- $X = (M, d)$ : discrete metric space whose metric is inherited from Euclidean space $R^n$
- $M$ : set of points (the density of $M$ may not be uniform)
- $d$ : distance metric
- $f$ : set function = classification or segmentation function
- Input : $X$ (along with additional features for each point)
- Output : information of semantic interest regarding $X$
- classification function : assigns a label to $X$
- segmentation function : assigns a per-point label to each member of $M$
3. Method
3.1 Review of PointNet : A Universal Continuous Set Function Approximator
- Point cloud : a sparse set of points => efficient, but permutation-invariant operations are required
- PointNet : a single max pooling extracts the global feature of the PC, but local context is lost (segmentation performance ↓)
- $f$ : permutation-invariant set function → can approximate any continuous set function arbitrarily well
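The permutation invariance noted above can be checked with a small NumPy sketch. It is not the network from the paper: the single linear layer, the random weights `w` and `b`, and the function names are assumptions made purely for illustration. The same per-point function is applied to every point and the results are max-pooled, so shuffling the points leaves the global feature unchanged.

```python
import numpy as np

def shared_mlp(points, w, b):
    """Apply the same linear layer + ReLU to every point (a stand-in per-point function)."""
    return np.maximum(points @ w + b, 0.0)

def global_feature(points, w, b):
    """Max-pool the per-point features over the point axis (a symmetric operation)."""
    return shared_mlp(points, w, b).max(axis=0)

rng = np.random.default_rng(0)
points = rng.normal(size=(128, 3))   # 128 points with (x, y, z)
w = rng.normal(size=(3, 16))         # hypothetical weights, random for illustration
b = rng.normal(size=16)

feat = global_feature(points, w, b)
feat_shuffled = global_feature(points[rng.permutation(128)], w, b)
print(np.allclose(feat, feat_shuffled))
```

Because the max is taken over the whole set, this single vector carries no information about which points were neighbors, which is the loss of local context that PointNet++ addresses.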
3.2 Hierarchical Point Set Feature Learning (Set Abstraction)
- PointNet++ : hierarchical grouping of points and progressively abstracting larger local regions
- Set Abstraction level (3 layers) : converts the input into a smaller PC that carries the overall semantic information → extracts local features of the PC
- Input : $N$ x ($d$ + $C$) matrix ..... $N$ points with $d$-dim coordinates + $C$-dim point features
- Output : $N'$ x ($d$ + $C'$) matrix ..... $N'$ subsampled points with $d$-dim coordinates + new $C'$-dim feature vectors
- In the paper, $d$ = 3 → (x, y, z)
[ 3 layers ]
❶ Sampling layer
- Sampling layer : selects a set of points from the input points {$x_1, x_2, ..., x_n$}
..... chooses $N'$ centroids out of the $N$ input points (representative points that serve as centers of local regions)
Farthest Point Sampling (FPS)
- Each new centroid = the point farthest, in metric (here Euclidean) distance, from the centroids already selected
- Better coverage of the entire point set than random sampling
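The greedy selection behind FPS can be sketched in NumPy as follows. This is a minimal CPU illustration, not the CUDA op used in the PointNet++ repository; the function signature and the `start` argument (the index of the first centroid) are assumptions made here for reproducibility.

```python
import numpy as np

def farthest_point_sample(xyz, n_samples, start=0):
    """Return indices of n_samples points chosen greedily by farthest point sampling.

    xyz: (N, d) array of coordinates.
    start: index of the first centroid (an arbitrary choice in this sketch).
    """
    n_points = xyz.shape[0]
    selected = np.empty(n_samples, dtype=np.int64)
    # Squared distance from each point to its nearest already-selected centroid.
    min_dist = np.full(n_points, np.inf)
    current = start
    for i in range(n_samples):
        selected[i] = current
        dist = np.sum((xyz - xyz[current]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, dist)
        # Next centroid: the point farthest from the selected set.
        current = int(np.argmax(min_dist))
    return selected

# Four points on a line: starting at x=0, FPS picks x=10 next, then x=2.
line = np.array([[0.0], [1.0], [2.0], [10.0]])
idx = farthest_point_sample(line, 3)
print(idx)  # [0 3 2]
```

Each iteration costs O(N), so selecting N' centroids costs O(N · N') in this form.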
❷ Grouping layer
- Grouping layer : finds the neighbor points of each centroid → groups them into one local region point set
- Input : a point set = $N$ x ($d$ + $C$) & coordinates of a set of centroids = $N'$ x $d$
- Output : local groups of point sets = $N'$ x $K$ x ($d$ + $C$) ..... $K$ : # of neighbor points of each centroid
- $K$ : flexible # (differs from group to group) → the PointNet layer still extracts one fixed-length local region feature vector per group
Metric distances to define neighbor points
- KNN : the $K$ points closest to the centroid (fixed number of neighbor points)
- Ball query : the points within radius r of the centroid (fixed region scale) → more generalizable
- In the paper, the ball query method is used
import tensorflow as tf
# farthest_point_sample, gather_point, knn_point, query_ball_point and group_point
# are custom TF ops shipped with the PointNet++ code, not part of TensorFlow itself.

def sample_and_group(npoint, radius, nsample, xyz, points, knn=False, use_xyz=True):
    # Sampling layer: pick npoint centroids with FPS
    new_xyz = gather_point(xyz, farthest_point_sample(npoint, xyz)) # (batch_size, npoint, 3)
    # Grouping layer: kNN or ball query around each centroid
    if knn:
        _,idx = knn_point(nsample, xyz, new_xyz)
    else:
        idx, pts_cnt = query_ball_point(radius, nsample, xyz, new_xyz)
    grouped_xyz = group_point(xyz, idx) # (batch_size, npoint, nsample, 3)
    grouped_xyz -= tf.tile(tf.expand_dims(new_xyz, 2), [1,1,nsample,1]) # translation normalization
    if points is not None:
        grouped_points = group_point(points, idx) # (batch_size, npoint, nsample, channel)
        if use_xyz:
            new_points = tf.concat([grouped_xyz, grouped_points], axis=-1) # (batch_size, npoint, nsample, 3+channel)
        else:
            new_points = grouped_points
    else:
        new_points = grouped_xyz
    return new_xyz, new_points, idx, grouped_xyz
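To make the difference between the two neighbor definitions concrete, here is a small NumPy sketch of ball query and kNN. It is not the repository's CUDA implementation: the function names, the brute-force distance matrix, and the padding convention (repeating the first neighbor when a ball holds fewer than `nsample` points) are assumptions of this sketch. It also assumes the centroids are taken from the point set itself, so no ball is empty.

```python
import numpy as np

def ball_query(xyz, centroids, radius, nsample):
    """For each centroid, return indices of up to nsample points within radius.

    Groups with fewer than nsample neighbors are padded by repeating the first
    neighbor, so the output has a fixed shape (N', nsample).
    """
    # Pairwise squared distances, shape (N', N).
    d2 = np.sum((centroids[:, None, :] - xyz[None, :, :]) ** 2, axis=-1)
    idx = np.zeros((centroids.shape[0], nsample), dtype=np.int64)
    for i, row in enumerate(d2):
        inside = np.flatnonzero(row <= radius ** 2)[:nsample]
        idx[i, :len(inside)] = inside
        idx[i, len(inside):] = inside[0]  # pad with the first neighbor
    return idx

def knn_query(xyz, centroids, k):
    """For each centroid, return indices of its k nearest points."""
    d2 = np.sum((centroids[:, None, :] - xyz[None, :, :]) ** 2, axis=-1)
    return np.argsort(d2, axis=1)[:, :k]

xyz = np.array([[0.0, 0, 0], [0.1, 0, 0], [1.0, 0, 0], [1.05, 0, 0], [5.0, 0, 0]])
centroids = xyz[[0, 2]]
ball = ball_query(xyz, centroids, radius=0.2, nsample=3)
knn = knn_query(xyz, centroids, k=3)
print(ball.tolist())  # [[0, 1, 0], [2, 3, 2]]
print(knn.tolist())   # [[0, 1, 2], [2, 3, 1]]
```

In the example, kNN reaches across the gap to fill its quota of three points, while ball query stays inside the fixed radius and pads instead. That fixed spatial extent is what "fixed region scale" refers to.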
❸ PointNet layer
- PointNet layer : encodes the point pattern of each local region → extracts one local feature vector per region
- Input : $N'$ local regions of points with data size $N'$ x $K$ x ($d$ + $C$)
- Output : $N'$ x ($d$ + $C'$)
- Mini-PointNet = basic building block for local pattern learning
def pointnet_sa_module(xyz, points, npoint, radius, nsample, mlp, mlp2, group_all, is_training, bn_decay, scope, bn=True, pooling='max', knn=False, use_xyz=True, use_nchw=False):
    data_format = 'NCHW' if use_nchw else 'NHWC'
    with tf.variable_scope(scope) as sc:
        # Sample and Grouping
        if group_all:
            nsample = xyz.get_shape()[1].value
            new_xyz, new_points, idx, grouped_xyz = sample_and_group_all(xyz, points, use_xyz)
        else:
            new_xyz, new_points, idx, grouped_xyz = sample_and_group(npoint, radius, nsample, xyz, points, knn, use_xyz)

        # Point Feature Embedding (shared MLP implemented as 1x1 convolutions)
        if use_nchw: new_points = tf.transpose(new_points, [0,3,1,2])
        for i, num_out_channel in enumerate(mlp):
            new_points = tf_util.conv2d(new_points, num_out_channel, [1,1],
                                        padding='VALID', stride=[1,1],
                                        bn=bn, is_training=is_training,
                                        scope='conv%d'%(i), bn_decay=bn_decay,
                                        data_format=data_format)
        if use_nchw: new_points = tf.transpose(new_points, [0,2,3,1])

        # Pooling in Local Regions (over the nsample axis)
        if pooling=='max':
            new_points = tf.reduce_max(new_points, axis=[2], keep_dims=True, name='maxpool')
        elif pooling=='avg':
            new_points = tf.reduce_mean(new_points, axis=[2], keep_dims=True, name='avgpool')
        elif pooling=='weighted_avg':
            with tf.variable_scope('weighted_avg'):
                dists = tf.norm(grouped_xyz,axis=-1,ord=2,keep_dims=True)
                exp_dists = tf.exp(-dists * 5)
                weights = exp_dists/tf.reduce_sum(exp_dists,axis=2,keep_dims=True) # (batch_size, npoint, nsample, 1)
                new_points *= weights # (batch_size, npoint, nsample, mlp[-1])
                new_points = tf.reduce_sum(new_points, axis=2, keep_dims=True)
        elif pooling=='max_and_avg':
            max_points = tf.reduce_max(new_points, axis=[2], keep_dims=True, name='maxpool')
            avg_points = tf.reduce_mean(new_points, axis=[2], keep_dims=True, name='avgpool')
            new_points = tf.concat([avg_points, max_points], axis=-1)

        new_points = tf.squeeze(new_points, [2]) # (batch_size, npoints, mlp2[-1])
        return new_xyz, new_points, idx
3.3 Robust Feature Learning under Non-Uniform Sampling Density
- Goal : resolve the difficulty of feature learning on point sets with non-uniform density (sparse ~ dense)
- (1) Train on PCs sampled at various densities
- (2) Density adaptive layer : extracts feature vectors from the PC at several scales and combines them
[ 2 Types of Density Adaptive layers ]
1. Multi-scale grouping (MSG)
- Grouping is applied several times at different scales → several point sets of different scales per centroid
- The feature vectors extracted from each point set are concatenated into a multi-scale feature vector
- Each input point set goes through random input dropout (down-sampling) during training → densities of various scales (various sparsity, varying uniformity)
- Drawback : a local PointNet has to be run on large-scale neighborhoods for every centroid → computationally expensive
2. Multi-resolution grouping (MRG) ★ (to review)
- Compensates for the drawback of MSG; a method used in PointNet++
- $L_i$ level features : two feature vectors are concatenated into one multi-scale feature vector
- Left vector : feature obtained by summarizing the features of each sub-region at the lower level $L_{i-1}$
- Right vector : feature obtained by running a PointNet directly on all raw points in the local region at $L_i$
- Advantage : no feature extraction in large-scale neighborhoods at the lowest levels → more efficient
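The random input dropout mentioned under MSG can be sketched as below. In the paper, a dropout ratio is drawn uniformly from [0, p] for each training point set (with p = 0.95) and every point is then dropped independently with that probability. The function name, the `rng` argument, and the guard that keeps at least one point are assumptions of this sketch, not the authors' code.

```python
import numpy as np

def random_input_dropout(points, max_ratio=0.95, rng=None):
    """Randomly drop points from one training point set.

    points: (N, d) array. A ratio theta is sampled from [0, max_ratio], then
    each point is dropped independently with probability theta.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.uniform(0.0, max_ratio)          # dropout ratio for this point set
    keep = rng.random(points.shape[0]) >= theta  # True for points that survive
    if not keep.any():                           # guard against an empty set
        keep[rng.integers(points.shape[0])] = True
    return points[keep]

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))
out = random_input_dropout(cloud, rng=rng)
print(cloud.shape, "->", out.shape)
```

Because theta changes from one training sample to the next, the network sees both sparse and dense versions of the data. Note that this version returns a variable number of points, so batching would need padding or resampling.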
3.4 Point Feature Propagation for Set Segmentation
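- The sampling layers of Set Abstraction shrink the PC → for the segmentation task the original size has to be restored
- (1) Up-sampling : features of the sub-sampled points are interpolated onto the points of the previous level ($N_{l-1}$ points), weighted by inverse distance (1/distance)
- (2) Skip connection : the feature vectors from before down-sampling are concatenated → supplements the lost information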
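The inverse-distance interpolation in step (1) can be sketched in NumPy. The paper uses a weighted average over the k nearest neighbors with weights 1/d^p and defaults of k = 3 and p = 2. The function name, the brute-force neighbor search, and the small `eps` that avoids division by zero are assumptions of this sketch.

```python
import numpy as np

def interpolate_features(xyz_dense, xyz_sparse, feat_sparse, k=3, p=2, eps=1e-8):
    """Propagate features from sparse points to dense points.

    xyz_dense: (N_dense, d), xyz_sparse: (N_sparse, d), feat_sparse: (N_sparse, C).
    Returns (N_dense, C): inverse-distance weighted average over the k nearest
    sparse points of each dense point.
    """
    dist = np.linalg.norm(xyz_dense[:, None, :] - xyz_sparse[None, :, :], axis=-1)
    nn_idx = np.argsort(dist, axis=1)[:, :k]              # (N_dense, k)
    nn_dist = np.take_along_axis(dist, nn_idx, axis=1)    # (N_dense, k)
    weights = 1.0 / (nn_dist ** p + eps)
    weights /= weights.sum(axis=1, keepdims=True)         # normalize per dense point
    return np.einsum('nk,nkc->nc', weights, feat_sparse[nn_idx])

# Two sparse points at x=0 and x=2 carrying features 0 and 10.
xyz_sparse = np.array([[0.0], [2.0]])
feat_sparse = np.array([[0.0], [10.0]])
xyz_dense = np.array([[1.0], [0.0]])
out = interpolate_features(xyz_dense, xyz_sparse, feat_sparse, k=2)
print(out)  # midpoint gets about 5, the point at x=0 gets about 0
```

For step (2), the interpolated features would then be concatenated with the features stored at the same level during set abstraction, for example with `np.concatenate([out, skip_features], axis=-1)`, where `skip_features` is a hypothetical array of the saved features.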
4. Experiments
4.1 Point Set Classification in Euclidean Metric Space
- Input : 2D image coordinates converted to a 2D PC of digit pixel locations (default 512 points)
- Result : on the digit classification task, lower error rate than PointNet and performance comparable to CNN-based models
- Input : surfaces of CAD model 3D meshes are sampled and converted to a 3D PC (default 1024 points)
- Face normals are used as additional point features ($N$ = 5000) to boost performance
- All points are normalized to have zero mean and to lie within a unit ball (r = 1)
- Model : 3-level hierarchical network + 3 FC layers
- Result : on the 3D shape classification task, higher accuracy than MVCNN (the previous SOTA model)
Ablation study of the density adaptive layer : the models trained with multiple scales (MSG, MRG) are robust to the number of points (i.e. density)
def get_model(point_cloud, is_training, bn_decay=None):
    """ Classification PointNet, input is BxNx3, output Bx40 """
    batch_size = point_cloud.get_shape()[0].value
    num_point = point_cloud.get_shape()[1].value
    end_points = {}
    l0_xyz = point_cloud
    l0_points = None

    # Set abstraction layers
    l1_xyz, l1_points = pointnet_sa_module_msg(l0_xyz, l0_points, 512, [0.1,0.2,0.4], [16,32,128], [[32,32,64], [64,64,128], [64,96,128]], is_training, bn_decay, scope='layer1', use_nchw=True)
    l2_xyz, l2_points = pointnet_sa_module_msg(l1_xyz, l1_points, 128, [0.2,0.4,0.8], [32,64,128], [[64,64,128], [128,128,256], [128,128,256]], is_training, bn_decay, scope='layer2')
    l3_xyz, l3_points, _ = pointnet_sa_module(l2_xyz, l2_points, npoint=None, radius=None, nsample=None, mlp=[256,512,1024], mlp2=None, group_all=True, is_training=is_training, bn_decay=bn_decay, scope='layer3')

    # Fully connected layers
    net = tf.reshape(l3_points, [batch_size, -1])
    net = tf_util.fully_connected(net, 512, bn=True, is_training=is_training, scope='fc1', bn_decay=bn_decay)
    net = tf_util.dropout(net, keep_prob=0.4, is_training=is_training, scope='dp1')
    net = tf_util.fully_connected(net, 256, bn=True, is_training=is_training, scope='fc2', bn_decay=bn_decay)
    net = tf_util.dropout(net, keep_prob=0.4, is_training=is_training, scope='dp2')
    net = tf_util.fully_connected(net, 40, activation_fn=None, scope='fc3')

    return net, end_points
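The preprocessing noted in this section (zero mean, all points within a unit ball) can be sketched as follows. The function name is an assumption of this sketch, and it assumes the cloud contains at least two distinct points so that the scale is non-zero.

```python
import numpy as np

def normalize_to_unit_ball(points):
    """Center a point cloud at the origin and scale it into the unit ball.

    points: (N, d) array. The farthest point from the centroid ends up at radius 1.
    """
    centered = points - points.mean(axis=0)
    scale = np.max(np.linalg.norm(centered, axis=1))
    return centered / scale

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3)) * 4.0 + 7.0   # arbitrary offset and scale
normalized = normalize_to_unit_ball(cloud)
print(np.linalg.norm(normalized, axis=1).max())
```

With inputs normalized this way, the ball query radii in the code above (0.1 to 0.8) are fractions of the object's overall extent.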
4.2 Point Set Segmentation for Semantic Scene Labeling
4.3 Point Set Classification in Non-Euclidean Metric Space
- SHREC15 (3D non-rigid Object)
- SHREC15 dataset : 2D surfaces embedded in 3D space
- Goal : To show generalizability of PointNet++ to non-Euclidean space
- Requirement : knowledge of 'intrinsic structure'
- [Fig.7] (a), (c) : different in pose -> same category
- Geodesic distances along the surfaces induce a metric space
Geodesic distance : the shortest path between the vertices in a graph
- PointNet++ : constructing metric space induced by geodesic distance → extracting intrinsic point features in WKS, HKS, multi-scale Gaussian curvature → using these features as input → sampling and grouping points
- Result : captures multi-scale intrinsic structure that is not influenced by the specific pose => effective, performance ↑
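The graph view of geodesic distance described above (shortest path between vertices) can be illustrated with Dijkstra's algorithm from the standard library. The notes do not say how the geodesic distances were actually computed for SHREC15, so this is only an assumed approximation. The adjacency format and the tiny example graph are hypothetical.

```python
import heapq

def geodesic_distances(adjacency, source):
    """Shortest-path distances from source over a weighted graph.

    adjacency: dict mapping vertex -> list of (neighbor, edge_length) pairs,
    e.g. built from the edges of a surface mesh.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in adjacency[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# A path 0 - 1 - 2 along the surface, plus a long direct edge 0 - 2.
mesh_graph = {
    0: [(1, 1.0), (2, 5.0)],
    1: [(0, 1.0), (2, 1.0)],
    2: [(1, 1.0), (0, 5.0)],
}
dist = geodesic_distances(mesh_graph, 0)
print(dist)  # {0: 0.0, 1: 1.0, 2: 2.0}
```

Distances of this kind depend only on edge lengths, so they stay the same when a non-rigid shape bends as long as the edge lengths are preserved. This is why the induced metric space is pose-invariant. Sampling and grouping would then use `dist` in place of the Euclidean distance.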
4.4 Feature Visualization
Future works
- To accelerate the inference speed of the network, especially for MSG and MRG layers, by sharing more computation in each local region
- To find applications in higher dimensional metric spaces where CNN based method would be computationally unfeasible