Machine Learning and Computer Vision

PointNet Paper Translation (Part 2)

2018-12-19  Joey_H

3. Problem Statement

We design a deep learning framework that directly consumes unordered point sets as inputs. A point cloud is represented as a set of 3D points {Pi | i = 1, ..., n}, where each point Pi is a vector of its (x, y, z) coordinate plus extra feature channels such as color, normal etc. For simplicity and clarity, unless otherwise noted, we only use the (x, y, z) coordinate as our point’s channels. For the object classification task, the input point cloud is either directly sampled from a shape or pre-segmented from a scene point cloud. Our proposed deep network outputs k scores for all the k candidate classes. For semantic segmentation, the input can be a single object for part region segmentation, or a sub-volume from a 3D scene for object region segmentation. Our model will output n × m scores for each of the n points and each of the m semantic subcategories.
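The input/output contract above can be sketched with array shapes. This is an illustrative sketch only; the sizes n, k, and m are hypothetical placeholders, not values from the paper.

```python
import numpy as np

n, k, m = 1024, 40, 50  # hypothetical sizes: points, candidate classes, semantic subcategories

# A point cloud: n points, each a vector of its (x, y, z) coordinate.
points = np.random.rand(n, 3)

# Classification: the network maps the whole set to k class scores.
cls_scores_shape = (k,)

# Segmentation: one score per point per semantic subcategory.
seg_scores_shape = (n, m)

print(points.shape, cls_scores_shape, seg_scores_shape)
```

Extra channels such as color or normals would simply widen the second axis beyond 3.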


4. Deep Learning on Point Sets

The architecture of our network (Sec 4.2) is inspired by the properties of point sets in R^n (Sec 4.1).

4.1. Properties of Point Sets in R^n

Our input is a subset of points from a Euclidean space. It has three main properties:

• Unordered. Unlike pixel arrays in images or voxel arrays in volumetric grids, point cloud is a set of points without specific order. In other words, a network that consumes N 3D point sets needs to be invariant to N! permutations of the input set in data feeding order.

• Interaction among points. The points are from a space with a distance metric. It means that points are not isolated, and neighboring points form a meaningful subset. Therefore, the model needs to be able to capture local structures from nearby points, and the combinatorial interactions among local structures.

• Invariance under transformations. As a geometric object, the learned representation of the point set should be invariant to certain transformations. For example, rotating and translating points all together should not modify the global point cloud category nor the segmentation of the points.
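The unordered property can be made concrete with a toy NumPy check: shuffling the rows of a point cloud changes the array a network would read row-by-row, but any symmetric aggregate over the points (here, a per-column max) is unchanged. This is an illustrative sketch, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((128, 3))            # a toy cloud of 128 points

perm = rng.permutation(len(points))
shuffled = points[perm]                  # the same set, fed in a different order

# The raw arrays differ, so a network reading rows in order sees a new input
# (almost surely False for a random permutation)...
print(np.array_equal(points, shuffled))

# ...but a per-column max is a symmetric aggregate: feeding order does not matter.
print(np.allclose(points.max(axis=0), shuffled.max(axis=0)))  # True for any permutation
```

The same check passes for any of the N! permutations, which is exactly the invariance the network must provide.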


4.2. PointNet Architecture

Our full network architecture is visualized in Fig 2, where the classification network and the segmentation network share a great portion of structures. Please read the caption of Fig 2 for the pipeline. Our network has three key modules: the max pooling layer as a symmetric function to aggregate information from all the points, a local and global information combination structure, and two joint alignment networks that align both input points and point features. We will discuss our reason behind these design choices in separate paragraphs below.

Symmetry Function for Unordered Input

In order to make a model invariant to input permutation, three strategies exist: 1) sort input into a canonical order; 2) treat the input as a sequence to train an RNN, but augment the training data by all kinds of permutations; 3) use a simple symmetric function to aggregate the information from each point. Here, a symmetric function takes n vectors as input and outputs a new vector that is invariant to the input order. For example, + and ∗ operators are symmetric binary functions.
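Strategy 3) can be sketched in a few lines: apply the same per-point transform h to every point, then aggregate with a symmetric g such as max. The weights and sizes below are arbitrary stand-ins, not the paper's trained network.

```python
import numpy as np

rng = np.random.default_rng(1)

def h(points, W):
    """Per-point feature transform (a stand-in for a shared MLP layer)."""
    return np.maximum(points @ W, 0.0)   # ReLU(x W), applied to every point alike

def f(points, W):
    """Symmetric set function: max-pool the per-point features."""
    return h(points, W).max(axis=0)

W = rng.normal(size=(3, 16))
points = rng.random((256, 3))
shuffled = points[rng.permutation(256)]

print(np.allclose(f(points, W), f(shuffled, W)))  # True: max is order-invariant
```

Because max (like + and ∗) ignores argument order, f inherits permutation invariance no matter what h computes.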

While sorting sounds like a simple solution, in high-dimensional space there in fact does not exist an ordering that is stable w.r.t. point perturbations in the general sense. This can be easily shown by contradiction. If such an ordering strategy exists, it defines a bijection map between a high-dimensional space and a 1d real line. It is not hard to see that requiring an ordering to be stable w.r.t. point perturbations is equivalent to requiring that this map preserves spatial proximity as the dimension reduces, a task that cannot be achieved in the general case. Therefore, sorting does not fully resolve the ordering issue, and it's hard for a network to learn a consistent mapping from input to output as the ordering issue persists. As shown in experiments (Fig 5), we find that applying an MLP directly on the sorted point set performs poorly, though slightly better than directly processing an unsorted input.
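The instability of sorting can be seen with a hand-picked toy example (my own illustration, not from the paper): nudging one coordinate slightly can swap the ranks of two points, so the sorted representation jumps discontinuously even though the underlying set barely moved.

```python
import numpy as np

# Two clouds that differ by a tiny perturbation of one coordinate.
a = np.array([[0.50, 0.0, 0.0],
              [0.51, 9.0, 9.0],
              [2.00, 1.0, 1.0]])
b = a.copy()
b[0, 0] = 0.52                 # nudge one x value by 0.02

# Sort each cloud by x to impose a "canonical" order.
sa = a[np.argsort(a[:, 0])]
sb = b[np.argsort(b[:, 0])]

# The perturbation is tiny, but the first two sorted rows swap entirely,
# so the sorted input an MLP would see changes by as much as 9.0.
print(np.abs(sa - sb).max())   # 9.0
```

A network trained on such sorted inputs must fit a highly discontinuous map, which matches the poor MLP-on-sorted-points result in Fig 5.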


The idea to use RNN considers the point set as a sequential signal and hopes that by training the RNN with randomly permuted sequences, the RNN will become invariant to input order. However, in “OrderMatters” [25] the authors have shown that order does matter and cannot be totally omitted. While RNN has relatively good robustness to input ordering for sequences of small length (dozens), it's hard to scale to thousands of input elements, which is the common size for point sets. Empirically, we have also shown that a model based on RNN does not perform as well as our proposed method (Fig 5). Our idea is to approximate a general function defined on a point set by applying a symmetric function on transformed elements in the set:

f({x1, . . . , xn}) ≈ g(h(x1), . . . , h(xn)),    (1)

where f : 2^(R^N) → R, h : R^N → R^K, and g : R^K × · · · × R^K → R is a symmetric function.


Empirically, our basic module is very simple: we approximate h by a multi-layer perceptron network and g by a composition of a single variable function and a max pooling function. This is found to work well by experiments. Through a collection of h, we can learn a number of f's to capture different properties of the set. While our key module seems simple, it has interesting properties (see Sec 5.3) and can achieve strong performance (see Sec 5.1) in a few different applications. Due to the simplicity of our module, we are also able to provide theoretical analysis as in Sec 4.3.
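The basic module (h as a shared MLP, g as max pooling) can be sketched as follows. This is a minimal NumPy mock-up with random, untrained weights and invented layer sizes, assuming the identity for the single-variable part of g; it shows the composition and its permutation invariance, not the paper's actual trained architecture.

```python
import numpy as np

rng = np.random.default_rng(2)

def shared_mlp(x, W1, W2):
    """h: the same two-layer ReLU MLP applied to every point independently."""
    return np.maximum(np.maximum(x @ W1, 0.0) @ W2, 0.0)

def pointnet_features(points, W1, W2):
    """g: max pooling over points (single-variable part taken as identity)."""
    return shared_mlp(points, W1, W2).max(axis=0)   # one global feature vector

W1 = rng.normal(size=(3, 64))     # hypothetical layer widths, not the paper's
W2 = rng.normal(size=(64, 128))
cloud = rng.random((512, 3))

feat = pointnet_features(cloud, W1, W2)
print(feat.shape)   # (128,): a global descriptor, identical for any point ordering
```

Different weight settings for h play the role of learning different f's, each picking up a different property of the set through the pooled feature.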

