Real 3D / Volumetric CNN for me

2017-01-09 MrGiovanni

Author: Zongwei Zhou | 周纵苇
Weibo: @MrGiovanni
Email: zongweiz@asu.edu
Original post: http://zongwei.leanote.com/post/3D

Reviews

[1] Automatic Detection of Cerebral Microbleeds From MR Images via 3D Convolutional Neural Networks. paper

Application: Cerebral microbleeds (CMB) detection.
Dataset: SWI-CMB
Preprocessing: normalized the volume intensities to the range of [0,1].
Evaluation: sensitivity (S), precision (P) and the average number of false positives per subject ($FP_{avg}$).
System Implementation: Framework based on Theano library, using a GPU of NVIDIA GeForce GTX TITAN Z.
Method

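1. Screening strategy > conventional sliding window strategy. This stage is effectively a 3D fully convolutional network: it takes the 3D volume as input and outputs a 3D score map, which gives an initial set of candidate coordinates (regions of interest, ROI). The candidates include many false positives, but this is still far more efficient than sliding-window scanning. The puzzle is that, judging from TABLE 1, the architecture does not look like a Fully Convolutional Network at all; it looks more like an ordinary classification network. I do not know how the authors obtain the score map.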

THE ARCHITECTURE OF 3D FCN SCREENING MODEL

2. Discrimination stage removes a large number of false positive candidates. This stage is a 3D CNN that classifies 3D patches. ReLU is used in the C and FC layers.
3D CNN architecture details: The 3D convolution kernels are randomly initialized from a Gaussian distribution (learning from scratch), the optimizer is SGD, and the loss function is cross-entropy loss. Dropout is also used. lr=0.03, momentum=0.9, dropout rate=0.3, batch size=100.

512 $\times$ 512 $\times$ 150 image $\longrightarrow$ 3D FCN $\longrightarrow$ 512 $\times$ 512 $\times$ 150 score map $\longrightarrow$ threshold ($\mathcal{T}$ = 0.64) $\longrightarrow$ 20 $\times$ 20 $\times$ 16 patch $\longrightarrow$ 3D CNN $\longrightarrow$ labeled.
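The hand-off between the two stages in the pipeline above can be sketched in a few lines of NumPy. This is only an illustration under my own assumptions: the function name `extract_candidate_patches` is hypothetical, every voxel above the threshold is treated as a candidate (no merging of neighbouring candidates), and the volume is zero-padded so that border candidates still yield full-size patches. The paper may handle all of these differently.

```python
import numpy as np

def extract_candidate_patches(volume, score_map, threshold=0.64, patch_shape=(20, 20, 16)):
    """Crop a fixed-size patch around every voxel whose score exceeds the threshold.

    Hypothetical helper: the zero padding and the lack of candidate merging
    are assumptions of this sketch, not details taken from the paper.
    """
    half = [p // 2 for p in patch_shape]
    # Pad so that patches centred near the border stay inside the array.
    padded = np.pad(volume, [(h, h) for h in half], mode="constant")
    centers = np.argwhere(score_map > threshold)
    patches = []
    for c in centers:
        # In padded coordinates, the patch centred on c starts at index c.
        window = tuple(slice(int(s), int(s) + p) for s, p in zip(c, patch_shape))
        patches.append(padded[window])
    if patches:
        return centers, np.stack(patches)
    return centers, np.empty((0,) + tuple(patch_shape), dtype=volume.dtype)
```

Each returned patch would then be passed to the second-stage 3D CNN for the final label.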

Results and Conclusions:

1. 3D FCN screening is better than two other methods: Barnes et al. and Chen et al.

COMPARISON OF DIFFERENT SCREENING METHODS

2. Good detection performance

EVALUATION OF DETECTION RESULTS

FROC COMPARISON
The baselines compared are Barnes et al., random forest and 2D-CNN-SVM.

3. The intermediate feature representation is better.

FEATURE REPRESENTATION
This comparison is quite novel; the tool used is the t-SNE toolbox.

[2] Multi-level Contextual 3D CNNs for False Positive Reduction in Pulmonary Nodule Detection. paper

Application: reduce false positive for pulmonary nodule detection in volumetric CT scans.
Dataset: LUNA16 challenge held in conjunction with ISBI 2016. In total, 0.65 million samples were extracted to train the 3D CNNs, in order to match their larger number of parameters.
Preprocessing: 1) Data augmentation - translated by 1 voxel along each axis and rotated by 90, 180 and 270 degrees within the transverse plane. In total, 0.65 million samples were generated for training. 2) Normalization - clipped the intensities to the interval (-1000, 400) HU and normalized them to the range (0, 1).
3D CNN architecture details: Learning from scratch, lr=0.3, decayed by 5% every 5000 iterations. Batch size=200, momentum=0.9, and dropout (rate=0.2) is used in the C and FC layers.
Evaluation: FROC, Sensitivity
System Implementation: Framework based on Theano library, using a GPU of NVIDIA GeForce GTX TITAN Z.
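The preprocessing described above (HU clipping, rescaling, 1-voxel translations and transverse-plane rotations) could look roughly like the sketch below. The helper names are hypothetical, and several choices are my assumptions rather than the paper's: axes 0 and 1 are taken as the transverse plane, the patch is square in that plane, the candidate lies far enough from the border that no padding is needed, and every shift is combined with every rotation.

```python
import numpy as np

def normalize_hu(patch, lo=-1000.0, hi=400.0):
    """Clip intensities to [lo, hi] HU and rescale them linearly to [0, 1]."""
    return (np.clip(patch, lo, hi) - lo) / (hi - lo)

def crop(volume, center, shape):
    """Crop a patch of the given shape around center (no bounds checking)."""
    start = [c - s // 2 for c, s in zip(center, shape)]
    return volume[tuple(slice(a, a + s) for a, s in zip(start, shape))]

def augmented_patches(volume, center, shape):
    """Return normalized patches for 7 shifts x 4 transverse-plane rotations."""
    # The unshifted position plus a shift of -1 and +1 voxel along each axis.
    shifts = [(0, 0, 0)]
    for axis in range(3):
        for delta in (-1, 1):
            shifts.append(tuple(delta if i == axis else 0 for i in range(3)))
    patches = []
    for shift in shifts:
        shifted = [c + s for c, s in zip(center, shift)]
        patch = normalize_hu(crop(volume, shifted, shape))
        for k in range(4):  # 0, 90, 180 and 270 degrees
            patches.append(np.rot90(patch, k=k, axes=(0, 1)))
    return patches
```

With these assumptions one candidate yields 28 training patches, which gives a feel for how the training set grows to hundreds of thousands of samples.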
Method

1. Multi-level contextual receptive field.

FUSION OF THREE 3D CNNs
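In essence this fuses the predictions of three different 3D CNNs, each trained on input patches of a different size; in other words, a "multi-scale" CNN... Well, fine. The theoretical advantage is that it uses both local detail features and global features. We have considered this approach ourselves, and many researchers have done the same in 2D. A multi-scale method needs the "scales" to be defined, so the authors ran a statistical analysis of the dataset, shown in the figure below.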

DISTRIBUTION ANALYSIS OF THE SIZES OF PULMONARY NODULES FOR DETERMINING RECEPTIVE FIELDS.
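This way of choosing the scales feels rather primitive and is hard to reuse in practice: it requires statistics over the dataset, and it is uncertain whether the selected samples are statistically representative and whether the choice still holds when new data arrives. The authors measure size in voxels; to begin with, I think this could be changed to an absolute scale (mm).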

2. Multi-model fusion
Next, the fusion of the three 3D networks. The three architectures are listed in the table.

THE ARCHITECTURE OF DIFFERENT RECEPTIVE FIELD 3D CNN
Fuse the softmax regression outputs (probabilities) from all networks. The fused posterior probability $P_{fusion}$ is estimated by a weighted linear combination:
$$P_{fusion}=\sum_{i\in\{1,2,3\}}\gamma_i\cdot P_i$$
The constant weights $\gamma_i$ were determined by grid search on a small subset of the training data in the authors' experiments ($\gamma_1=0.3$, $\gamma_2=0.4$, $\gamma_3=0.3$).
This fusion does not actually happen inside the networks; it simply combines the output probabilities, so it is only a superficial "fusion". There are other ways to fuse, such as concatenating the fully connected layers of the three CNNs. The idea is to make the fusion part of back propagation, and that is the kind of fusion I find more convincing.
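To make the output-level fusion concrete, here is a small sketch of the weighted combination and a grid search over the weights. The function names, the constraint that the weights sum to 1 (true of the reported 0.3/0.4/0.3), the 0.1 grid step and the use of accuracy at a 0.5 cut-off as the selection criterion are all assumptions of mine; the paper only says that grid search on a small training subset was used.

```python
import itertools
import numpy as np

def fuse(probs, weights):
    """Weighted linear combination; probs is (n_models, n_samples), weights is (n_models,)."""
    return np.tensordot(np.asarray(weights, dtype=float), np.asarray(probs, dtype=float), axes=1)

def grid_search_weights(probs, labels, step=0.1):
    """Try all weight triples on a grid that sum to 1 and keep the most accurate one.

    The selection criterion (accuracy at a 0.5 cut-off) is an assumption.
    """
    n = int(round(1.0 / step))
    best_weights, best_accuracy = None, -1.0
    for i, j in itertools.product(range(n + 1), repeat=2):
        k = n - i - j
        if k < 0:
            continue
        weights = np.array([i, j, k], dtype=float) / n
        accuracy = float(np.mean((fuse(probs, weights) >= 0.5) == labels))
        if accuracy > best_accuracy:
            best_weights, best_accuracy = weights, accuracy
    return best_weights, best_accuracy
```

Note that nothing here touches the networks themselves, which is exactly the point made above: the three models are trained independently and only their probabilities are mixed.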

Evaluation Metrics

I think this part is a particularly useful reference:
The challenge evaluated detection results by measuring the detection sensitivity and average false positive rate per scan. A predicted candidate location was counted as a true positive if it was located within the radius of a true nodule center. (The definition of a true positive is crucial for plotting the FROC.) Detections of irrelevant findings were ignored (i.e., considered as neither false positives nor true positives) in the evaluation. The challenge organizers performed the free-response receiver operating characteristic (FROC) analysis by setting different thresholds on the raw prediction probabilities submitted by the participating teams. The evaluation also computed the 95% confidence interval using bootstrapping [36]. A competition performance metric (CPM) score [37], which was calculated as the average sensitivity at seven predefined false positive rates: 1/8, 1/4, 1/2, 1, 2, 4 and 8 false positives per scan, was produced for each algorithm. Ten-fold cross validation on the dataset was specified.
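The FROC curve and the CPM score described above can be approximated as follows. This is a simplified sketch, not the official evaluation script: it assumes candidates have already been matched to nodules (a boolean `is_tp` per candidate), that irrelevant findings were removed beforehand, that no nodule is hit twice, and that sensitivities at the seven operating points are obtained by linear interpolation. The function names are my own.

```python
import numpy as np

CPM_FP_RATES = (0.125, 0.25, 0.5, 1, 2, 4, 8)

def froc_curve(scores, is_tp, n_nodules, n_scans):
    """Sweep the threshold from high to low and return (FPs per scan, sensitivity)."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    hits = np.asarray(is_tp, dtype=float)[order]
    # Prepend the empty operating point (no detections accepted).
    tp = np.concatenate([[0.0], np.cumsum(hits)])
    fp = np.concatenate([[0.0], np.cumsum(1.0 - hits)])
    fps_per_scan = fp / n_scans
    sensitivity = tp / n_nodules
    # For each distinct FP rate keep the last point, which has the highest sensitivity.
    keep = np.append(fps_per_scan[1:] != fps_per_scan[:-1], True)
    return fps_per_scan[keep], sensitivity[keep]

def cpm(fps_per_scan, sensitivity):
    """Average sensitivity at the seven predefined FP rates.

    np.interp clamps outside the measured range, so rates beyond the last
    point reuse the final sensitivity (an assumption of this sketch).
    """
    return float(np.mean(np.interp(CPM_FP_RATES, fps_per_scan, sensitivity)))
```

The sketch also shows why the true-positive definition matters so much: every point of the curve depends on how `is_tp` was decided.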

Results and Conclusions:

1. 3D > 2D

3D vs 2D CNN detection

2. Fusion multi-level > single level

FROC ANALYSIS FOR DIFFERENT LEVEL

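At the end of the paper the authors show a visualization of the 3D convolution kernels. I do not see what it is for; what result does it demonstrate?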

[3] 3D Deeply Supervised Network for Automatic Liver Segmentation from CT Volumes. paper

This paper strikes me as a 3D HED (paper), or equivalently a 3D Fully Convolutional Network (paper). Compare their architectures:

3D DSN

HED

FCN
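All of them combine the output maps of intermediate layers to make the final segmentation prediction. The questions this structure raised for me at the time were: how is back propagation designed, how are the intermediate layers combined, and how are the combination weights learned? Do they also go through back propagation?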

Application: Liver segmentation.
Dataset: MICCAI-SLiver07 dataset. The dataset consists of 30 contrast-enhanced CT scans in total (20 training and 10 testing).
3D DSN architecture details: The mainstream network consists of 11 layers: 6 convolutional layers, 2 max-pooling layers, 2 deconvolutional layers and 1 softmax layer. (One question here: I noticed that the kernel, stride and pooling sizes differ across the authors' papers. Are they chosen by feel? A more dependable convolution size would normally be 3$\times$3$\times$3, as in VGG.) Learning from scratch, lr=0.1, divided by 10 every fifty epochs. The deep supervision balancing weights ($\eta_h$?) were initialized as 0.3 and 0.4, and decayed by 5% every ten epochs.
Evaluation: Volumetric overlap error (VOE[%]), relative volume difference (VD[%]), average symmetric surface distance (AvgD[mm]), root mean square symmetric surface distance (RMSD[mm]) and maximum symmetric surface distance (MaxD[mm]). Details of these metrics can be found in Comparison and Evaluation of Methods for Liver Segmentation From CT Datasets
System Implementation: Framework based on Theano library, using a GPU of NVIDIA GeForce GTX TITAN Z.
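Two of the volumetric metrics listed above are easy to state in code. The sketch below uses the usual definitions, VOE as 100 × (1 − |A∩B| / |A∪B|) and VD as 100 × (|A| − |B|) / |B|, with A the segmentation and B the reference; the function names are mine, and the three surface-distance metrics are left out because they need surface extraction and voxel spacing.

```python
import numpy as np

def voe_percent(seg, ref):
    """Volumetric overlap error in percent (0 means perfect overlap)."""
    seg, ref = np.asarray(seg, dtype=bool), np.asarray(ref, dtype=bool)
    intersection = np.logical_and(seg, ref).sum()
    union = np.logical_or(seg, ref).sum()
    return 100.0 * (1.0 - intersection / union)

def vd_percent(seg, ref):
    """Relative volume difference in percent (positive means over-segmentation)."""
    seg, ref = np.asarray(seg, dtype=bool), np.asarray(ref, dtype=bool)
    return 100.0 * (int(seg.sum()) - int(ref.sum())) / int(ref.sum())
```

VD alone can be 0 for a badly misplaced segmentation of the right size, which is one reason the metrics are reported together.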
Method

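1. Vanishing gradient problem
The paper points out that the vanishing gradient problem may be even more severe in 3D networks. The solution is to build the loss from the prediction outputs of several intermediate layers:
$$\mathcal{L}=\mathcal{L}_{o}(\mathcal{X};W)+\sum_{h}\eta_h\cdot\mathcal{L}_{h}(\mathcal{X};W_h,w_h)+[regularization]$$
The weights $\eta_h$ control the importance of each hidden layer and are meant to counter vanishing gradients in the early layers. I do not find this very convincing: once gradients vanish they are tiny, practically 0, so you would need an enormous weight to bring the values back up, and even then the vanishing gradient is not fundamentally solved. Besides, ReLU seems to have been proposed precisely to address this problem; I am not sure whether vanishing gradients still need to be considered in 3D when this activation is used.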
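As a reading aid for the loss above, here is a minimal NumPy sketch of how the terms are assembled. It only computes the scalar loss; the function names are hypothetical, the per-voxel cross entropy and the L2 form of the regularization term are my assumptions, and the auxiliary predictions are assumed to be already upsampled to the label resolution.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean voxel-wise cross entropy; probs is (n_voxels, n_classes), labels is (n_voxels,)."""
    picked = probs[np.arange(labels.size), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

def deeply_supervised_loss(main_probs, aux_probs, labels, etas, weights=(), weight_decay=0.0):
    """L = L_o + sum_h eta_h * L_h + regularization (L2 here, as an assumption)."""
    loss = cross_entropy(main_probs, labels)
    for eta, probs in zip(etas, aux_probs):
        # Each auxiliary term is computed from one hidden layer's own prediction.
        loss += eta * cross_entropy(probs, labels)
    loss += weight_decay * sum(float(np.sum(w ** 2)) for w in weights)
    return loss
```

In this form each auxiliary term comes from a shallow layer's own prediction, so $\eta_h$ weights an additional loss term rather than rescaling the gradient that arrives from the output layer. In a framework with automatic differentiation, summing the terms like this is enough for all branches to be trained jointly.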

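2. Conditional random field (CRF) model
This one really tests academic fundamentals, and it is a major reason why I feel my undergraduate background falls short; under normal circumstances I would never have thought of using this model to refine the results. The paper devotes very little space to it, so further reading is needed. What I do know is that the authors introduce many parameters ($\mu_1$, $\mu_2$, $\theta_{\alpha}$, $\theta_{\beta}$, $\theta_{\gamma}$) to solve an energy function, and the method used is again grid search.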

Results and Conclusions:

1. 3D DSN > 3D CNN | CRF works well

EVALUATION

VISUALIZATION

2. Shorter runtime - 5s for 3D DSN and 87s for CRF.

COMPARISON WITH OTHER TEAM
As can be seen, the 3D network runs very quickly, while the CRF processing is time-consuming.

[4] 3D Fully Convolutional Networks for Intervertebral Disc Localization and Segmentation. paper

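Algorithmically, this paper simply turns the 2D FCN into a 3D FCN, with no other improvements, and applies it to an intervertebral disc segmentation dataset.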

Application: Intervertebral disc (IVD) localization and segmentation in volumetric data.
Dataset: MICCAI 2015 Challenge on Automatic Intervertebral Disc Localization and Segmentation.
Preprocessing: subtract the mean value before feeding the data into the network.
System Implementation: 3D FCN using the framework based on Theano library, using a GPU of NVIDIA GeForce GTX X. 2D FCN was implemented with Matlab and C++.
Comparison: 2D FCN - the input is a stack of adjacent slices (3 slices in, and the output is the binary mask of the middle slice).
Evaluation: For IVD localization - mean localization distance (MLD) with standard deviation (SD), successful detection rate $P$. For IVD segmentation - mean dice overlap coefficients (MD) with SD, mean average absolute distance (MAAD) with SD.
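For reference, the two headline numbers in the evaluation above can be sketched as follows. The function names are mine, and the sketch assumes that predicted and true disc centres are already matched row by row, that `spacing` converts voxel indices to millimetres, and that SD is the population standard deviation. The successful detection rate is omitted because its distance criterion is defined by the challenge.

```python
import numpy as np

def dice(seg, ref):
    """Dice overlap coefficient between two binary masks (1 means identical)."""
    seg, ref = np.asarray(seg, dtype=bool), np.asarray(ref, dtype=bool)
    return 2.0 * np.logical_and(seg, ref).sum() / (seg.sum() + ref.sum())

def localization_distance(pred_centers, true_centers, spacing=(1.0, 1.0, 1.0)):
    """Return (mean, standard deviation) of the Euclidean centre-to-centre distances."""
    diff = (np.asarray(pred_centers, dtype=float) - np.asarray(true_centers, dtype=float)) * np.asarray(spacing)
    distances = np.linalg.norm(diff, axis=1)
    return float(distances.mean()), float(distances.std())
```

MD and its SD would then be the mean and standard deviation of `dice` over all discs in the test set.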
Results and Conclusions:

1. 3D FCN > 2D FCN

TEST1

TEST2

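Overall, the argument of this paper is simple; the method is novel (2D$\longrightarrow$3D) but fairly conventional, and the conclusion is simple as well. From my point of view it is still well worth studying, because getting published under these conditions really tests one's writing. For example, if I had to write up the experimental results, it would be one sentence: 3D FCN performs better than 2D FCN both in IVD localization and segmentation. Done. :-)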

[5] VoxResNet: Deep Voxelwise Residual Networks for Volumetric Brain Segmentation. H Chen, Q Dou, L Yu, P Heng [CUHK] (2016). paper.

Proposes a deep voxelwise residual network, referred to as VoxResNet (3D Residual Network).
Also proposes an auto-context version of VoxResNet.

The architecture of VoxResNet

auto-context

Comparison of VoxResNet, Auto-context VoxResNet and Ground truth

[6] Evaluation and comparison of 3D intervertebral disc localization and segmentation methods for 3D T2 MR data: A grand challenge. paper

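This journal paper is a fairly detailed account of intervertebral disc detection and segmentation [Review.4]. It also gave me a direct feel for the difference between conference papers and journal papers: a journal paper reads like a conference paper with every point expanded. With the CVPR, IPMI and MICCAI submissions done, we also need to start submitting to journals, fleshing out the content of several conference papers into one full journal paper. No time for a careful read! The review ends here.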

Related works

[1] V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. Fausto Milletari, Nassir Navab, Seyed-Ahmad Ahmadi [Johns Hopkins University]. paper.

Propose an approach to 3D image segmentation based on a volumetric, fully convolutional neural network (3D-FCN).
Introduces a novel objective function based on the Dice coefficient. In this way it can deal with situations where there is a strong imbalance between the number of foreground and background voxels.
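For readers who want the objective written out: with predicted probabilities $p_i$ and binary ground truth $g_i$ over $N$ voxels, the Dice coefficient in the V-Net formulation, as I recall it, uses squared terms in the denominator, and its gradient with respect to the $j$-th prediction follows from the quotient rule. The notation below is a reconstruction, so check it against the paper before relying on it.

$$D=\frac{2\sum_{i}^{N}p_i g_i}{\sum_{i}^{N}p_i^2+\sum_{i}^{N}g_i^2}$$

$$\frac{\partial D}{\partial p_j}=2\left[\frac{g_j\left(\sum_{i}^{N}p_i^2+\sum_{i}^{N}g_i^2\right)-2p_j\sum_{i}^{N}p_i g_i}{\left(\sum_{i}^{N}p_i^2+\sum_{i}^{N}g_i^2\right)^2}\right]$$

Because $D$ is a ratio of foreground-weighted sums, a large number of background voxels does not dominate the objective, and no per-class weights need to be tuned.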

The architecture of V-Net

Implementation available at https://github.com/faustomilletari/VNet
Implementation available at https://github.com/faustomilletari/3D-Caffe

[2] 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox [University of Freiburg, Google Deepmind]. paper, code (Caffe).

NVIDIA TitanX GPU

2D U-Net Architecture

3D U-Net Architecture

[3] Deep MRI brain extraction: A 3D convolutional neural network for skull stripping. Jens Kleesiek, Gregor Urban, Alexander Hubert [Heidelberg University Hospital]. paper.

CNN architecture details

[4] Integrating Online and Offline 3D Deep Learning for Automated Polyp Detection in Colonoscopy Videos. Lequan Yu, Hao Chen, Qi Dou [CUHK] (2016). paper

Offline 3D FCN 1

Offline 3D FCN 2

Offline 3D FCN 3

Comparison

The authors compared three different CNN architectures. Frankly, this has little reference value, because it depends largely on experience and trial and error.

Discussions online

1. Are there any deep learning libraries that have 3D volumetric/spatial convolutions running on a CPU or a GPU?

A recent addition, but Keras now supports 3D convolution. It should work for voxels and video sequences.

2. 3D CNN in Keras - Action Recognition

3. Software: https://github.com/facebook/C3D

Separable 3D CNN

1. Reference papers

[1] Learning Separable Filters. Amos Sironi, Bugra Tekin, Roberto Rigamonti [EPFL] 2014. paper -- check Section 5.5.

2. Things to try

Examine the separability of the kernels in the pre-trained CNNs, check http://www.mathworks.com/matlabcentral/fileexchange/28238-kernel-decomposition
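One quick way to run that check in Python rather than MATLAB is sketched below. It is only a diagnostic of my own, not the method of Sironi et al.: for each axis the 3D kernel is unfolded into a matrix and the fraction of energy captured by the largest singular value is reported. A kernel that is an exact outer product of three 1D filters scores 1.0 on every axis; the function name is hypothetical and an all-zero kernel is not handled.

```python
import numpy as np

def separability_scores(kernel):
    """Per-axis energy fraction of the top singular value of the mode unfolding.

    Scores close to 1.0 on all axes suggest the kernel is nearly rank-1,
    i.e. well approximated by three successive 1D convolutions.
    """
    kernel = np.asarray(kernel, dtype=float)
    scores = []
    for axis in range(kernel.ndim):
        # Bring the chosen axis to the front and flatten the others.
        unfolded = np.moveaxis(kernel, axis, 0).reshape(kernel.shape[axis], -1)
        singular_values = np.linalg.svd(unfolded, compute_uv=False)
        scores.append(float(singular_values[0] ** 2 / np.sum(singular_values ** 2)))
    return scores
```

Running this over every kernel of a pre-trained 3D CNN and plotting a histogram of the minimum score per kernel would give a first impression of how much a separable approximation could save.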

Some Questions

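At the end of the paper the authors show a visualization of the 3D convolution kernels. I do not see what it is for; what result does it demonstrate?
I noticed that the kernel, stride and pooling sizes differ across the authors' papers. Are they chosen by feel?
For paper [3], how are the multi-layer fusion and the training of the per-layer weights $\eta_h$ implemented in code?
For paper [4], is the 3D FCN code open source? The paper concludes that 3D > 2D; are there other detailed improvements to the 3D FCN? In my own experiments the accuracy is about the same.
What framework is the authors' team using now: their own code or open-source code? And how mature is open-source 3D code for Lasagne these days?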

Best wishes!