2017.10 Literature Reading Notes

2017-10-08 EdwardMa

Object-based Visual Saliency via Laplacian Regularized Kernel Regression (IEEE Trans. Multimedia)

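Saliency detection originates from mechanisms of biological perception. Biological evidence indicates that object-based saliency arises from the spread of spatial attention. Inspired by this, the paper builds a new computational model on the recently proposed object attention propagation mechanism. It jointly considers location, color, closure, and fixation map information to estimate a regression function f from noisy samples {x, y}, with y = f(x) + n, and models this as a least-squares problem with a closed-form solution.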
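A minimal sketch of what a Laplacian-regularized kernel least-squares fit with a closed-form solution can look like. This is the generic textbook form, not necessarily the exact LKR model of the paper: the Gaussian kernel, the graph built from the same kernel, and the parameter names sigma, lam and gamma are all assumptions made for illustration. With f = K·alpha, minimizing ||y - K·alpha||^2 + lam·alpha'K·alpha + gamma·(K·alpha)'L(K·alpha) gives alpha = (K + lam·I + gamma·L·K)^(-1) y.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def fit_laplacian_kernel_regression(X, y, sigma=0.1, lam=1e-2, gamma=1e-3):
    """Return kernel coefficients alpha from the closed-form solution.

    Assumed objective (illustrative, not taken from the paper):
        ||y - K a||^2 + lam * a^T K a + gamma * (K a)^T L (K a)
    whose stationarity condition is (K + lam*I + gamma*L K) a = y.
    """
    K = rbf_kernel(X, X, sigma)
    W = K.copy()
    np.fill_diagonal(W, 0.0)            # graph weights without self-loops
    L = np.diag(W.sum(axis=1)) - W      # unnormalized graph Laplacian
    A = K + lam * np.eye(len(X)) + gamma * (L @ K)
    return np.linalg.solve(A, y)

def predict(X_train, alpha, X_new, sigma=0.1):
    """Evaluate f(x) = sum_i alpha_i k(x_i, x) at the rows of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha

# Toy usage: recover a smooth function from noisy samples y = f(x) + n.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 50)[:, None]
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(50)
alpha = fit_laplacian_kernel_regression(X, y)
y_hat = predict(X, alpha, X)
```

The Laplacian term penalizes differences between predictions at nearby samples, which is what lets information spread along the graph; setting gamma to 0 reduces the sketch to plain kernel ridge regression.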

LKR-Model

Principal Graph and Structure Learning Based on Reversed Graph Embedding (PAMI)

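When analyzing high-dimensional data, one usually needs to preserve the most important structure of the data. Principal curves are a widely used approach. However, many existing methods work only for data that follow mathematically defined curves, which limits them considerably in practice. To address this, the paper proposes a novel framework for principal graph and structure learning that uses reversed graph embedding to capture local information of the underlying graph structure. In other words, it learns the principal points and the graph structure from the data simultaneously.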

Discriminative clustering on manifold for adaptive transductive classification (Zhao Zhang, NN 2017)

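Using a joint discriminative clustering algorithm on the manifold, the paper proposes an adaptive transductive label propagation (LP) method for classifying and representing high-dimensional data. The framework seamlessly integrates unsupervised manifold learning, discriminative clustering, and adaptive classification into one unified model. It also incorporates adaptive graph weight construction into label propagation: unlike most existing studies, which compute the weights in Euclidean space, it propagates label information with adaptive weights computed from low-dimensional manifold features. For transductive classification, it first performs joint discriminative K-means clustering and manifold learning to capture the low-dimensional nonlinear manifold. It then builds adaptive weights on the learned manifold features (computed jointly with feature learning and discriminative clustering) and runs LP on the adaptive graph. The objective function has five variables (cluster centers, manifold features Z and H, adaptive weights, and the label matrix) and is optimized with an alternating multiplier method. The paper proves convergence of the algorithm (the objective decreases monotonically at every iteration), discusses how the model relates to other methods, and reports experiments on classification (Yale face data and UCI data) and on segmentation (7 images taken from the Berkeley dataset).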
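For reference, the conventional pipeline that the paper argues against (build a fixed graph in the original Euclidean space first, then propagate labels on it) can be sketched as below. This is the classic closed-form label propagation F = (I - alpha·S)^(-1) Y on a Gaussian-weighted graph, not the adaptive method of the paper; the function name and the parameters alpha and sigma are assumptions made for illustration.

```python
import numpy as np

def propagate_labels(X, y, alpha=0.99, sigma=1.0):
    """Classic label propagation on a fixed graph.

    X: (n, d) features; y: (n,) integer labels, with -1 marking unlabeled points.
    Returns a predicted label for every point.
    """
    n = len(X)
    classes = np.unique(y[y >= 0])

    # Step 1 (independent of LP): Gaussian weights in the original input space.
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalization S = D^(-1/2) W D^(-1/2).
    degree = W.sum(axis=1)
    S = W / np.sqrt(np.outer(degree, degree))

    # Initial label matrix: one-hot rows for labeled points, zeros otherwise.
    Y = np.zeros((n, len(classes)))
    for col, c in enumerate(classes):
        Y[y == c, col] = 1.0

    # Step 2: closed-form trade-off between graph smoothness and label fitness.
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return classes[F.argmax(axis=1)]
```

Because the weights are fixed before propagation starts, nothing guarantees that this graph is the best one for the labels being estimated, which is the first drawback listed in the writing notes below.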

Writing notes

To introduce the background, an alternative to "play an important role" is: In the fields of NN, PR and DM, how to represent and classify high-dimensional data still remains an important issue.

Different ways to express the idea of semi-supervised learning:

SSL that can learn knowledge using both labeled and unlabeled data for representation and classification has been attracting increasing attention in the recent years. As a typical SSL task, the number of labeled data points is usually limited and the quantity of unlabeled data points is usually adequate. Moreover, the labeling process of data is also costly.

More specifically, LP can be described as a process of propagating supervised prior information of labeled data to the unlabeled data based on their pairwise relationships and initial states, i.e., trading-off the manifold smoothing term and the label fitness term.

All existing LP methods aim at estimating the unknown labels of unlabeled points by receiving information partially from the initial label and partially from its neighborhoods, where the latter part over neighborhoods is decided by a weighted neighborhood graph.

Saying that a method has recently become popular: In recent decades, plenty of GSSL methods have been proposed to address the representation and classification issues based on the cluster and manifold assumptions. More recently, *** has been arousing considerable attention because of its effectiveness and faster computation speed.

Introducing the problems of existing methods: It is also worth noting that virtually all the existing transductive LP formulations may suffer from three potential drawbacks.

i. they perform LP after an independent weight construction process, but note that such an operation cannot ensure that the constructed graph is optimal for the subsequent label propagation and estimation.

ii. in existing studies, the neighborhood information of each data point is usually determined by KNN for encoding the manifold smoothness. But a fixed number K is usually applied to every sample, which is not adaptive to different data or different distributions. Also, choosing a suitable neighborhood size K is not easy in reality due to complex contents and distributions of various real data. Hence, determining the neighborhoods of samples adaptively needs to be explored.

iii. existing LP models define the weights and propagate label information based on the original high-dimensional data that usually contains unfavorable features, irrelevant features, noise and even gross corruptions, but the included noise, corruptions and the unfavorable features may directly result in inaccurate similarity measures and prediction results. Note that projecting high-dimensional data onto a subspace can reveal the underlying subspace structures of high-dimensional data. It is noted that manifold learning can preserve the important local or global geometric structures among samples and remove unfavorable features as well as noise effectively.

Addressing the problems above and introducing the contributions: To address the aforementioned shortcomings, in this paper we mainly propose a new unified framework that seamlessly combines the manifold learning, discriminative clustering and adaptive transductive data classification. In summary, the major contributions are summarized as follows:

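i. Technically, a novel LP model via *** is proposed for representing and classifying high-dimensional real data. In more detail, our method integrates adaptive graph construction with the process of label propagation. Specifically, our method computes the weights and predicts the labels of samples based on the low-dimensional manifold features, unlike most existing works that operate directly in the original input space. Based on the novel unified framework, our method can obtain a low-dimensional manifold feature matrix, an adaptive graph weight matrix, a set of cluster centroids and a soft label matrix.

ii. To compute the low-dimensional manifold features, we join feature learning and discriminative clustering so that the classification result will be more accurate, because the manifold features are calculated by minimizing the reconstruction error over the clustering and feature learning at the same time. That is, proposing a way to *** and enable ***. The brought benefits are twofold.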

iii. For transductive semi-supervised classification by the adaptive LP, we have ***.

Conclusion: We mainly propose an *** framework by *** for transductive classification. Technically, *. To obtain more accurate and optimal graph weights for enhancing the joint powers of representation and classification, we perform *. We also explicitly incorporate the adaptive graph weigh ***. Besides, the tricky process of selecting optimal neighborhood size or kernel width is avoided.
We have provided extensive simulations to demonstrate the effectiveness of our algorithm. Although promising results are delivered by comparing with other related methods, in future we will explore the performance of our out-of-sample version for the other related application areas, e.g., image retrieval. In addition, the optimal determination of model parameters still remains to be investigated in future work.

A review of level-set methods and some recent applications (a major work by the renowned Osher, published in JCP 2017)

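Writing notes:
(Background) Representing and tracking the evolution of interfaces is a fundamental component of computer simulations. An efficient way to do so is to use the level-set method of Osher and Sethian. (Brief idea of the method) It consists in representing a moving front by the level set of a higher-dimensional function. (Next, compare the strengths and weaknesses of explicit and implicit methods)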

The main advantage of this implicit representation of a moving front is its ability to naturally handle changes in topology. This is in contrast to explicit methods, for which changes in topology require extra work. We note, however, that explicit methods have the advantage of accuracy (e.g. *** better than *** for the same grid resolution).
Volume of fluid methods also adopt an implicit formulation using *. These methods have the advantage of .... They are however more complicated than level-set methods in three spatial dimensions and it is difficult to compute accurately ... such as ... *.
Also, we note that phase-field models have been extensively used to ... problems, particularly in the case of ... However, these models do not represent ..., which in turn leads to a degradation of the accuracy where it matters most and impose (parallel to "leads to") sometimes stringent time step restrictions.
In what follows, we review the level-set method, including ...
The Fast Sweeping Method, which is often associated with the level-set method for its ability to compute the signed distance function and solutions to other Hamilton-Jacobi equations, is also briefly discussed.
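A small 1D sketch of the point about topology: two separate intervals are stored as the negative region of one function phi, the front moves outward with unit speed by solving phi_t + |phi_x| = 0, and the two intervals merge without any special handling. The first-order upwind (Godunov) discretization, the grid, and the helper names are assumptions chosen for illustration; the review itself covers far more accurate schemes.

```python
import numpy as np

def evolve(phi, dx, t_end, speed=1.0):
    """Advance phi_t + speed * |phi_x| = 0 (speed > 0) with a first-order upwind scheme."""
    dt = 0.5 * dx / speed                      # CFL number 0.5
    steps = int(round(t_end / dt))
    for _ in range(steps):
        padded = np.pad(phi, 1, mode="edge")   # zero-gradient boundary
        d_minus = (padded[1:-1] - padded[:-2]) / dx
        d_plus = (padded[2:] - padded[1:-1]) / dx
        # Godunov approximation of |phi_x| for an outward-moving front.
        grad = np.maximum(np.maximum(d_minus, 0.0), -np.minimum(d_plus, 0.0))
        phi = phi - dt * speed * grad
    return phi

def count_components(phi):
    """Number of connected pieces of the region {phi < 0} on a 1D grid."""
    inside = phi < 0
    starts = inside & ~np.concatenate(([False], inside[:-1]))
    return int(starts.sum())

# Two intervals of radius 0.5 centred at -1 and +1, as one implicit function.
x = np.linspace(-3.0, 3.0, 601)
dx = x[1] - x[0]
phi0 = np.minimum(np.abs(x + 1.0) - 0.5, np.abs(x - 1.0) - 0.5)

print(count_components(phi0))                    # two separate regions at t = 0
print(count_components(evolve(phi0, dx, 0.6)))   # merged into one by t = 0.6
```

An explicit method that tracks the four endpoints would need extra logic to detect the collision and delete the two inner endpoints; here the merge is just a sign change of phi.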

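Let A denote level-set methods and B explicit methods. The logic of the comparison is: advantage of A; this is in contrast to the drawback of B; however, B has its own advantage. Method C also uses an implicit form and has its advantages; however, it is more complicated than A. Also, we note that model D is widely used, particularly in a certain case; however, D has drawbacks.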

Conclusion (simple present or present perfect): Level-set and fast sweeping methods are powerful numerical methods that have been applied to a wide range of applications.

Transductive Zero-Shot Learning With Adaptive Structural Embedding (TNNLS 2017, Prof. Zhong Ji)

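Abstract: Zero-shot learning gives a computer system the ability to recognize categories it has never seen before. Two fundamental challenges:
1) visual-semantic embedding and domain adaptation in cross-modality learning
2) unseen class prediction steps
The paper proposes two corresponding methods:
1) ASTE: adaptive structural embedding, which adaptively adjusts the slack variables to reflect the different confidence levels of the training samples
2) SPASS: a Self-PAced Selective Strategy, which iteratively selects unseen samples from reliable to less reliable
The two methods are combined to progressively reinforce the classification capacity. The paper further proposes a fast training (FT) method to improve efficiency: each seen class is represented by the mean of its visual features (the FT strategy simply takes the visual pattern of each class as training data, which greatly alleviates the computational burdens, especially for those gradient descent-based approaches).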

Recently, image classification has made tremendous improvements due to the prosperous progress of deep learning and the availability of large-scale annotated databases [1],[2]. However, in many applications, it is impractical to obtain adequate labeled object categories [3]-[5]. To tackle this limitation, zero-shot learning is proposed to recognize the unseen categories that no labeled data are available for training. *** has received increasing attention in recent years.
a semantic space is built to associate the semantic relationships between the seen and unseen categories.

Ideas and drawbacks of conventional methods

the max margin-based methods employ a ranking function to measure the compatibility scores between the images and the class semantic vectors in which a compatibility matrix is derived by enforcing the correct label to be ranked higher than any of the other labels. However, in such models, the seen instances are typically treated without counting for their different reliablenesses during training in which the structural information of the seen data may be undermined.

Solution

To address this problem, we formulate ... to distinguish the training data, where the reliable instances are imposed with small punishments while the less reliable instances are imposed with more severe punishments. In this way, the structural information in the seen data is effectively exploited by assessing their reliability and discriminability.

However, such approaches mainly focus on exploiting the structural information in the unseen data, and the potential label information is disregarded or underestimated. Actually, although the unseen data are unlabeled, we can predict their potential labels with the knowledge learned from the seen data. To this end, we propose to exploit the potential unseen label information in an easy to hard fashion, which includes two steps: 1) learning the visual-semantic embedding with the labeled seen data and 2) gradually refining the visual-semantic embedding with the seen data and a set of selected unseen data in an iterative way. At each iteration, all unseen data are first predicted with the current visual-semantic embedding, and the reliable unseen instances are selected as pseudolabeled data with a self-paced selective strategy (SPASS), and then the pseudo labeled data are added into the labeled data set to refine the visual-semantic embedding. In this way, the knowledge is adapted progressively from the seen domain to the unseen domain. Meanwhile, the potential labeled information of unseen data is exploited in a confident way; thus, the domain shift problem is readily addressed. We illustrate the proposed transductive framework in Fig. 1. First, adaptive structural embedding (ASTE) learns an initial visual-semantic embedding with the seen data to predict the potential labels of the unseen instances. Then, the reliable unseen instances are selected to merge into the training data with the SPASS method. Finally, the classification capacity is reinforced by refining the visual-semantic embedding with the new training data.
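The easy-to-hard loop described above (predict all unseen data, keep the most reliable predictions as pseudo-labels, refine, repeat with a larger selection) can be sketched as follows. This is a heavily simplified stand-in, not ASTE or SPASS as published: the visual-semantic embedding is replaced by one prototype vector per unseen class, reliability is taken to be distance to the nearest prototype, and the selection grows linearly over the rounds. All names and parameters are hypothetical.

```python
import numpy as np

def self_paced_refine(X_unseen, init_prototypes, n_rounds=5):
    """Iteratively refine class prototypes with pseudo-labeled unseen data.

    X_unseen: (n, d) unlabeled features from the unseen domain.
    init_prototypes: (c, d) initial class prototypes, e.g. obtained from a
        model trained on seen data (assumed to be given here).
    Returns final predictions and the refined prototypes.
    """
    protos = np.asarray(init_prototypes, dtype=float).copy()
    n = len(X_unseen)
    for r in range(1, n_rounds + 1):
        # Predict every unseen instance with the current model.
        dist = np.linalg.norm(X_unseen[:, None, :] - protos[None, :, :], axis=-1)
        pred = dist.argmin(axis=1)
        confidence = -dist.min(axis=1)

        # Select from reliable to less reliable: the kept fraction grows each round.
        n_keep = int(np.ceil(n * r / n_rounds))
        chosen = np.argsort(-confidence)[:n_keep]

        # Refine the model with the pseudo-labeled instances.
        for c in range(len(protos)):
            members = chosen[pred[chosen] == c]
            if len(members) > 0:
                protos[c] = X_unseen[members].mean(axis=0)

    dist = np.linalg.norm(X_unseen[:, None, :] - protos[None, :, :], axis=-1)
    return dist.argmin(axis=1), protos
```

Starting from prototypes that are offset from the true unseen clusters mimics the domain shift: the early rounds only trust the closest instances, and each refinement moves the prototypes toward the unseen data before harder instances are admitted.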

Conclusion

In this paper, we mainly addressed the visual-semantic embedding and domain adaptation in ZSL. (States the specific problems addressed.) For the first one, we proposed (past tense is used here, so the whole paragraph should stay in the past tense for consistency) an ASTE method by formulating the visual-semantic embedding in an adaptively latent structural SVM framework where the reliability and discriminability of the training instances are exploited. For the second one, we presented an SPASS to iteratively select the unseen instances from reliable to less reliable to gradually transfer the knowledge from the seen domain to the unseen domain. Then, we combined ASTE and SPASS to develop a transductive ZSL approach named TASTE to progressively reinforce the discriminant capacity. Extensive experiments on three benchmark data sets have verified the superiorities of these proposed methods. Specifically, (spell out on which data set each method performs best) ASTE, as an inductive approach, achieved the best performance on AwA data set, and TASTE, as a transductive approach, performed the best on AwA and aPY data sets. Besides, we also presented a simple but effective FT strategy to speed up the training speed of ZSL by employing the visual pattern of each class as input training data. The speed up ratios are about 4 for the methods with closed-form solutions and 100 for those gradient descent-based approaches.

Image-Specific Classification With Local and Global Discriminations (TNNLS 2017, Chunjie Zhang)

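Abstract: Most image classification methods learn one classifier per class using the training images alone. Because of inter-class and intra-class variation, taking the test image into account during learning may be more effective. The paper trains a dedicated classifier for each test image: the K nearest training images are used to train a local classifier, and the whole training set is used to train a global classifier.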

On Selecting Effective Patterns for Fast Support Vector Regression Training

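Idea: Training SVM regression on large data sets is very time-consuming. Open problems for SVR include 1) how to choose a suitable kernel function; 2) how to tune the parameters; 3) how to integrate prior knowledge into the learning process; 4) how to speed up training when the training set is very large. Approaches to speeding up SVMs fall into three categories: 1) using fast algorithms to solve the quadratic programming problem; 2) building new models that avoid quadratic programming; 3) selecting a subset of the training set for training. This paper focuses on how to select a suitable training subset for SVR.
The core idea has two parts:

1) Find the likely support vectors before formal training; support vectors usually lie on the boundary between two classes. Specifically, build a measure D(x) that describes the difference between distributions: the larger D(x) is, the more likely x lies on the boundary between the distributions of different classes; the smaller D(x) is, the more likely x lies in the interior of a single class distribution.
2) Use the label information of the training set to build local KNNs. Computing D(x) first requires finding the KNNs. Searching the whole data set is computationally expensive and does not make full use of the training set information. When computing the KNN of x, use the label y of x and select samples with labels in (y-a, y+a) to build the KNN. (The idea is similar for discrete labels: select label values within some range, where the corresponding number of patterns K' must be larger than K.)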
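The label-window neighbor search can be sketched as below: candidates are first restricted to samples whose target value lies in (y - a, y + a), and the K nearest neighbors are then taken among those candidates only. The function name and the brute-force distance computation are assumptions for illustration; the measure D(x) itself is not reproduced here because the notes do not give its formula.

```python
import numpy as np

def local_knn(X, y, i, k=5, a=0.5):
    """Indices of the k nearest neighbours of X[i], searched only among
    samples j != i whose target satisfies |y[j] - y[i]| < a."""
    candidates = np.flatnonzero(np.abs(y - y[i]) < a)
    candidates = candidates[candidates != i]
    dist = np.linalg.norm(X[candidates] - X[i], axis=1)
    return candidates[np.argsort(dist)[:k]]

# Sample 1 is the closest to sample 0 in feature space, but its target is far
# away, so it falls outside the label window and is never considered.
X = np.array([[0.0], [0.1], [0.2], [5.0]])
y = np.array([0.0, 10.0, 0.1, 0.2])
neighbours = local_knn(X, y, i=0, k=1, a=1.0)
```

Restricting the search this way reduces the number of distance computations per query from the full training set to the size of the label window, which has to contain at least K candidates for the search to return K neighbors.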

Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: The LUNA16 challenge [MIA 2017.12]

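The LUNA16 pulmonary nodule detection challenge provides a large amount of public data for comparing automatic pulmonary nodule detection algorithms. The best methods used convolutional networks, and combining different algorithms gave the highest accuracy.
Writing notes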

Lung cancer is the deadliest cancer worldwide, accounting for approximately 27% of cancer-related deaths in the United States.

One of the major challenges arising from the implementation of these screening programs is the enormous amount of CT images that must be analyzed by radiologists.

Designing image segmentation studies: Statistical power, sample size and reference standard quality [MIA 2017.12]

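Segmentation algorithms are usually evaluated by comparison against a reference standard, so the evaluation depends on the size and quality of the study reference standard. This work uses statistical power calculations so that researchers can estimate the sample size needed to detect a meaningful difference in accuracy between the segmentations of two algorithms. Furthermore, the paper derives a formula relating reference standard errors to their effect on the required sample size, for studies that use a lower-quality (but possibly more affordable and practically available) reference standard.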
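As a point of reference, the standard textbook sample-size calculation for a paired comparison of two algorithms on the same images looks like the sketch below. It uses the normal approximation n = ((z_(1-alpha/2) + z_power) · sd_diff / delta)^2 and is not the formula derived in the paper; in particular it ignores errors in the reference standard, which the paper shows affect the required sample size. The function and parameter names are assumptions.

```python
from math import ceil
from statistics import NormalDist

def paired_sample_size(delta, sd_diff, alpha=0.05, power=0.8):
    """Number of images needed to detect a mean accuracy difference `delta`
    between two algorithms, given the standard deviation `sd_diff` of the
    per-image differences (two-sided test, normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    return ceil(((z_alpha + z_power) * sd_diff / delta) ** 2)

# Detecting a difference of half a standard deviation with 80% power
# at the 5% significance level.
n_images = paired_sample_size(delta=0.5, sd_diff=1.0)
```

The required sample size scales with the square of sd_diff / delta, so halving the difference to be detected quadruples the number of images.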

Learning and combining image neighborhoods using random forests for neonatal brain disease classification [MIA 2017.12]

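Characterizing and classifying normal versus abnormal brain development in early infancy is very challenging. To reduce the complexity of heterogeneous data populations, manifold learning is widely used to find a low-dimensional representation of the data while retaining all relevant information. The neighborhood definition used to build the manifold representation is crucial for preserving the similarity structure and is highly application dependent. The recently proposed neighborhood approximation forests learn a neighborhood structure in a data set based on a user-defined distance.
We propose a framework that learns multiple pairwise distances in a population of brain images and combines them in an unsupervised manner in the manifold learning step. Unlike other methods that use only a single distance measure, our method allows a natural combination of multiple distances from different data. The result is a representation of the population that preserves the multiple distances. Furthermore, our method selects the most predictive features associated with the distances.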

We show that combining multiple distances related to the condition improves the overall characterization and classification of the three clinical groups compared to the use of single distances and classical unsupervised manifold learning.