Paper Reading 3: MDNet

2019-03-06  我好菜啊_

Excerpted from:
[Learning Multi-Domain Convolutional Neural Networks for Visual Tracking]
Authors: Hyeonseob Nam, Bohyung Han
Reference: https://zhuanlan.zhihu.com/p/25312850


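Multi-domain representation learning (a CNN learns offline what targets have in common across sequences)
with
online visual tracking (the sequence-specific characteristics are learned online)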


Multi-domain representation learning part
separate domain-independent information from domain-specific one and learn generic feature representations for visual tracking.


Structure of MDNet


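Training procedure
K videos, N cycles
mini-batch: randomly pick 8 frames from one video, then randomly draw 32 positive and 96 negative samples from those 8 frames -> 128 boxes
Each cycle has K iterations (the mini-batch is drawn from each of the K videos in turn)
SGD
Each video has its own fc6 layer
Training this way learns what the targets in the different videos have in common:
generic target representation in shared layer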
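
The schedule above can be sketched in a few lines of Python. This is only an illustration of the iteration order and the 8-frame / 32-positive / 96-negative sampling described in these notes, not the authors' code: the names `sample_minibatch`, `train_multi_domain` and `sgd_step`, and the layout of `frames` (a list of dicts with "pos" and "neg" box lists), are assumptions made for the example.

```python
import random


def sample_minibatch(frames, rng, n_frames=8, n_pos=32, n_neg=96):
    """Draw one mini-batch from a single video (one domain).

    `frames` is a hypothetical layout: a list of dicts, each holding
    candidate boxes under the keys "pos" and "neg".
    """
    chosen = rng.sample(frames, min(n_frames, len(frames)))
    pos_pool = [box for f in chosen for box in f["pos"]]
    neg_pool = [box for f in chosen for box in f["neg"]]
    # 32 positives + 96 negatives = 128 boxes per mini-batch
    return rng.sample(pos_pool, n_pos), rng.sample(neg_pool, n_neg)


def train_multi_domain(videos, n_cycles, sgd_step, seed=0):
    """Run N cycles; each cycle visits the K videos once, in order.

    `sgd_step` is a caller-supplied callback (hypothetical) that would
    update the shared layers and the fc6 branch of domain k.
    """
    rng = random.Random(seed)
    for _ in range(n_cycles):
        for k, frames in enumerate(videos):
            positives, negatives = sample_minibatch(frames, rng)
            sgd_step(domain=k, positives=positives, negatives=negatives)
```

Passing the domain index `k` to the update step is what lets each video keep its own fc6 branch while every iteration still updates the shared layers.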

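Note: integrate hard negative mining step into minibatch selection
That is, the negative samples are made progressively harder to separate, so the network becomes more and more discriminative.

each iteration of learning procedure
a minibatch -> Mp positives, Mn hard negatives
Where do the Mn hard negatives come from?
Test M (>> Mn) negatives and keep the Mn with the highest scores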

This approach examines a predefined number of
samples and identifies critical negative examples effectively
without explicitly running a detector to extract false positives as in the standard hard negative mining techniques.
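
A minimal sketch of this selection step, assuming each negative candidate can be scored by some function that returns the network's positive score for it. The names `select_hard_negatives` and `score_fn` are made up for the example; in the real tracker the score would come from a forward pass of the network.

```python
def select_hard_negatives(candidates, score_fn, n_hard):
    """Keep the n_hard negative candidates with the highest positive score.

    candidates: the M (>> Mn) negative samples that were tested
    score_fn:   hypothetical callable returning the positive score of a sample
    n_hard:     Mn, the number of hard negatives kept for the mini-batch
    """
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[:n_hard]


# Toy usage: the "samples" are just their own scores here.
hard = select_hard_negatives([0.1, 0.9, 0.4, 0.7], lambda s: s, n_hard=2)
```

The negatives that the current network scores highest are exactly the ones it is closest to mistaking for the target, which is why they are the most useful ones to train on.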

Only the weights in the fully connected layers w4:6 are updated online whereas the ones in the convolutional layers w1:3 are fixed throughout tracking; this strategy is beneficial to not only computational efficiency but also avoiding overfitting by preserving domain-independent information.


At test time the trained network gets a newly created fc6 layer; fc4-fc6 are fine-tuned online while the convolutional layers stay unchanged.
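
As a rough illustration of which parts are reused and which are updated at test time, the sketch below represents the network as a plain dict of layer names. The dict layout, the `fc6_<k>` naming of the offline branches and the function `init_online_network` are assumptions made for this example, not the structure of any actual implementation.

```python
SHARED_CONV = ["conv1", "conv2", "conv3"]  # fixed throughout tracking
SHARED_FC = ["fc4", "fc5"]                 # reused, then fine-tuned online


def init_online_network(pretrained, new_fc6):
    """Build the test-time network from offline-trained weights.

    pretrained: dict mapping layer name -> weights; it may also contain
                the K domain-specific fc6 branches, which are dropped.
    new_fc6:    freshly initialised weights for the new sequence.
    Returns (net, trainable), where trainable maps layer name -> bool.
    """
    net = {name: pretrained[name] for name in SHARED_CONV + SHARED_FC}
    net["fc6"] = new_fc6
    # Only the fully connected layers fc4-fc6 are updated online.
    trainable = {name: name.startswith("fc") for name in net}
    return net, trainable
```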


online visual tracking


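Update strategy
The online update is conducted to model long-term and short-term appearance variations of a target for robustness and adaptiveness, respectively. (The updates exist to model the long-term and short-term changes of the target.)
Two kinds of update are used: long-term and short-term.
During tracking, previously tracked targets are stored as positive samples (when their score is above a threshold).
long-term uses the 100 most recent samples (beyond 100, the oldest are discarded); the network is updated at a fixed interval (set to every 8 frames in the program).
short-term uses 20 samples (beyond 20, the oldest are discarded); the update is triggered when the target score falls below 0.5. Negative samples are always collected the short-term way.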
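
The bookkeeping described above can be sketched with two bounded queues. The class `UpdateScheduler` and its method `observe` are hypothetical names; the buffer sizes (100 and 20), the 8-frame interval and the 0.5 failure score are taken from these notes, while `keep_score` stands in for the unspecified "threshold" for storing a sample and its default value is an assumption.

```python
from collections import deque


class UpdateScheduler:
    """Keeps long-term / short-term sample buffers and picks the update type."""

    def __init__(self, long_len=100, short_len=20, long_interval=8,
                 fail_score=0.5, keep_score=0.5):
        # deque(maxlen=n) silently drops the oldest entry once it is full.
        self.long_pos = deque(maxlen=long_len)
        self.short_pos = deque(maxlen=short_len)
        self.short_neg = deque(maxlen=short_len)  # negatives: short-term only
        self.long_interval = long_interval
        self.fail_score = fail_score
        self.keep_score = keep_score  # assumed value, not given in the notes

    def observe(self, frame_idx, score, pos_sample, neg_sample):
        """Record one tracked frame; return "short", "long" or None."""
        if score > self.keep_score:
            self.long_pos.append(pos_sample)
            self.short_pos.append(pos_sample)
            self.short_neg.append(neg_sample)
        if score < self.fail_score:
            return "short"  # likely failure: adapt using recent samples
        if frame_idx % self.long_interval == 0:
            return "long"   # regular update using the long history
        return None
```

A caller would then fine-tune fc4-fc6 on `short_pos` plus `short_neg` for a "short" update, and on `long_pos` plus `short_neg` for a "long" one.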


Bounding Box Regression
Due to the high-level abstraction of CNN-based features and our data augmentation strategy which samples multiple positive examples around the target (which will be described in more detail in the next subsection), our network sometimes fails to find tight bounding boxes enclosing the target.
The final candidate is not used directly as the target; one more step, bounding box regression, is applied to it. This is done the same way as in R-CNN.

Given the first frame of a test sequence, we train a simple linear regression model to predict the precise target location using conv3 features of the samples near the target location. In the subsequent frames, we adjust the target locations estimated from Eq. (1) using
the regression model if the estimated targets are reliable.
The bounding box regressor is trained only in the first frame since it is time consuming for online update and incremental learning of the regression model may not be very helpful considering its risk.
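
Since the notes say the regression follows R-CNN, the sketch below shows the R-CNN box parameterization: the targets a linear regressor would be trained to predict, and how a prediction is applied to a candidate box. The `(x, y, w, h)` top-left box format and the function names are assumptions for the example; the regressor itself (fitted on conv3 features in the first frame) is not shown.

```python
import math


def to_center(box):
    """(x, y, w, h) with top-left corner -> (cx, cy, w, h)."""
    x, y, w, h = box
    return x + w / 2.0, y + h / 2.0, w, h


def regression_targets(proposal, ground_truth):
    """R-CNN style targets (tx, ty, tw, th) for one proposal box."""
    px, py, pw, ph = to_center(proposal)
    gx, gy, gw, gh = to_center(ground_truth)
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))


def apply_regression(proposal, t):
    """Shift and rescale a proposal by predicted (tx, ty, tw, th)."""
    px, py, pw, ph = to_center(proposal)
    cx, cy = px + t[0] * pw, py + t[1] * ph
    w, h = pw * math.exp(t[2]), ph * math.exp(t[3])
    return (cx - w / 2.0, cy - h / 2.0, w, h)
```

Offsets are normalized by the proposal size and scales are predicted in log space, so the same regressor applies to targets of different sizes.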


To summarize why MDNet works well:


Reasons for using a relatively shallow network
1. Visual tracking aims to distinguish only two classes, target and background, which requires much less model complexity.
2. A deep CNN is less effective for precise target localization since the spatial information tends to be diluted as a network goes deeper. (Meaning that deeper features are more abstract?)
3. Since targets in visual tracking are typically small, it is desirable to make input size small, which reduces the depth of the network naturally.
4. A smaller network is obviously more efficient in visual tracking problem, where training and testing are performed online.


Online Tracking Algorithm


Implementation Details


Experiment


In particular, our tracker successfully tracks targets in low resolution while all the trackers based on low-level features are not successful in the challenge.

Furthermore, MDNet works well with imprecise re-initializations as shown in the region noise experiment results, which implies that it can be effectively combined with a re-detection module and achieve long-term tracking.
