
A Brief Overview of Object Tracking + Deep Learning Object Tracking + Context Object Tracking

2017-09-21  BookThief

Visual Tracking With Deep Learning And The Context

I. An Overview of Visual Tracking

1. What is visual tracking?


These three pictures are frames 1, 40, and 80 of the same video. Given the bounding box of the running woman in the first frame, the tracker should keep the bounding box on the same woman in the later frames.

Given the initial state (e.g., position and size) of a target object in one frame of a video, the goal of tracking is to estimate the state of the target in the subsequent frames.
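The problem statement above can be sketched as a small interface. This is only an illustration: the names `State`, `track`, and `shift_right` are hypothetical, and the per-frame estimator is left as a plug-in function because the text does not commit to one.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class State:
    """Target state: top-left corner plus box size, in pixels."""
    x: float
    y: float
    w: float
    h: float


def track(frames: Sequence, init: State,
          step: Callable[[object, State], State]) -> List[State]:
    """Return one state per frame.

    `step` estimates the state in a new frame from that frame and the
    previous state; the state of the first frame is given, not estimated.
    """
    states = [init]
    for frame in frames[1:]:
        states.append(step(frame, states[-1]))
    return states


# Toy usage: frames are placeholders and the "tracker" just shifts the box.
def shift_right(frame, prev):
    return State(prev.x + 1.0, prev.y, prev.w, prev.h)


states = track(["f1", "f2", "f3"], State(10, 20, 30, 60), shift_right)
```

Every tracker discussed later differs only in what it puts behind `step`.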

Although object tracking has been studied for several decades and much progress has been made in recent years, it remains a very challenging problem.

Numerous factors affect the performance of a tracking algorithm, such as illumination variation, occlusion, and background clutter, and no single tracking approach handles all scenarios successfully.

2. Difficulties of visual tracking

Many factors limit video-based object tracking. Research on target tracking faces great challenges in both theory and method.

The diversity of the target

The complexity of the scene

The dilemma

3. Recent algorithms for visual tracking

Based on model matching

----- Global model matching

----- Local model matching

----- Feature matching

Based on classification

Based on Bayesian filtering

Based on deep learning (after 2015)

Deep learning has not had smooth sailing in target tracking. The main problem is the lack of training data: much of the power of a deep model comes from effective training on a large amount of labeled data, while target tracking provides only the bounding box in the first frame as training data. In this case, it is difficult to train a deep model for the current target from scratch at the start of tracking.

Several ideas:

4. Deep Learning for visual tracking

DLT: Learning a Deep Compact Image Representation for Visual Tracking (NIPS 2013)

DLT
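Pre-training: SDAE + Tiny Images dataset + unsupervised training: learns a generic object representation;
Online tracking structure: the SDAE encoder (generic feature representation) + a sigmoid classifier (tracking as binary classification): separates the target from the background;
Fine-tuning: positive and negative samples are taken from the first frame: yields a classification network more specific to the current target and background;
Tracking in later frames: a particle filter extracts patches from the current frame + each patch is fed to the classification network in turn + confidence score;
Model update: triggered by a threshold;
Strengths: pre-training + fine-tuning addresses the shortage of training data
Weaknesses: 32*32 input; is an autoencoder suited to classification-based tracking?; a 4-layer network has limited representational power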
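The per-frame step in the notes above (particle filter, classifier confidence, thresholded update) can be sketched as follows. This is a minimal sketch under assumptions: `confidence` stands in for the fine-tuned network, particles carry only a centre position rather than a full state, and the values of `n`, `sigma`, and `update_thresh` are made up for illustration.

```python
import math
import random


def sample_particles(prev, n, sigma, rng):
    """Draw n candidate centres around the previous centre (Gaussian motion)."""
    px, py = prev
    return [(px + rng.gauss(0, sigma), py + rng.gauss(0, sigma))
            for _ in range(n)]


def track_frame(prev, confidence, n=200, sigma=5.0, update_thresh=0.5, rng=None):
    """Score every particle and keep the most confident one.

    Returns (best_centre, best_score, needs_update); needs_update is True
    when even the best particle scores below the threshold.
    """
    rng = rng or random.Random(0)
    particles = sample_particles(prev, n, sigma, rng)
    scores = [confidence(p) for p in particles]
    best = max(range(n), key=scores.__getitem__)
    return particles[best], scores[best], scores[best] < update_thresh


# Toy stand-in for the classifier: confidence decays with distance
# from a hidden "true" centre.
true_centre = (52.0, 48.0)


def toy_confidence(p):
    d2 = (p[0] - true_centre[0]) ** 2 + (p[1] - true_centre[1]) ** 2
    return math.exp(-d2 / (2 * 4.0 ** 2))


centre, score, needs_update = track_frame((50.0, 50.0), toy_confidence)
```

A low best score suggests the appearance has changed, which is when the classifier would be fine-tuned again.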

SO-DLT: Transferring Rich Feature Hierarchies for Robust Visual Tracking (ICCV 2015)

SO-DLT
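Online tracking: when processing frame t, centre on the position predicted for frame t-1; sample regions at different scales, from small to large, and feed them to the network in turn; once the probability map output by the CNN exceeds a given value, stop sampling and take the current probability map as the best region; determine the size and position of the bounding box within that final region
Model update: CNNs ----> responds promptly to target changes; CNNl ----> robust to noise;
Takeaway: an ensemble reduces sensitivity to updates, and is a reliable way for a tracking algorithm to raise its benchmark scores.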
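The small-to-large scale search can be sketched as below. `prob_map_fn` is a hypothetical stand-in for the CNN, and reducing the probability map to its maximum before comparing it with the threshold is an assumption; the notes only say the map must exceed a value.

```python
def search_scales(centre, base_size, scales, prob_map_fn, thresh):
    """Try crops of increasing size around `centre`.

    Stops at the first crop whose probability map peaks above `thresh` and
    returns (crop_size, prob_map); returns None if no scale qualifies.
    """
    for s in sorted(scales):          # smallest region first
        size = base_size * s
        prob_map = prob_map_fn(centre, size)
        peak = max(max(row) for row in prob_map)
        if peak > thresh:
            return size, prob_map
    return None


# Toy stand-in for the CNN: larger crops give a more confident 2x2 map.
def toy_prob_map(centre, size):
    p = min(1.0, size / 100.0)
    return [[p, p], [p, p]]


found = search_scales((0, 0), 50, [2.0, 1.0, 1.5], toy_prob_map, 0.7)
```

Stopping early keeps the crop tight around the target, which leaves less background in the region where the box is finally fitted.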

FCNT: Visual Tracking with Fully Convolutional Networks (ICCV 2015)

FCNT
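Pre-training: VGGNet + the labeled ImageNet classification dataset;
Core idea: feature maps can be used directly to localize the tracked target;
High-level features: good at telling different classes apart (highly abstract)
Low-level features: good at telling objects of the same class apart (focus on local detail)
Two convolutional branches: conv4-3: separates the target from similar objects, i.e. distractors (SNet); conv5-3: captures category information (GNet)
Online tracking: crop a region centred on the previous frame's position and feed it to both SNet and GNet; this produces two complementary heatmaps;
SNet: distractors are removed
GNet: the target stands out more clearly
Summary: effectively suppresses drift; not robust to occlusion; a new line of thought for tracking (how many layers, and which ones)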
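One way to combine the two heatmaps is to localize on the GNet map and switch to the SNet map when a distractor is suspected. The switching rule below (share of response outside the candidate box) and its threshold are assumptions made for this sketch, and the heatmaps are toy 2-D lists rather than network outputs.

```python
def argmax2d(hm):
    """(row, col) of the largest value in a 2-D list."""
    cells = ((r, c) for r in range(len(hm)) for c in range(len(hm[0])))
    return max(cells, key=lambda rc: hm[rc[0]][rc[1]])


def outside_ratio(hm, box):
    """Share of heatmap mass outside box = (r0, c0, r1, c1), end-exclusive."""
    r0, c0, r1, c1 = box
    total = sum(sum(row) for row in hm)
    inside = sum(hm[r][c] for r in range(r0, r1) for c in range(c0, c1))
    return 0.0 if total == 0 else (total - inside) / total


def locate(g_map, s_map, half=1, distractor_thresh=0.3):
    """Locate on the GNet map; fall back to the SNet map if too much
    response lies outside the candidate box (a likely distractor)."""
    r, c = argmax2d(g_map)
    box = (max(r - half, 0), max(c - half, 0),
           min(r + half + 1, len(g_map)), min(c + half + 1, len(g_map[0])))
    if outside_ratio(g_map, box) > distractor_thresh:
        return argmax2d(s_map), "SNet"
    return (r, c), "GNet"


def blank(n=5):
    return [[0.0] * n for _ in range(n)]


g = blank(); g[1][1] = 1.0; g[4][4] = 0.9   # target plus a similar object
s = blank(); s[1][1] = 1.0                  # distractor suppressed
result = locate(g, s)

g_clean = blank(); g_clean[2][3] = 1.0      # no distractor present
result_clean = locate(g_clean, s)
```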

MDNet: Learning Multi-Domain Convolutional Neural Networks for Visual Tracking (CVPR 2016)

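There is a large gap between image classification and actual tracking;
Image classification: target and background combine arbitrarily, and the target must be detected against any background;
Actual tracking: once the foreground and background are given in the first frame, later frames look much like the first;
Pre-train the CNN directly on video sequences; the catch is that targets differ: a class of object that is the target in one sequence may be background in another;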

MDNet
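Shared layers: a CNN learns a generic feature representation of targets;
Domain-specific layers: each training sequence ---> its own domain ---> its own binary classification layer ---> separates foreground from background in that sequence (solves the problem of targets being inconsistent across sequences)
Bounding box estimation: R-CNN-style region proposals; 256 proposals are drawn near the previous frame's box, followed by bounding-box regression
Summary: precision reaches 94.8%. Real-time performance: is detection-style region proposal suited to online tracking? (256 proposals, 89 domains)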
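The split into shared layers and per-sequence branches can be sketched with a toy linear model. Everything here is illustrative: the class name `MultiDomainNet`, the random weights, and the single linear layer per part are assumptions, not the architecture from the paper.

```python
import random


class MultiDomainNet:
    """Toy model: one shared feature layer, one binary head per domain."""

    def __init__(self, in_dim, feat_dim, n_domains, seed=0):
        rng = random.Random(seed)
        # Shared layers: the same weights serve every training sequence.
        self.shared = [[rng.uniform(-1, 1) for _ in range(in_dim)]
                       for _ in range(feat_dim)]
        # Domain-specific layers: one target-vs-background head per sequence.
        self.heads = [[rng.uniform(-1, 1) for _ in range(feat_dim)]
                      for _ in range(n_domains)]

    def features(self, x):
        """Generic representation: ReLU of a linear map of the input."""
        return [max(0.0, sum(w * v for w, v in zip(row, x)))
                for row in self.shared]

    def score(self, x, domain):
        """Target-vs-background logit under the head of one sequence."""
        f = self.features(x)
        return sum(w * v for w, v in zip(self.heads[domain], f))


net = MultiDomainNet(in_dim=4, feat_dim=3, n_domains=2)
patch = [1.0, 0.0, -1.0, 2.0]
feats = net.features(patch)
scores = [net.score(patch, d) for d in range(2)]
```

The same `feats` feed both heads, so an object can score as target in one domain and as background in another without the shared representation having to change.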

Use RNN?

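These are frames 1, 10, and 20 of one video; while the car moves forward at constant speed, the video sequence shows a clear temporal correlation.
What makes tracking special: it is a time series in which consecutive frames are related.
Could a multi-directional recurrent neural network (RNN) learn the correlations between the frames of a tracking sequence?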

What is an RNN?

An RNN neuron; an RNN unrolled over time
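For readers without the figures, a plain (Elman-style) recurrent cell and its unrolling over time can be sketched as below. The update h_t = tanh(W_x x_t + W_h h_{t-1} + b) is the textbook form and is assumed here; the notes do not say which RNN variant the figures show.

```python
import math


def rnn_step(x, h_prev, W_x, W_h, b):
    """One time step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return [math.tanh(sum(wx * xi for wx, xi in zip(W_x[i], x)) +
                      sum(wh * hj for wh, hj in zip(W_h[i], h_prev)) +
                      b[i])
            for i in range(len(b))]


def unroll(xs, h0, W_x, W_h, b):
    """Apply the same cell to each input in order; return all hidden states."""
    hs, h = [], h0
    for x in xs:
        h = rnn_step(x, h, W_x, W_h, b)
        hs.append(h)
    return hs


# Toy usage with a single hidden unit and one input feature per frame.
hs = unroll(xs=[[1.0], [0.0]], h0=[0.0],
            W_x=[[1.0]], W_h=[[0.5]], b=[0.0])
```

The second hidden state is non-zero even though the second input is zero, which is the carried-over memory that makes the model attractive for frame sequences.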

RNN Tracker

CVPR2016


AAAI2016

5. Visual Tracking With The Context

Context information is also very important for tracking.
Recently, some approaches have been proposed that mine auxiliary objects or local visual information surrounding the target to assist tracking.
Context information is especially helpful when the target is fully occluded or leaves the image region.
To improve tracking performance, some tracker fusion methods have also been proposed recently.

Context-Aware Visual Tracking

The environment can also be advantageous to the tracker if it contains objects that are correlated with the target.

Question: Is the object the tracker is following really the target?
Answer: Use the dynamic environment!


How to track a face in a crowd?

Why do we have to focus our attention only on the target?

It seems that:

So why not track the target and auxiliary objects as a group?

What are auxiliary objects?

This definition may cover a large variety of image regions or features.

Experiments


(The yellow bounding box is the target; the red boxes are the color regions.)

Tracking the Invisible: Learning Where the Object Might be

It is well known that context helps in object detection.
One of the strongest predictors of a vehicle's presence and location in an image is the shadow it casts on the road.


In tracking, many temporary, but potentially very strong links exist between the tracked object and the rest of the image.

Local image features vote for the object.

The position of an object can be estimated even when it is not seen directly (e.g., when it is fully occluded or outside the image region).
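The voting idea can be sketched as follows. Each supporting feature stores an offset to the target centre and a weight; taking the weighted mean of the votes is an assumption made for this sketch, as are the names and numbers used.

```python
def vote_for_centre(supporters):
    """Estimate the target centre from supporting features.

    supporters: list of (feature_xy, offset_to_target, weight).
    Each feature votes for feature_xy + offset; the result is the weighted
    mean of all votes, or None when no feature carries any weight.
    """
    total = sum(w for _, _, w in supporters)
    if total == 0:
        return None
    x = sum(w * (f[0] + o[0]) for f, o, w in supporters) / total
    y = sum(w * (f[1] + o[1]) for f, o, w in supporters) / total
    return (x, y)


# Two features on either side of an occluded target agree on its centre.
agree = vote_for_centre([((10, 10), (5, 0), 1.0),
                         ((20, 10), (-5, 0), 1.0)])

# A strongly weighted feature pulls the estimate towards its own vote.
pulled = vote_for_centre([((0, 0), (10, 0), 3.0),
                          ((0, 0), (20, 0), 1.0)])
```

No pixel of the target itself is used, which is why the estimate survives full occlusion as long as the supporters stay visible.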
How to choose the supporters?


Experiments

We can see what we can not see

Context Tracker: Exploring Supporters and Distracters

Visual tracking is very challenging when the target leaves the field of view: the tracker may start following another, similar object and fail to reacquire the right target when it reappears.
There is additional information which can be exploited instead of using only the object region.

What are supporters and distracters?
Distracters


Supporters
Experiments

6. Directions for object tracking

Improve the feature representation of the target
