Paper | Tracking everything in

2023-12-21 与阳光共进早餐

1 Preface

https://arxiv.org/pdf/2207.12978.pdf
ECCV2022
task: large-scale Multiple Object Tracking (MOT)

2 introduction

MOT task: estimate the trajectory of objects in a video sequence.

limitation 1: common MOT benchmarks [16,32,11] only consider tracking objects from a few pre-defined categories, e.g., pedestrian and car, so existing MOT methods do not perform well on a large number of categories.

limitation 2: the MOT metrics need refinement.

Current MOT models and metrics are mainly designed for single-category multiple-object tracking. When extended to large-scale multi-category MOT, methods simply detect and classify each object, then associate only objects that share the same label. This makes association depend heavily on the classification results.

Thus, when classification is inaccurate, as it often is in large-scale multi-category MOT, existing models and evaluation metrics fall short and need to be improved.

This paper:
To expand tracking to a more general scenario, we propose that classification should be disentangled from tracking, in both evaluation and model design, for multi-category MOT.

1) a new metric, Track Every Thing Accuracy (TETA);
2) a new model, Track Every Thing tracker (TETer).

exp:
large-scale multi-category tracking datasets, TAO and BDD100K.

3 Tracking-Every-Thing Metric

3.1 Limitations for Large-scale MOT Evaluation

How to handle classification: (1) simply associating objects via the same label depends on correct classification results; (2) the most naive alternative, ignoring the classification results entirely, lets the head classes of a long-tailed dataset dominate the evaluation.
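To make point (2) concrete, here is a toy sketch (not from the paper) comparing a score pooled over all objects with a score averaged per class. The class names, counts, and the helper functions `pooled_accuracy` and `per_class_mean_accuracy` are made up for illustration.

```python
from collections import defaultdict

def pooled_accuracy(records):
    """Accuracy over all objects pooled together, ignoring class labels."""
    return sum(ok for _, ok in records) / len(records)

def per_class_mean_accuracy(records):
    """Accuracy computed per class first, then averaged over classes."""
    by_class = defaultdict(list)
    for cls, ok in records:
        by_class[cls].append(ok)
    return sum(sum(v) / len(v) for v in by_class.values()) / len(by_class)

# Hypothetical long-tailed data: 98 "person" objects (95 tracked correctly)
# and 2 "unicycle" objects (none tracked correctly).
records = (
    [("person", True)] * 95
    + [("person", False)] * 3
    + [("unicycle", False)] * 2
)

pooled = pooled_accuracy(records)            # dominated by the head class
balanced = per_class_mean_accuracy(records)  # the tail class counts equally
print(f"pooled={pooled:.3f}, per-class mean={balanced:.3f}")
```

The pooled number looks excellent even though the rare class is never tracked, which is the failure mode the note describes.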

Incomplete annotations: large-scale datasets are not exhaustively annotated, so how can we identify and penalize false positive (FP) predictions?

3.2 Tracking-Every-Thing Accuracy (TETA)

TETA consists of three parts:

a localization score
an association score
a classification score

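Each score evaluates its own aspect of tracking separately, so an error in one aspect (e.g., classification) does not hide performance in the others.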
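A minimal sketch of how three such scores could be combined. It assumes (these notes do not state it) that each sub-score is a Jaccard-style ratio TP / (TP + FP + FN) and that the final number is an unweighted mean of the three; check the paper for the exact definitions. All counts below are invented.

```python
def jaccard_score(tp, fp, fn):
    """TP / (TP + FP + FN); returns 0.0 when there is nothing to score."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0

def teta(loc_a, assoc_a, cls_a):
    """Assumed combination: unweighted mean of the three sub-scores."""
    return (loc_a + assoc_a + cls_a) / 3.0

# Hypothetical counts for localization and classification.
loc_a = jaccard_score(tp=80, fp=10, fn=10)
cls_a = jaccard_score(tp=30, fp=35, fn=35)
assoc_a = 0.6  # hypothetical association score, taken as given here

score = teta(loc_a, assoc_a, cls_a)
print(f"LocA={loc_a:.2f} AssocA={assoc_a:.2f} ClsA={cls_a:.2f} TETA={score:.3f}")
```

Keeping the three terms separate means a weak classifier lowers only the classification term, while localization and association quality stay visible.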

To avoid false punishments, we ignore the predictions that are not assigned to any clusters during evaluation.
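The idea of ignoring unassigned predictions can be sketched as follows. This is an illustration under assumptions, not the official evaluation code: each ground-truth box anchors one cluster, a prediction joins the cluster of its best-overlapping ground truth if the IoU reaches a threshold, and the names `split_predictions` and `iou_thr` are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def split_predictions(gt_boxes, pred_boxes, iou_thr=0.5):
    """Assign each prediction to its best-overlapping ground truth.

    Returns (clusters, ignored): clusters[i] lists prediction indices
    assigned to ground truth i; ignored lists predictions in no cluster.
    """
    clusters = {i: [] for i in range(len(gt_boxes))}
    ignored = []
    for j, pred in enumerate(pred_boxes):
        best_i, best_iou = None, iou_thr
        for i, gt in enumerate(gt_boxes):
            overlap = iou(gt, pred)
            if overlap >= best_iou:
                best_i, best_iou = i, overlap
        if best_i is None:
            ignored.append(j)  # not penalized: may be an unannotated object
        else:
            clusters[best_i].append(j)
    return clusters, ignored

gt_boxes = [(0, 0, 10, 10)]
pred_boxes = [(1, 1, 10, 10), (0, 0, 9, 9), (50, 50, 60, 60)]
clusters, ignored = split_predictions(gt_boxes, pred_boxes)
print(clusters, ignored)
```

Inside a cluster, extra predictions beyond the matched one can still be counted as false positives; only the far-away prediction, which may cover an object nobody annotated, is skipped.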

4 Tracking-Every-Thing Tracker

framework: (figure not included)

4.1 class-agnostic localization

The paper's analysis shows that the bottleneck of the detection model lies in classification rather than localization.

Thus, this paper first performs class-agnostic localization.

4.2 associating everything

common cues: location, appearance, and class
location: motion is irregular (x)
class: many objects do not belong to the predefined categories (x)
appearance: objects in different classes usually look different (selected as the main cue)

Instead of using class information as "hard" prior, the class information is used in a "soft" way by contrastive learning.
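One way to picture "soft" class information is a supervised contrastive loss, where class labels only decide which pairs are pulled together during training and are never used as a hard filter at matching time. The sketch below is a generic formulation written for these notes, not the paper's training code; `sup_contrastive_loss`, the temperature value, and the toy embeddings are assumptions.

```python
import math

def sup_contrastive_loss(embs, labels, temperature=0.1):
    """Mean supervised-contrastive loss over anchors with >= 1 positive."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    z = [normalize(e) for e in embs]
    losses = []
    for i in range(len(z)):
        # Scaled cosine similarity of anchor i to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(len(z)) if j != i
        }
        positives = [j for j in logits if labels[j] == labels[i]]
        if not positives:
            continue
        # Numerically stable log-sum-exp over all other samples.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        losses.append(-sum(logits[j] - log_denom for j in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0

# Toy embeddings: the first two and the last two are close to each other.
embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_consistent = sup_contrastive_loss(embs, [0, 0, 1, 1])    # labels agree with geometry
loss_inconsistent = sup_contrastive_loss(embs, [0, 1, 0, 1])  # labels disagree
print(loss_consistent, loss_inconsistent)
```

The loss is lower when same-class samples already sit close together, so training pushes the embedding space toward class-aware structure without requiring a correct class prediction at test time.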

Once the class exemplar matching (CEM) module is learned, association is done by comparing the similarities between exemplar embeddings.
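A minimal sketch of similarity-based association. It assumes each existing track and each new detection already has an embedding vector (in the paper these would come from the learned model; here they are made-up numbers), and it uses cosine similarity with greedy one-to-one matching as a simple stand-in for whatever matching rule the paper actually uses. `associate` and `min_sim` are hypothetical names.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def associate(track_embs, det_embs, min_sim=0.5):
    """Greedy one-to-one matching by descending similarity.

    track_embs: dict mapping track id -> embedding
    det_embs:   list of detection embeddings
    Returns a dict mapping detection index -> track id.
    """
    pairs = sorted(
        ((cosine(t, d), tid, j)
         for tid, t in track_embs.items()
         for j, d in enumerate(det_embs)),
        key=lambda p: p[0],
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for sim, tid, j in pairs:
        if sim < min_sim:
            break  # remaining pairs are even less similar
        if tid in used_tracks or j in used_dets:
            continue
        matches[j] = tid
        used_tracks.add(tid)
        used_dets.add(j)
    return matches

tracks = {7: [1.0, 0.0], 9: [0.0, 1.0]}
dets = [[0.1, 0.9], [0.9, 0.2], [-1.0, -1.0]]
matches = associate(tracks, dets)
print(matches)
```

No class label appears anywhere in the matching step, so a misclassified detection can still be linked to the right track; the third detection stays unmatched and could start a new track.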