An Introduction to Pose and Action Datasets

2019-08-13 Woooooooooooooo

Reference: https://blog.csdn.net/qq_38522972/article/details/82953477

Pose paper roundup: https://blog.csdn.net/zziahgf/article/details/78203621

Classic projects: https://blog.csdn.net/ls83776736/article/details/87991515

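Pose estimation and action recognition are fundamentally different tasks. Action recognition can be viewed as person localization plus action classification, whereas pose estimation can be understood as detecting keypoints and assigning an id to each keypoint (this covers both multi-person and single-person pose estimation).

Because of limits on capture equipment, most pose data today is cropped from publicly available video. 2D datasets are therefore relatively easy to obtain, while 3D datasets are much harder to collect. 2D datasets cover both indoor and outdoor scenes, whereas 3D datasets currently cover only indoor scenes.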

MS COCO

URL: http://cocodataset.org/#download

Number of samples: >= 300K

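Number of keypoints: 17 in the official annotations (the figure 18 usually refers to OpenPose-style skeletons, which add a neck point)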

Full body, multi-person, keypoints on 100K people

LSP

URL: http://sam.johnson.io/research/lsp.html

Number of samples: 2K

Number of keypoints: 14

Full body, single person

The extended version grows the LSP dataset to 10,000 images of people performing gymnastics, athletics and parkour.

FLIC

URL: https://bensapp.github.io/flic-dataset.html

Number of samples: 20K

Number of keypoints: 9

Full body, single person

MPII

Number of samples: 25K

Full body, single-person and multi-person, 40K people, 410 human activities

16 keypoints: 0 - r ankle, 1 - r knee, 2 - r hip, 3 - l hip, 4 - l knee, 5 - l ankle, 6 - pelvis, 7 - thorax, 8 - upper neck, 9 - head top, 10 - r wrist, 11 - r elbow, 12 - r shoulder, 13 - l shoulder, 14 - l elbow, 15 - l wrist

No mask annotations

Excerpt from paper 1

In order to analyze the challenges for fine-grained human activity recognition, we build on our recent publicly available "MPI Human Pose" dataset [2]. The dataset was collected from YouTube videos using an established two-level hierarchy of over 800 everyday human activities. The activities at the first level of the hierarchy correspond to thematic categories, such as "Home repair", "Occupation", "Music playing", etc., while the activities at the second level correspond to individual activities, e.g. "Painting inside the house", "Hairstylist" and "Playing woodwind". In total the dataset contains 20 categories and 410 individual activities covering a wider variety of activities than other datasets, while its systematic data collection aims for a fair activity coverage. Overall the dataset contains 24,920 video snippets and each snippet is at least 41 frames long. Altogether the dataset contains over 1M frames. Each video snippet has a key frame containing at least one person with a sufficient portion of the body visible and annotated body joints. There are 40,522 annotated people in total. In addition, for a subset of key frames richer labels are available, including full 3D torso and head orientation and occlusion labels for joints and body parts.

PoseTrack

14 keypoints: 0 - r ankle, 1 - r knee, 2 - r hip, 3 - l hip, 4 - l knee, 5 - l ankle, 8 - upper neck, 9 - head top, 10 - r wrist, 11 - r elbow, 12 - r shoulder, 13 - l shoulder, 14 - l elbow, 15 - l wrist
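
The two index lists above can be captured as a small lookup table. This is a minimal sketch, assuming indices 6 and 7 of the MPII ordering are pelvis and thorax, and that the 14-keypoint list is simply the MPII ordering with those two indices dropped. The names `MPII_KEYPOINTS`, `POSETRACK_INDICES` and `mpii_to_posetrack` are illustrative and do not come from any dataset toolkit.

```python
# Keypoint names in MPII index order. Index 6 = pelvis and 7 = thorax are an
# assumption here; the remaining entries follow the list in the text.
MPII_KEYPOINTS = [
    "r ankle", "r knee", "r hip", "l hip", "l knee", "l ankle",
    "pelvis", "thorax", "upper neck", "head top",
    "r wrist", "r elbow", "r shoulder", "l shoulder", "l elbow", "l wrist",
]

# The 14-keypoint list keeps every MPII index except 6 and 7.
POSETRACK_INDICES = [i for i in range(len(MPII_KEYPOINTS)) if i not in (6, 7)]


def mpii_to_posetrack(points):
    """Select the 14-keypoint subset from a list of 16 (x, y) points."""
    if len(points) != len(MPII_KEYPOINTS):
        raise ValueError("expected 16 keypoints in MPII order")
    return [points[i] for i in POSETRACK_INDICES]


if __name__ == "__main__":
    dummy = [(float(i), float(i)) for i in range(16)]
    print(len(mpii_to_posetrack(dummy)))
    print([MPII_KEYPOINTS[i] for i in POSETRACK_INDICES])
```

Keeping the original MPII indices, instead of renumbering to 0..13, means the two datasets can share one index table.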

No mask annotations; includes head bounding-box annotations

Excerpts from readme.md (shipped with the annotations)

About

PoseTrack is a large-scale benchmark for human pose estimation and tracking in image sequences. It provides a publicly available training and validation set as well as an evaluation server for benchmarking on a held-out test set (www.posetrack.net).

Annotations

In the PoseTrack benchmark each person is labeled with a head bounding box and positions of the body joints. We omit annotations of people in dense crowds and in some cases also choose to skip annotating people in upright standing poses. This is done to focus annotation efforts on the relevant people in the scene. We include ignore regions to specify which people in the image were ignored during annotation.

Each sequence included in the PoseTrack benchmark corresponds to about 5 seconds of video. The number of frames in each sequence might vary, as different videos were recorded with different numbers of frames per second. For the **training** sequences we provide annotations for 30 consecutive frames centered in the middle of the sequence. For the **validation and test** sequences we annotate 30 consecutive frames and in addition annotate every 4th frame of the sequence. The rationale is to evaluate both the smoothness of the estimated body trajectories and the ability to generate consistent tracks over a longer temporal span. Note that even though we do not label every frame in the provided sequences, we still expect the unlabeled frames to be useful for achieving better performance on the labeled frames.
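
The labeling scheme just described can be sketched as a small index computation. The README does not state the exact indexing, so this sketch assumes that frames are 0-indexed, that the 30-frame window is centered by integer division, that it is the same centered window for validation and test, and that the every-4th-frame sampling starts at frame 0. The function name `labeled_frames` is hypothetical.

```python
def labeled_frames(num_frames, split, window=30, stride=4):
    """Return the sorted indices of frames expected to carry annotations.

    Assumptions (not specified in the text): 0-indexed frames, a dense
    window centered by integer division, and sparse sampling from frame 0.
    """
    if split not in ("train", "val", "test"):
        raise ValueError("split must be 'train', 'val' or 'test'")
    window = min(window, num_frames)
    # Center the dense window and clamp it to the sequence bounds.
    start = max(0, num_frames // 2 - window // 2)
    start = min(start, num_frames - window)
    labeled = set(range(start, start + window))
    if split in ("val", "test"):
        # Validation and test additionally label every `stride`-th frame.
        labeled.update(range(0, num_frames, stride))
    return sorted(labeled)


if __name__ == "__main__":
    # About 5 seconds of video at an assumed 30 frames per second.
    print(len(labeled_frames(150, "train")))
    print(len(labeled_frames(150, "val")))
```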

Annotation Format

The PoseTrack 2018 submission file format is based on the Microsoft COCO dataset annotation format. We took this step to 1) maintain compatibility with a commonly used format and commonly used tools while 2) allowing sufficient flexibility for the different challenges. These are the 2D tracking challenge, the 3D tracking challenge as well as the dense 2D tracking challenge.

Furthermore, we require submissions in a zipped version of either one big .json file or one .json file per sequence to 1) be flexible w.r.t. tools for each sequence (e.g., easy visualization for a single sequence independent of others) and 2) avoid problems with file size and processing.

The MS COCO file format is a nested structure of dictionaries and lists. For evaluation, we only need a subset of the standard fields; however, a few additional fields are required for the evaluation protocol (e.g., a confidence value for every estimated body landmark). In the following we describe the minimal, but required, set of fields for a submission. Additional fields may be present, but are ignored by the evaluation script.

.json dictionary structure

At top level, each .json file stores a dictionary with three elements:

* images

* annotations

* categories

The ‘images’ element

It is a list of the images described in this file. The list must contain the information for all images referenced by a person description in the file. Each list element is a dictionary and must contain only two fields: `file_name` and `id` (unique int). The file name must refer to the original PoseTrack image as extracted from the test set, e.g., `images/test/023736_mpii_test/000000.jpg`.

The ‘annotations’ element

This is another list of dictionaries. Each item of the list describes one detected person and is itself a dictionary. It must have at least the following fields:

* `image_id` (int; an image with a corresponding id must be in `images`),

* `track_id` (int; the track this person is performing; unique per frame),

* `keypoints` (list of floats, with length three times the number of estimated keypoints, in order x, y, ? for every point. The third value per keypoint is only there for COCO format consistency and is not used.),

* `scores` (list of floats, with length equal to the number of estimated keypoints; each value is between 0 and 1 and provides a prediction confidence for that keypoint).
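
To make the field list concrete, here is a minimal sketch that assembles a dictionary with the three top-level elements and the required per-person fields. The helper `make_submission` and its input tuple layout are hypothetical. The contents of `categories` are not described above, so an empty list stands in as a placeholder, and the filler used for the third keypoint value is arbitrary.

```python
import json


def make_submission(detections):
    """Build a submission dict from (file_name, track_id, keypoints, scores).

    `keypoints` is a list of (x, y) pairs and `scores` holds one confidence
    in [0, 1] per keypoint. This input layout is an illustrative assumption.
    """
    images, annotations = [], []
    image_ids, seen_tracks = {}, set()
    for file_name, track_id, keypoints, scores in detections:
        if len(scores) != len(keypoints):
            raise ValueError("need exactly one score per keypoint")
        if not all(0.0 <= s <= 1.0 for s in scores):
            raise ValueError("scores must lie between 0 and 1")
        if file_name not in image_ids:
            image_ids[file_name] = len(image_ids)
            images.append({"file_name": file_name, "id": image_ids[file_name]})
        key = (image_ids[file_name], int(track_id))
        if key in seen_tracks:
            raise ValueError("track_id must be unique per frame")
        seen_tracks.add(key)
        flat = []
        for x, y in keypoints:
            # The third value is unused; 1.0 is an arbitrary filler.
            flat.extend([float(x), float(y), 1.0])
        annotations.append({
            "image_id": key[0],
            "track_id": key[1],
            "keypoints": flat,
            "scores": [float(s) for s in scores],
        })
    # Placeholder only: the text above does not describe 'categories'.
    return {"images": images, "annotations": annotations, "categories": []}


if __name__ == "__main__":
    sub = make_submission([
        ("images/test/023736_mpii_test/000000.jpg", 0,
         [(10, 20), (30, 40)], [0.9, 0.5]),
    ])
    print(json.dumps(sub, indent=2))
```

Building the `images` list from the detections themselves ensures every `image_id` referenced by an annotation has a matching entry, which is the consistency requirement stated for the `images` element.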

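Human3.6M (3D, pose estimation)

The Human3.6M dataset contains 3.6 million 3D human poses with corresponding images, recorded from 11 subjects (6 male, 5 female; papers typically use subjects 1, 5, 6, 7 and 8 for training and subjects 9 and 11 for testing). It covers 17 action scenarios such as discussion, eating, exercising and greeting. The data was captured with 4 digital video cameras, 1 time-of-flight sensor and 10 motion cameras.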
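
The commonly used subject split mentioned above can be sketched as follows. The sample layout (a dict with an integer `subject` key) and the function name `split_by_subject` are hypothetical and do not reflect the dataset's native file format.

```python
# Subject ids for the split commonly used in papers, as stated in the text.
TRAIN_SUBJECTS = {1, 5, 6, 7, 8}
TEST_SUBJECTS = {9, 11}


def split_by_subject(samples):
    """Partition samples into (train, test) lists by their 'subject' field.

    Samples from subjects outside both sets are skipped.
    """
    train, test = [], []
    for sample in samples:
        if sample["subject"] in TRAIN_SUBJECTS:
            train.append(sample)
        elif sample["subject"] in TEST_SUBJECTS:
            test.append(sample)
    return train, test


if __name__ == "__main__":
    samples = [{"subject": s, "frame": 0} for s in range(1, 12)]
    train, test = split_by_subject(samples)
    print(len(train), len(test))
```

Splitting by subject rather than by frame keeps every image of a test subject out of the training set.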

CMU Panoptic dataset (3D, pose estimation)

MPI-INF-3DHP (3D, pose estimation)

Produced by the Max Planck Institute for Informatics; for details see the paper "Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision".

AVA (action recognition)

Paper: https://arxiv.org/abs/1705.08421

FashionPose (action recognition)

Armlets (action recognition)

JHMDB (action recognition)

1. Key papers on single-person pose estimation

2014----Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations

2014----DeepPose: Human Pose Estimation via Deep Neural Networks

2014----Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation

2014----Learning Human Pose Estimation Features with Convolutional Networks

2014----MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation

2015----Efficient Object Localization Using Convolutional Networks

2015----Human Pose Estimation with Iterative Error

2015----Pose-based CNN Features for Action Recognition

2016----Advancing Hand Gesture Recognition with High Resolution Electrical Impedance Tomography

2016----Chained Predictions Using Convolutional Neural Networks

2016----CPM----Convolutional Pose Machines

2016----CVPR-2016----End-to-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation

2016----Deep Learning of Local RGB-D Patches for 3D Object Detection and 6D Pose Estimation

2016----PAFs----Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields (OpenPose)

2016----Stacked hourglass----Stacked Hourglass Networks for Human Pose Estimation

2016----Structured Feature Learning for Pose Estimation

2017----Adversarial PoseNet: A Structure-aware Convolutional Network for Human Pose Estimation

2017----CVPR2017 oral----Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

2017----Learning Feature Pyramids for Human Pose Estimation

2017----Multi-Context Attention for Human Pose Estimation

2017----Self Adversarial Training for Human Pose Estimation

2. Key papers on multi-person pose estimation

2016----Associative Embedding: End-to-End Learning for Joint Detection and Grouping

2016----DeepCut----Joint Subset Partition and Labeling for Multi Person Pose Estimation

2016----DeepCut----Joint Subset Partition and Labeling for Multi Person Pose Estimation (poster)

2016----DeeperCut----DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model

2017----G-RMI----Towards Accurate Multi-person Pose Estimation in the Wild

2017----RMPE: Regional Multi-Person Pose Estimation (AlphaPose)

2018----Cascaded Pyramid Network for Multi-Person Pose Estimation

2018----DensePose: Dense Human Pose Estimation in the Wild

(Read closely; DensePose deserves further study.)

2018----3D Human Pose Estimation in the Wild by Adversarial Learning