Image segmentation: data annotation, segmentation methods, segmentation evaluation
Data annotation
2008 IJCV
LabelMe: a database and web-based tool for image annotation
Abstract
First, provide a good dataset for object detection and an excellent labeling tool.
Then, compare the dataset with other existing datasets.
Finally, "Also, we show how to extend the dataset to automatically enhance object labels with WordNet, discover object parts, recover a depth ordering of objects in a scene, and increase the number of labels using minimal user supervision and images from the web."
We seek to build a large collection of images with ground truth labels to be used for object detection and recognition research. Such data is useful for supervised learning and quantitative evaluation. To achieve this, we developed a web-based tool that allows easy image annotation and instant sharing of such annotations. Using this annotation tool, we have collected a large dataset that spans many object categories, often containing multiple instances over a wide variety of images. We quantify the contents of the dataset and compare against existing state of the art datasets used for object recognition and detection. Also, we show how to extend the dataset to automatically enhance object labels with WordNet, discover object parts, recover a depth ordering of objects in a scene, and increase the number of labels using minimal user supervision and images from the web.
1 Introduction
First, emphasize that a good dataset is key for supervised tasks and is also useful for unsupervised tasks, even though alternative approaches exist, such as Bayesian approaches to learning or multi-task learning.
Then, describe several aspects of labels and list some well-known datasets.
Next, introduce existing annotation approaches, in particular datasets created through the web.
Finally, explain why a new dataset is needed and what it offers, and briefly outline the paper's organization.
"Biederman estimates that humans can recognize about 30000 entry-level object categories."
"For each object present in an image, the labels should provide information about the object's identity, shape, location, and possibly other attributes such as pose."
2 LabelMe
2.1 Goals of the LabelMe project
List the advantages of the LabelMe dataset.
2.2 The LabelMe web-based annotation tool
First, introduce how to use LabelMe to label an image.
Then, discuss some problems. The authors mention:
1st, quality control
2nd, the complexity of the polygons
3rd, what to label
4th, the text label.
Finally, introduce the storage of the annotation files.
2.3 Content and evolution of the LabelMe database
The authors plot histograms of
1st, the number of objects per image
2nd, the percentage of pixels labeled, along with a curve of the number of polygons against description rank.
1_LabelMe_2.png: the summary of the database content shown in the figure reveals some interesting points. In Figure 2(a), we see that . In Figure 2(b) and (c), pay attention to the tail, especially in (b): it matches my intuition that the background (an artificial image) is just one semantic object.
2.4 Quality of the polygonal boundaries
The authors count the number of control points per polygon.
2_LabelMe_5.png: I think labeling images with polygons is not ideal. We should instead label images with continuous edges produced by a segmentation algorithm rather than drawn by subjects. That is, we need to minimize the influence of the subjects' prior knowledge.
2.5 Distributions of object location and size
3_LabelMe_7.png: Aha! This is interesting and amazing! It is the first time I have seen this kind of statistic!
3 Extending the dataset
3.1 Enhancing object labels with WordNet
For the text labels, the authors use WordNet to obtain uniform object descriptions.
In my view, this problem is difficult and attractive enough to deserve extra attention. I think we need a semantic network as a key part of a larger framework.
4_LabelMe_t1.png: the screenshot shows some aspects of the text-label problem.
5_LabelMe_5.png: this can serve as evidence that the distribution of synonyms is highly correlated with the images we encounter in daily life. I mean that there is some relationship with cognitive neuroscience and semantic networks.
3.2 Object-parts hierarchies
Using the overlap between polygons, the authors discover object-part hierarchies. The result is shown in 6_LabelMe_10.png.
6_LabelMe_10.png
"When two polygons have a high degree of overlap, this provides evidence of either (i) an object-part hierarchy or (ii) an occlusion".
So if it is an occlusion, we can use it to recover the depth of the scene; if not, we can construct a part-whole graph. Wow, that is cool! It occurs to me that in human vision a single eye also has some capacity to sense depth, and one of the reasons is that humans can reason from occlusions.
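To make "a high degree of overlap" concrete, here is a minimal sketch that estimates how much of the smaller polygon is covered by the intersection. It is not the paper's implementation: the sampling approach, the function names `point_in_polygon` and `overlap_ratio`, and the normalization by the smaller polygon are all my own assumptions.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting; polygon is a list of (x, y) control points."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def overlap_ratio(poly_a, poly_b, step=1.0):
    """Approximate area(A and B) / min(area(A), area(B)) by sampling a grid.

    Assumption: polygons are in pixel coordinates and a unit grid is fine
    enough; this is a hypothetical helper, not part of LabelMe.
    """
    xs = [p[0] for p in poly_a + poly_b]
    ys = [p[1] for p in poly_a + poly_b]
    count_a = count_b = count_both = 0
    y = min(ys) + step / 2
    while y < max(ys):
        x = min(xs) + step / 2
        while x < max(xs):
            in_a = point_in_polygon(x, y, poly_a)
            in_b = point_in_polygon(x, y, poly_b)
            count_a += in_a
            count_b += in_b
            count_both += in_a and in_b
            x += step
        y += step
    smaller = min(count_a, count_b)
    return count_both / smaller if smaller else 0.0


car = [(0, 0), (10, 0), (10, 6), (0, 6)]
wheel = [(1, 0), (3, 0), (3, 2), (1, 2)]     # lies entirely within the car
tree = [(20, 0), (24, 0), (24, 8), (20, 8)]  # disjoint from the car

print(overlap_ratio(car, wheel))
print(overlap_ratio(car, tree))
```

A high ratio only says that the pair is a candidate for a part-whole or occlusion relation; telling the two cases apart still needs extra evidence such as the label text.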
3.3 Depth ordering
The method is very simple and quite subjective:
- First, some objects are always on the bottom layer since they cannot occlude any objects. For example, objects that do not own any boundaries (e.g. sky) and objects that are on the lowest layer (e.g. sidewalk and road).
- Second, an object that is completely contained in another one is on top.
- Third, if two polygons overlap, the polygon that has more control points in the region of intersection is more likely to be on top.
- Finally, use image features to decide which object owns the region of intersection.
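The first three rules above can be sketched in a few lines. This is only an illustration under my own assumptions: the `ALWAYS_BOTTOM` set contains just the examples named above, "completely contained" is approximated by "all control points lie inside the other polygon", and the fourth rule (image features) is left out. The names `on_top` and `points_inside` are hypothetical.

```python
ALWAYS_BOTTOM = {"sky", "sidewalk", "road"}  # assumption: only the examples above


def point_in_polygon(x, y, polygon):
    """Even-odd ray casting; polygon is a list of (x, y) control points."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def points_inside(poly, other):
    """Number of control points of poly that fall inside other."""
    return sum(point_in_polygon(x, y, other) for x, y in poly)


def on_top(name_a, poly_a, name_b, poly_b):
    """Return the name of the object guessed to be on top, or None if undecided."""
    # Rule 1: objects such as sky or road never occlude anything.
    a_bottom, b_bottom = name_a in ALWAYS_BOTTOM, name_b in ALWAYS_BOTTOM
    if a_bottom != b_bottom:
        return name_b if a_bottom else name_a
    a_in_b = points_inside(poly_a, poly_b)
    b_in_a = points_inside(poly_b, poly_a)
    # Rule 2: an object contained in another one is on top
    # (approximated by all of its control points being inside).
    if a_in_b == len(poly_a) and b_in_a < len(poly_b):
        return name_a
    if b_in_a == len(poly_b) and a_in_b < len(poly_a):
        return name_b
    # Rule 3: more control points in the intersection suggests "on top".
    if a_in_b != b_in_a:
        return name_a if a_in_b > b_in_a else name_b
    return None  # Rule 4 (image features) would be needed here.


car = [(0, 0), (10, 0), (10, 6), (0, 6)]
person = [(8, 1), (9, 2), (9, 4), (8, 5), (12, 5), (12, 1)]
print(on_top("car", car, "person", person))
```

Here the person polygon has four control points inside the car and the car has none inside the person, so the third rule picks the person.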
In my opinion, these rules are not intrinsic. Which object a region belongs to should be decided by semantic information, not by hand-designed rules. What is more, the system should be capable of lifelong learning: mistakes are inevitable, and the key is the ability to correct them so that they are not repeated.
7_LabelMe_13.png clearly shows the scene's depth; this kind of stereo-style figure is worth learning from.
3.4 Semi-automatic labeling
A simple application based on segmentation regions. (Note that the authors do not mention which segmentation algorithm they use. This application is just meant to show that the dataset is valuable for object detection.)
- First, find candidate regions: segment the image to produce 10-20 regions.
- Then, combine these regions to get around 30 regions by discarding bad combinations.
- Next, compute features: resize each candidate region to a normalized size and extract features ("Gist features").
- Finally, train classifiers to score each candidate, choose the maximum score, and take the corresponding object class.
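The last two steps (normalize each candidate, score it, keep the maximum) can be sketched as follows. This is a toy stand-in, not the authors' pipeline: the raw resized pixels replace the Gist features, the linear classifiers and their weights are hand-picked instead of trained, and the names `resize_nearest`, `toy_features`, and `best_label` are hypothetical.

```python
def resize_nearest(region, size):
    """Nearest-neighbour resize of a 2-D list of intensities to size x size."""
    h, w = len(region), len(region[0])
    return [[region[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def toy_features(region, size=4):
    """Placeholder for Gist: flatten the normalized patch into a vector."""
    patch = resize_nearest(region, size)
    return [value for row in patch for value in row]


def best_label(regions, classifiers):
    """Score every (candidate region, class) pair and return the best one.

    classifiers maps a class name to (weights, bias) of a linear scorer;
    in practice these would come from training on the labeled polygons.
    """
    best = (None, None, float("-inf"))
    for index, region in enumerate(regions):
        feats = toy_features(region)
        for label, (weights, bias) in classifiers.items():
            score = sum(w * f for w, f in zip(weights, feats)) + bias
            if score > best[2]:
                best = (index, label, score)
    return best


bright = [[1.0] * 8 for _ in range(8)]   # candidate region 0
dark = [[0.0] * 6 for _ in range(10)]    # candidate region 1, different shape
classifiers = {
    "bright": ([1.0] * 16, 0.0),         # hand-picked weights, not trained
    "dark": ([-1.0] * 16, 8.0),
}
print(best_label([bright, dark], classifiers))
```

The returned tuple is (region index, class name, score), so the winning region can then be proposed to the user as a new label.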
4 Comparison with existing datasets for object detection and recognition
8_LabelMe_t3.png and 9_LabelMe_17.png summarize several datasets used for object detection and recognition research.
5 Conclusion
We described a web-based image annotation tool that was used to label the identity of objects and where they occur in images. We collected a large number of high quality annotations, spanning many different object categories, for a large set of images, many of which are high resolution. We presented quantitative results of the dataset contents showing the quality, breadth, and depth of the dataset. We showed how to enhance and improve the quality of the dataset through the application of WordNet, heuristics to recover object parts and depth ordering, and training of an object detector using the collected labels to increase the dataset size from images returned by online search engines. We finally compared against other existing state of the art datasets used for object detection and recognition.
Our goal is not to provide a new benchmark for computer vision. The goal of the LabelMe project is to provide a dynamic dataset that will lead to new research in the areas of object recognition and computer graphics, such as object recognition in context and photorealistic rendering.
reference
Segmentation methods
2004 GrabCut Kolmogorov
2006 GraphCut Boykov
2013 OneCut Boykov
Segmentation evaluation
2008 Computer Vision and Image Understanding
Image segmentation evaluation: A survey of unsupervised methods
A comprehensive survey of the segmentation evaluation methods published before 2008.
Haralick and Shapiro proposed four criteria:
- Regions should be uniform and homogeneous with respect to some characteristic(s)
- Adjacent regions should have significant differences with respect to the characteristic on which they are uniform
- Region interiors should be simple and without holes
- Boundaries should be simple, not ragged, and be spatially accurate
The authors call criteria (i) and (ii) Characteristic Criteria: the first measures intra-region uniformity and the second measures inter-region disparity. The remaining two are Semantic Criteria.
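As a rough illustration of the two Characteristic Criteria, the sketch below scores a segmentation of a grayscale image without any ground truth. The specific formulas are my own simple choices, not metrics taken from the survey: non-uniformity is the area-weighted intensity variance inside regions, and disparity is the mean absolute difference between region means over all region pairs (a simplification, since criterion (ii) speaks of adjacent regions).

```python
def region_stats(image, labels):
    """Return {label: (mean, variance, pixel_count)} for a grayscale image."""
    values = {}
    for image_row, label_row in zip(image, labels):
        for value, label in zip(image_row, label_row):
            values.setdefault(label, []).append(value)
    stats = {}
    for label, vals in values.items():
        mean = sum(vals) / len(vals)
        variance = sum((v - mean) ** 2 for v in vals) / len(vals)
        stats[label] = (mean, variance, len(vals))
    return stats


def intra_region_nonuniformity(image, labels):
    """Area-weighted mean of per-region variance; lower means more uniform."""
    stats = region_stats(image, labels)
    total = sum(count for _, _, count in stats.values())
    return sum(var * count for _, var, count in stats.values()) / total


def inter_region_disparity(image, labels):
    """Mean absolute difference of region means; higher means more distinct."""
    means = [mean for mean, _, _ in region_stats(image, labels).values()]
    pairs = [(a, b) for i, a in enumerate(means) for b in means[i + 1:]]
    return sum(abs(a - b) for a, b in pairs) / len(pairs) if pairs else 0.0


image = [[0, 0, 10, 10],
         [0, 0, 10, 10]]
good = [[0, 0, 1, 1],
        [0, 0, 1, 1]]   # follows the intensity edge
bad = [[0, 1, 0, 1],
       [0, 1, 0, 1]]    # mixes dark and bright pixels in each region

print(intra_region_nonuniformity(image, good), inter_region_disparity(image, good))
print(intra_region_nonuniformity(image, bad), inter_region_disparity(image, bad))
```

The good segmentation gets zero non-uniformity and a large disparity, while the bad one gets the opposite, which is the behavior an unsupervised measure built on criteria (i) and (ii) is supposed to show.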
2016 PAMI
Supervised evaluation of image segmentation and object proposal techniques
Proposes a meta-evaluation method. I have forgotten the details; my impression is that it was mostly filler, haha.