Computer Vision Basics - Feature Vector Generation Algorithms ORB & HOG
The keypoints of an image are small regions that are visually distinctive or that help define the object being detected, such as corners and edges. The point of all the individual feature-point extraction techniques covered earlier, edge, corner, and contour detection among them, is that these discriminative features can later be combined into a feature vector, making object detection and recognition much more robust. In computer vision, an algorithm that generates such a feature vector is commonly called a feature descriptor.
A feature descriptor is a representation of an image or an image patch that simplifies the image by extracting useful information and throwing away extraneous information.
ORB
ORB (Oriented FAST and Rotated BRIEF) is an algorithm that performs both feature detection and feature vector creation quickly: the FAST component handles feature detection, while BRIEF generates the feature vectors. In fact, in many computer vision applications detection and description are carried out as separate steps, and this two-stage pipeline is standard practice.
ORB is basically a fusion of the FAST keypoint detector and the BRIEF descriptor with many modifications to enhance the performance. First it uses FAST to find keypoints, then applies the Harris corner measure to find the top N points among them. It also uses a pyramid to produce multiscale features.
FAST
In the ORB pipeline the first step is keypoint detection, which is handled by FAST (Features from Accelerated Segment Test). The procedure is as follows:
- Select a candidate pixel p and denote its intensity by Ip
- Choose a suitable threshold t
- Examine the 16 pixels lying on a circle around p
- p qualifies as a keypoint if at least 12 contiguous pixels among these 16 are all brighter than Ip + t or all darker than Ip - t; otherwise p is not a keypoint
- To improve efficiency, the algorithm adds a high-speed test: first examine only the pixels at positions 1, 5, 9, and 13 around the candidate, and only if at least 3 of these 4 are brighter than Ip + t or darker than Ip - t is the full 16-pixel test performed
See the corresponding section of the OpenCV documentation for a detailed discussion of its strengths and weaknesses.
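As a quick illustration, the sketch below runs OpenCV's standalone FAST detector; the file name and threshold value are illustrative placeholders rather than values from this article.
import cv2
# Load an image and convert it to gray scale (replace the file name with your own)
img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# threshold corresponds to t above; non-maximum suppression keeps only the
# strongest response among adjacent corner candidates
fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
keypoints = fast.detect(gray, None)
# Visualize the detected keypoints
output = cv2.drawKeypoints(img, keypoints, None, color=(0, 255, 0))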
BRIEF
Once keypoints have been detected, ORB generates a binary feature vector with BRIEF (Binary Robust Independent Elementary Features).
It takes a smoothened image patch and selects a set of nd (x, y) location pairs in a unique way (explained in the paper). Then some pixel intensity comparisons are done on these location pairs. For example, let the first location pair be p and q. If I(p) < I(q), then the result is 1, else it is 0. This is applied to all the nd location pairs to get an nd-dimensional bitstring.
The reference points mentioned in the quote are chosen as follows: first pick a point p whose offset from the keypoint is drawn from a Gaussian distribution with standard deviation σ; then pick a second point q around p, drawn from a Gaussian with standard deviation σ / 2; compare the intensities of the two points; and repeat the whole process nd times.
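The toy NumPy sketch below mimics this sampling-and-comparison scheme; the function name and default parameters are illustrative assumptions, not part of any library API.
import numpy as np
def brief_bitstring(gray, center, n_d=256, sigma=8.0, seed=0):
    # gray: 2-D grayscale array; center: keypoint coordinates (x, y)
    rng = np.random.default_rng(seed)
    h, w = gray.shape
    bits = np.zeros(n_d, dtype=np.uint8)
    for i in range(n_d):
        # First point p: Gaussian around the keypoint with standard deviation sigma
        px, py = rng.normal(center, sigma)
        # Second point q: Gaussian around p with standard deviation sigma / 2
        qx, qy = rng.normal([px, py], sigma / 2)
        # Clamp both points to the image border, then compare intensities
        px, py = int(np.clip(px, 0, w - 1)), int(np.clip(py, 0, h - 1))
        qx, qy = int(np.clip(qx, 0, w - 1)), int(np.clip(qy, 0, h - 1))
        bits[i] = 1 if gray[py, px] < gray[qy, qx] else 0
    return bits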
Beyond the basic FAST and BRIEF components, the ORB implementation introduces additional machinery such as an image pyramid and rBRIEF to further improve robustness against scale changes and image rotation. Because of its speed, ORB is often used to recognize objects with stable features (faces, vehicles, and so on) in video.
In practice, ORB is first used to extract feature vectors from both the training image and the query image, and the search is then completed by matching these features.
The following code extracts features with ORB:
import copy
import cv2
# Load the training image (replace the file name with your own)
training_image = cv2.imread('face.jpeg')
# Convert the training image to gray scale
training_gray = cv2.cvtColor(training_image, cv2.COLOR_BGR2GRAY)
# Set the parameters of the ORB algorithm by specifying the maximum number of keypoints to locate and
# the pyramid decimation ratio
orb = cv2.ORB_create(200, 2.0)
# Find the keypoints in the gray scale training image and compute their ORB descriptor.
# The None parameter is needed to indicate that we are not using a mask.
keypoints, descriptor = orb.detectAndCompute(training_gray, None)
# Create copies of the training image to draw our keypoints on
keyp_without_size = copy.copy(training_image)
keyp_with_size = copy.copy(training_image)
# Draw the keypoints without size or orientation on one copy of the training image
cv2.drawKeypoints(training_image, keypoints, keyp_without_size, color=(0, 255, 0))
# Draw the keypoints with size and orientation on the other copy of the training image
cv2.drawKeypoints(training_image, keypoints, keyp_with_size, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
Fig.1 - ORB keypoints detection
Every keypoint has a center, a size, and an angle. The center determines the location of each keypoint in the image; the size of each keypoint is determined by the patch size used by BRIEF to create its feature vector; and the angle tells us the orientation of the keypoint as determined by rBRIEF.
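These properties can be read directly off the cv2.KeyPoint objects returned by detectAndCompute above, for example:
# Inspect the center, size, and angle of the first few keypoints
for kp in keypoints[:5]:
    print('center: (%.1f, %.1f)  size: %.1f  angle: %.1f' % (kp.pt[0], kp.pt[1], kp.size, kp.angle))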
The following code performs object detection with ORB:
import cv2
import matplotlib.pyplot as plt
# Load the training and query images (replace the file names with your own)
training_image = cv2.imread('face.jpeg')
query_image = cv2.imread('team.jpeg')
# Convert the training image to gray scale
training_gray = cv2.cvtColor(training_image, cv2.COLOR_BGR2GRAY)
# Convert the query image to gray scale
query_gray = cv2.cvtColor(query_image, cv2.COLOR_BGR2GRAY)
# Set the parameters of the ORB algorithm by specifying the maximum number of keypoints to locate and
# the pyramid decimation ratio
orb = cv2.ORB_create(1000, 2.0)
# Find the keypoints in the gray scale training and query images and compute their ORB descriptor.
# The None parameter is needed to indicate that we are not using a mask in either case.
keypoints_train, descriptors_train = orb.detectAndCompute(training_gray, None)
keypoints_query, descriptors_query = orb.detectAndCompute(query_gray, None)
# Create a Brute Force Matcher object. Set crossCheck to True so that the BFMatcher will only return consistent
# pairs. This technique usually produces the best results with a minimal number of outliers when there are enough matches.
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
# Perform the matching between the ORB descriptors of the training image and the query image
matches = bf.match(descriptors_train, descriptors_query)
# The matches with shorter distance are the ones we want, so we sort the matches by distance
matches = sorted(matches, key=lambda x: x.distance)
# Connect the keypoints in the training image with their best matching keypoints in the query image.
# The best matches are the first elements of the sorted matches list, since they have the shortest
# distance. We draw the first 30 matches and use flags = 2 to plot the matching keypoints
# without size or orientation. Passing None lets drawMatches allocate the output image.
result = cv2.drawMatches(training_gray, keypoints_train, query_gray, keypoints_query, matches[:30], None, flags=2)
# Display the best matching points
plt.title('Best Matching Points')
plt.imshow(result)
plt.show()
Fig.2 - Object detection and matching with ORB
HOG, Histogram of Oriented Gradients
For feature-point matching algorithms such as ORB, objects with fixed, distinctive features that are largely unaffected by the background, faces being the classic case, can be detected with very high accuracy. For more general recognition tasks such as pedestrian detection, however, poses and backgrounds vary so widely that no reliable set of fixed feature points can be found. Likewise, the edge and contour detection methods discussed earlier depend on the contrast between the object and its background. Algorithms such as HOG, the subject of this section, are introduced to handle these cases.
In the HOG feature descriptor, the distribution ( histograms ) of directions of gradients ( oriented gradients ) are used as features. Gradients ( x and y derivatives ) of an image are useful because the magnitude of gradients is large around edges and corners ( regions of abrupt intensity changes ) and we know that edges and corners pack in a lot more information about object shape than flat regions.
Since a histogram shows clearly how data are distributed across value ranges, for example the distribution of brightness in a grayscale image, the HOG algorithm analyzes an image through histograms of its gradient directions. The procedure is as follows:
- Compute the gradient magnitude and direction at every pixel inside the detection window
- Group neighboring pixels into adjacent square cells, for example 8 x 8 pixels each
- Within each cell, tally how the pixels' gradient directions fall into different ranges, building a histogram whose bins are gradient directions and whose values are accumulated gradient magnitudes
- The gradient magnitudes in these histograms are very sensitive to changes in illumination, so normalization is applied to remove the intensity effect; in practice the normalization is carried out over a larger block of cells, which also helps further reduce the amount of information needed to represent the image
- Such a histogram of gradient directions can itself serve as a feature vector, and a feature vector assembled from a series of these histograms can be used to train a classifier such as an SVM for image recognition
Fig.3 - HOG Diagram, from Udacity
- Given the image of a particular object, set a detection window (region of interest) that covers the entire object in the image (see Fig. 3).
- Calculate the magnitude and direction of the gradient for each individual pixel in the detection window.
- Divide the detection window into connected cells of pixels, with all cells being of the same size (see Fig. 3). The size of the cells is a free parameter and it is usually chosen so as to match the scale of the features to be detected. For example, in a 64 x 128 pixel detection window, square cells 6 to 8 pixels wide are suitable for detecting human limbs.
- Create a histogram for each cell by first grouping the gradient directions of all pixels in the cell into a particular number of orientation (angular) bins, and then adding up the gradient magnitudes in each angular bin (see Fig. 3). The number of bins in the histogram is a free parameter and it is usually set to 9 angular bins.
- Group adjacent cells into blocks (see Fig. 3). The number of cells in each block is a free parameter and all blocks must be of the same size. The distance between each block (known as the stride) is a free parameter but it is usually set to half the block size, in which case you will get overlapping blocks. The HOG algorithm has been shown empirically to work better with overlapping blocks.
- Use the cells contained within each block to normalize the cell histograms in that block (see Fig. 3). If you have overlapping blocks this means that most cells will be normalized with respect to different blocks. Therefore, the same cell may have several different normalizations.
- Collect all the normalized histograms from all the blocks into a single feature vector called the HOG descriptor.
- Use the resulting HOG descriptors from many images of the same type of object to train a machine learning algorithm, such as an SVM, to detect those types of objects in images. For example, you could use the HOG descriptors from many images of pedestrians to train an SVM to detect pedestrians in images. The training is done with both positive and negative examples of the object you want to detect in the image.
- Once the SVM has been trained, a sliding window approach is used to try to detect and locate objects in images. Detecting an object in the image entails finding the part of the image that looks similar to the HOG pattern learned by the SVM.
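To make these free parameters concrete, the sketch below computes a raw HOG descriptor with cv2.HOGDescriptor, using the common 64 x 128 pedestrian window, 16 x 16 blocks sliding by half a block, 8 x 8 cells, and 9 orientation bins; the file name is a placeholder.
import cv2
# Load an image and resize it to the 64 x 128 detection window (file name is a placeholder)
img = cv2.imread('pedestrian.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
window = cv2.resize(gray, (64, 128))
# Constructor arguments: winSize, blockSize, blockStride, cellSize, nbins
hog = cv2.HOGDescriptor((64, 128), (16, 16), (8, 8), (8, 8), 9)
descriptor = hog.compute(window)
# 7 x 15 block positions x 4 cells per block x 9 bins = 3780 values
print(descriptor.shape)
OpenCV also ships an SVM already trained on pedestrian HOG descriptors; the code below uses it to detect people in an image: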
import cv2
# Load the image to run pedestrian detection on (the file name is a placeholder)
img = cv2.imread('pedestrians.jpg')
if img is None:
    raise Exception("Please check the availability of the image file!")
# Initialize a HOG detector with OpenCV's default people-detection SVM
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
# Run a sliding-window, multi-scale search for pedestrians and draw the results
bboxes, weights = hog.detectMultiScale(img, winStride=(8, 8), padding=(32, 32),
                                       scale=1.05, finalThreshold=2, hitThreshold=0)
for (x, y, w, h) in bboxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)