[Notes] PyImageSearch DL4CV Reading Notes, Part 5

2018-07-18, by 曦沉

2018-07-17: Reading notes on the PyImageSearch e-book Deep Learning for Computer Vision.

[Figure: book cover]

Chapter 6: Configuring Your Development Environment
Chapter 7: Your First Image Classifier

Chapter 6: Configuring Your Development Environment
My own development environment is Windows with Anaconda + CUDA 9.0 + cuDNN 7.0 + VS Code. If there is demand, I will write up the detailed setup in a separate tutorial later. Here I just skim through the contents of this chapter.

Python
The e-book uses Python as its programming language.

Keras
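The e-book uses Keras as its deep learning framework (though "framework" may not be quite the right word). Keras can run on top of TensorFlow or Theano as its backend. Since this tutorial is meant as an introduction, it uses Keras.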

Mxnet
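Mxnet is a library that supports distributed, multi-machine training of deep learning networks. The book uses it only when working with heavyweight datasets such as ImageNet.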

OpenCV, scikit-image, scikit-learn, and more
OpenCV is used here to read images into NumPy arrays (numerical matrices).

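Configuring the Development Environment
The book ships a Linux VM virtual disk image. As a tutorial setup this is a great choice: it has the same benefits as a virtual environment but is much easier to deploy. The virtual disk bundles the operating system and the development environment, which avoids the problems that can crop up during installation. It is similar to how, at work, a SQL database and some service programs are packaged into a virtual disk and then deployed to a server (Windows Server), which avoids the hassle of fixing up paths during deployment.

To work around the problem of a virtual machine not being able to use the GPU, the author suggests using the cloud: for example, the Amazon cloud offers GPU-accelerated instances. The author also mentions FloydHub, which, as I understand it, can set up a local GPU as a cloud GPU service (I am not sure about this part, since I have not tried FloydHub).

If you are not familiar with the Linux command line, the author points to a command-line tutorial on their website.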

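Chapter 7: Your First Image Classifier
PS: Finally, the main content... The author really went all out with those first six chapters...

The author starts by building an image classifier with the k-NN (k-Nearest Neighbors) method.

Image loading has to take memory capacity into account. For larger datasets it is usually impossible to load the whole dataset into memory, so the data has to be loaded dynamically.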
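
One common way to handle a dataset that does not fit in memory is to walk over the image paths in fixed-size batches and load only one batch at a time. The sketch below shows just the batching step. The helper `iter_batches` and the example paths are hypothetical names for illustration, not code from the book.

```python
def iter_batches(image_paths, batch_size):
    """Yield successive slices of image_paths, each with at most batch_size items."""
    for start in range(0, len(image_paths), batch_size):
        yield image_paths[start:start + batch_size]

# Hypothetical paths; in a real script each batch would be read from disk,
# processed, and released before the next batch is loaded.
example_paths = ["animals/cats/cats_{:05d}.jpg".format(i) for i in range(10)]
batch_sizes = [len(batch) for batch in iter_batches(example_paths, 4)]
print(batch_sizes)
```

Because `iter_batches` is a generator, only one slice of paths is materialized at a time, which is the property that matters once each path is replaced by a decoded image.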

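Next, the author walks step by step through building your own customized Python deep learning scripts by hand. The example code provided has the following structure:

[Figure: project structure]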

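A Basic Image Preprocessor
k-NN, SVMs, and CNNs require every image in the dataset to have the same feature vector size, so the images must first be preprocessed to a uniform size.

There are two ways to resize an image: one preserves the original aspect ratio (keep ratio), and the other ignores it. There is no settled conclusion that preserving the aspect ratio is always better than ignoring it.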
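
To make the difference between the two resizing strategies concrete, here is a minimal sketch that only computes the target dimensions, without touching any pixels. The helper `fit_within` is a hypothetical name and not part of the book's code.

```python
def fit_within(width, height, max_side):
    """Return (new_width, new_height) scaled so the longer side equals max_side."""
    scale = max_side / max(width, height)
    return (max(1, round(width * scale)), max(1, round(height * scale)))

# A 640x480 image keeps its 4:3 shape when the longer side is scaled to 32 pixels.
keep_ratio = fit_within(640, 480, 32)
# Ignoring the aspect ratio simply forces both sides to the target size.
ignore_ratio = (32, 32)
print(keep_ratio, ignore_ratio)
```

Keeping the ratio gives images of different shapes (here 32 x 24), so an extra crop or padding step would still be needed before every image has the same vector size.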

Next, open the simplepreprocessor.py file provided by the author:

# import the necessary packages
import cv2

class SimplePreprocessor:
    def __init__(self, width, height, inter=cv2.INTER_AREA):
        # store the target image width, height, and interpolation
        # method used when resizing
        self.width = width
        self.height = height
        self.inter = inter

    def preprocess(self, image):
        # resize the image to a fixed size, ignoring the aspect
        # ratio
        return cv2.resize(image, (self.width, self.height),
            interpolation=self.inter)

This defines a class whose preprocess method resizes an image to a fixed size.

Building an Image Loader

The author provides an image loader written in Python:

# import the necessary packages
import numpy as np
import cv2
import os

class SimpleDatasetLoader:
    def __init__(self, preprocessors=None):
        # store the image preprocessor
        self.preprocessors = preprocessors

        # if the preprocessors are None, initialize them as an
        # empty list
        if self.preprocessors is None:
            self.preprocessors = []

    def load(self, imagePaths, verbose=-1):
        # initialize the list of features and labels
        data = []
        labels = []

        # loop over the input images
        for (i, imagePath) in enumerate(imagePaths):
            # load the image and extract the class label assuming
            # that our path has the following format:
            # /path/to/dataset/{class}/{image}.jpg
            image = cv2.imread(imagePath)
            label = imagePath.split(os.path.sep)[-2]

            # check to see if our preprocessors are not None
            if self.preprocessors is not None:
                # loop over the preprocessors and apply each to
                # the image
                for p in self.preprocessors:
                    image = p.preprocess(image)

            # treat our processed image as a "feature vector"
            # by updating the data list followed by the labels
            data.append(image)
            labels.append(label)

            # show an update every `verbose` images
            if verbose > 0 and i > 0 and (i + 1) % verbose == 0:
                print("[INFO] processed {}/{}".format(i + 1,
                    len(imagePaths)))

        # return a tuple of the data and labels
        return (np.array(data), np.array(labels))

Walking through simpledatasetloader.py:

for (i, imagePath) in enumerate(imagePaths):

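Here imagePaths is a list of strings (similar to a List<string> in C#), and enumerate(imagePaths) loops over it, yielding each index i together with the corresponding imagePath.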
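
A small illustration of what `enumerate` yields. The paths below are made up for the example.

```python
# Hypothetical image paths, used only for illustration.
imagePaths = ["animals/cats/a.jpg", "animals/dogs/b.jpg", "animals/panda/c.jpg"]

pairs = []
for (i, imagePath) in enumerate(imagePaths):
    # i counts from 0; imagePath is the i-th element of the list
    pairs.append((i, imagePath))

print(pairs[0])
```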

def load(self, imagePaths, verbose=-1):

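This defines the image loading method. The verbose parameter lets us monitor how many images the method has loaded so far. (At this point I did not yet fully understand it from the code; it is examined further below.)

The author stores the dataset in a /dataset_name/class/image.jpg layout, as shown here:

[Figure: dataset folder structure]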
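
With this folder layout, the class label is simply the name of the directory that contains the image, which is what `imagePath.split(os.path.sep)[-2]` extracts in the loader. A minimal sketch, where `label_from_path` and the file name are hypothetical:

```python
import os

def label_from_path(image_path):
    """Return the name of the directory that directly contains the image."""
    return image_path.split(os.path.sep)[-2]

# Build the path with os.path.join so it uses the separator of the current OS.
example_path = os.path.join("datasets", "animals", "dogs", "dogs_00001.jpg")
label = label_from_path(example_path)
print(label)
```

Note that the split relies on `os.path.sep`, so a path written with forward slashes on Windows would not be split as expected.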

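Back to the code: after reading each image, and before applying the preprocessors to it, the loader checks that preprocessors is not None: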

if self.preprocessors is not None:

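In Python, None, False, the empty string "", the empty list [], the empty dict {}, and the empty tuple () all evaluate as false in a boolean context.
There is one confusing point here:

[Figure: comparison of the not, None, and is None checks]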
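
The confusing part is that `x is not None` and `if x:` are different tests: the first only checks identity with None, the second checks truthiness. The sketch below compares them side by side; `describe` is a hypothetical helper.

```python
def describe(value):
    """Return (value is not None, bool(value)) to compare the two checks."""
    return (value is not None, bool(value))

empty_list = describe([])       # an empty list is not None, but it is falsy
none_value = describe(None)     # None fails both checks
filled_list = describe(["sp"])  # a non-empty list passes both checks
print(empty_list, none_value, filled_list)
```

In SimpleDatasetLoader the constructor already replaces None with an empty list, so the `is not None` check in load always passes, and an empty list just means the inner loop applies no preprocessors.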

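Continuing with simpledatasetloader.py: line 42 uses verbose to report progress while the dataset is being loaded.
The value is supplied by the caller when invoking load(self, imagePaths, verbose=-1), and verbose is used only in the following code: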

# show an update every `verbose` images
if verbose > 0 and i > 0 and (i + 1) % verbose == 0:
    print("[INFO] processed {}/{}".format(i + 1,
        len(imagePaths)))

verbose sets how often the loading progress is reported. Suppose the call looks like this:

sp = SimplePreprocessor(32, 32)
sdl = SimpleDatasetLoader(preprocessors=[sp])
(data, labels) = sdl.load(imagePaths, verbose=500)

This prints a progress message to the console once every 500 images.
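
To check which image counts actually trigger a message, the condition from load can be replayed on its own. `progress_indices` is a hypothetical helper written for this note.

```python
def progress_indices(num_images, verbose):
    """Return the 1-based image counts at which a progress line would be printed."""
    reported = []
    for i in range(num_images):
        # same condition as in SimpleDatasetLoader.load
        if verbose > 0 and i > 0 and (i + 1) % verbose == 0:
            reported.append(i + 1)
    return reported

every_500 = progress_indices(3000, 500)
silent = progress_indices(3000, -1)  # the default verbose=-1 prints nothing
print(every_500, silent)
```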

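k-NN: A Simple Classifier

The k-NN algorithm (k-nearest neighbors algorithm) does not actually "learn" anything. It simply computes the distances between feature vectors and uses those distances to classify. In this example the data are the pixel values of RGB images, also called pixel intensities.

So we have a dataset, which means we have images, and having images means having a pile of matrices (whose entries are the pixel values). We can then classify them with k-NN.

How do we decide that a matrix of pixels represents a certain kind of thing? Suppose the dataset contains cats, dogs, and pandas. That means we already know which matrices belong to the class cat, which belong to dog, and which belong to panda.

The author sets up a coordinate system with fluffiness (how fluffy the coat is? I am not sure of the exact definition) on the X axis and lightness (how light the coat is? again, I am not sure of the exact definition) on the Y axis, and obtains the following plot:

[Figure: the dataset plotted in the fluffiness-lightness space]

As shown in the figure, how should the panda in the red box be classified? Looking back at k-NN, the algorithm relies on the notion of distance, so the first question to work through is how to compute a distance.

To compute a distance we first need to pick a distance function. The usual choices are Euclidean distance and Manhattan distance, defined as follows: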

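Euclidean distance: $d(p, q) = \sqrt{\sum_{i=1}^{N} (q_i - p_i)^2}$

Manhattan distance: $d(p, q) = \sum_{i=1}^{N} |q_i - p_i|$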

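In the e-book the author notes that the choice between Euclidean and Manhattan distance should be made based on actual experiments; neither is strictly better or worse in absolute terms. The book's example uses Euclidean distance.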
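
Both metrics fit in a few lines of plain Python. This is a minimal sketch using the standard definitions (square root of summed squared differences, and sum of absolute differences); the two points are made up.

```python
import math

def euclidean_distance(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

def manhattan_distance(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(qi - pi) for pi, qi in zip(p, q))

# Two made-up 2D points, e.g. (fluffiness, lightness) pairs.
p, q = (1.0, 2.0), (4.0, 6.0)
d_euclid = euclidean_distance(p, q)
d_manhattan = manhattan_distance(p, q)
print(d_euclid, d_manhattan)
```

For these points the coordinate differences are 3 and 4, so the Euclidean distance is 5 and the Manhattan distance is 7.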

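With a distance function in hand, let's see how distances are used for k-NN classification. First, assume we have loaded all the training images in the dataset and computed the fluffiness and lightness of each one, giving a 2D point ( $X_{fluffiness}$ , $Y_{lightness}$ ) that describes the image. After loading the dataset we can plot its distribution in a coordinate system whose axes are fluffiness and lightness. Now suppose we have an image to classify, whose fluffiness-lightness feature point is ( $X_{fluffiness(unknown)}$ , $Y_{lightness(unknown)}$ ). The situation then looks like this:

[Figure: k=1, the point to classify and the dataset in the `fluffiness`-`lightness` plot]

The image in the red box is the data point to classify. We compute the distance from this point to every point in the dataset; the shortest distance is marked by the red double-headed arrow. This is the k=1 case, and we predict that the image is a dog.

With k=3 only the number of nearest points taken into account changes, and the class is decided by a majority vote among them; the method itself is the same. The corresponding plot is shown below:

[Figure: k=3, the point to classify and the dataset in the `fluffiness`-`lightness` plot]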
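
The whole procedure can be sketched in plain Python. This is an illustrative from-scratch version, not the scikit-learn implementation used later; `knn_predict` and the (fluffiness, lightness) values are invented for the example, with one deliberately mislabeled-looking "cat" placed inside the dog cluster.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Predict the label of query by majority vote among its k nearest points."""
    distances = sorted(
        (math.sqrt(sum((a - b) ** 2 for a, b in zip(point, query))), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up (fluffiness, lightness) points; the last cat sits inside the dog cluster.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
          (5.0, 5.0), (6.0, 5.5), (5.5, 4.5),
          (5.2, 5.2)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog", "cat"]

query = (5.3, 5.3)
pred_k1 = knn_predict(points, labels, query, k=1)
pred_k3 = knn_predict(points, labels, query, k=3)
print(pred_k1, pred_k3)
```

With k=1 the single nearest point is the stray cat, so the prediction follows it; with k=3 the two nearby dogs outvote it. This is one way to see why a very small k is sensitive to noisy points.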

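k-NN Hyperparameters
The author explains that k-NN has two hyperparameters. The first is the value of k: if it is too small the classifier is easily affected by noise, and if it is too large it tends to introduce a larger bias. The second is the distance metric, that is, whether to use Euclidean or Manhattan distance.

Implementing k-NN
From here on the author goes through the example program knn.py line by line. The contents of knn.py are: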

# USAGE
# python knn.py --dataset ../datasets/animals

# import the necessary packages
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from pyimagesearch.preprocessing import SimplePreprocessor
from pyimagesearch.datasets import SimpleDatasetLoader
from imutils import paths
import argparse

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", type=str, default=r"D:\WorkSpace\PyImageResearchProjects\datasets",
    help="path to input dataset")
ap.add_argument("-k", "--neighbors", type=int, default=1,
    help="# of nearest neighbors for classification")
ap.add_argument("-j", "--jobs", type=int, default=-1,
    help="# of jobs for k-NN distance (-1 uses all available cores)")
args = vars(ap.parse_args())

# grab the list of images that we'll be describing
print("[INFO] loading images...")
imagePaths = list(paths.list_images(args["dataset"]))

# initialize the image preprocessor, load the dataset from disk,
# and reshape the data matrix
sp = SimplePreprocessor(32, 32)
sdl = SimpleDatasetLoader(preprocessors=[sp])
(data, labels) = sdl.load(imagePaths, verbose=500)
data = data.reshape((data.shape[0], 3072))

# show some information on memory consumption of the images
print("[INFO] features matrix: {:.1f}MB".format(
    data.nbytes / (1024 * 1000.0)))

# encode the labels as integers
le = LabelEncoder()
labels = le.fit_transform(labels)

# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
    test_size=0.25, random_state=42)

# train and evaluate a k-NN classifier on the raw pixel intensities
print("[INFO] evaluating k-NN classifier...")
model = KNeighborsClassifier(n_neighbors=args["neighbors"],
    n_jobs=args["jobs"])
model.fit(trainX, trainY)
print(classification_report(testY, model.predict(testX),
    target_names=le.classes_))

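The goal here is to train a k-NN classifier on the raw pixel intensities of the animals dataset and use it to classify unseen images.
STEP 1. Load the dataset
The dataset contains 3000 images in total: 1000 each of dogs, cats, and pandas. Every image is an RGB color image, and the author uses the preprocess method described earlier to resize each image to 32 x 32 pixels.
STEP 2. Split the dataset
Split the dataset into a training set and a test set. The test set is held out to evaluate the classifier; note that validating hyperparameter changes is normally done on a separate validation split rather than on the test set.
STEP 3. Train the classifier
This step trains the k-NN classifier on the training set.
STEP 4. Evaluate
Once the k-NN classifier is trained (k-NN is a classifier, not a neural network), the test set is used to measure its performance, such as accuracy, and thereby evaluate it.

Now let's go through the code in knn.py.

Lines 5-12: import the libraries the program needs.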

from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from pyimagesearch.preprocessing import SimplePreprocessor
from pyimagesearch.datasets import SimpleDatasetLoader
from imutils import paths
import argparse

Lines 15-26: set up the arguments that are read at run time.

ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", type=str, default=r"D:\WorkSpace\PyImageResearchProjects\datasets",
    help="path to input dataset")
ap.add_argument("-k", "--neighbors", type=int, default=1,
    help="# of nearest neighbors for classification")
ap.add_argument("-j", "--jobs", type=int, default=-1,
    help="# of jobs for k-NN distance (-1 uses all available cores)")
args = vars(ap.parse_args())

# grab the list of images that we'll be describing
print("[INFO] loading images...")
imagePaths = list(paths.list_images(args["dataset"]))

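First an ArgumentParser object, ap, is created to describe the command-line arguments. When we type python knn.py PARAMETER in the console, ap.parse_args() parses PARAMETER, and vars(...) turns the result into the dictionary args.
Next, each ap.add_argument() call defines how one argument is read: its flags, its type, and whether it is required (here every argument has a default, so none is required).
Take the --dataset argument, ap.add_argument("-d", "--dataset", type=str, default=..., help="path to input dataset"), as an example:

"-d" is the short flag: on the command line the argument is written as python knn.py -d PARAMETER, where PARAMETER is your own value.
"--dataset" is the long flag and also determines the key under which the value is stored. After parsing, the path is read as args["dataset"], for example on line 26: imagePaths = list(paths.list_images(args["dataset"]))
type=str means the argument is read as a string.
default sets the value used when the argument is omitted; here it is D:\WorkSpace\PyImageResearchProjects\datasets.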
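
The behavior of flags, keys, and defaults can be tried without a console by passing an explicit list to `parse_args`. This is a trimmed-down sketch of the parser; the default path "datasets/animals" and the value "my_images" are placeholders chosen for the example.

```python
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", type=str, default="datasets/animals",
    help="path to input dataset")
ap.add_argument("-k", "--neighbors", type=int, default=1,
    help="# of nearest neighbors for classification")

# Passing a list to parse_args() stands in for what would be typed in the console.
args_short = vars(ap.parse_args(["-d", "my_images", "-k", "3"]))
args_default = vars(ap.parse_args([]))
print(args_short, args_default)
```

The keys of the resulting dictionary come from the long flags ("dataset", "neighbors"), and type=int converts the string "3" from the command line into the integer 3.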

Lines 28-46: load the dataset, preprocess the images, and split the dataset into training and test data.

# initialize the image preprocessor, load the dataset from disk,
# and reshape the data matrix
sp = SimplePreprocessor(32, 32)
sdl = SimpleDatasetLoader(preprocessors=[sp])
(data, labels) = sdl.load(imagePaths, verbose=500)
data = data.reshape((data.shape[0], 3072))

# show some information on memory consumption of the images
print("[INFO] features matrix: {:.1f}MB".format(
    data.nbytes / (1024 * 1000.0)))

# encode the labels as integers
le = LabelEncoder()
labels = le.fit_transform(labels)

# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
    test_size=0.25, random_state=42)

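Here sp = SimplePreprocessor(32, 32) creates an instance of the image preprocessing class and sets its target size to 32 x 32 pixels.

Then sdl = SimpleDatasetLoader(preprocessors=[sp]) passes that SimplePreprocessor instance to a new instance of the loader class SimpleDatasetLoader. The list holding the dataset image paths is then passed to sdl's load method, which returns the image matrices and the corresponding labels.

Next, data = data.reshape((data.shape[0], 3072)) flattens each image matrix into a single row of 32 x 32 x 3 = 3072 values, and the following statement prints the size of the loaded image data in memory.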
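
The effect of the reshape is easy to verify on a dummy array. The sketch below assumes NumPy is installed and uses a zero-filled stand-in for the loader output, with only 4 fake images instead of 3000.

```python
import numpy as np

# Stand-in for the loader output: 4 fake images of 32 x 32 pixels with 3 channels.
data = np.zeros((4, 32, 32, 3), dtype="uint8")

flat = data.reshape((data.shape[0], 3072))  # 32 * 32 * 3 = 3072 values per image
bytes_per_image = flat.nbytes // flat.shape[0]
print(flat.shape, bytes_per_image)
```

Each uint8 value takes one byte, so every flattened image occupies 3072 bytes; for the 3000 images of the animals dataset that comes to 3000 x 3072 = 9,216,000 bytes.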

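I did not fully follow le = LabelEncoder() and labels = le.fit_transform(labels) at first. Their effect is to convert the string labels into integers, one integer per class. LabelEncoder is predefined in sklearn, and the body of fit_transform() is as follows: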

def fit_transform(self, y):
        """Fit label encoder and return encoded labels

        Parameters
        ----------
        y : array-like of shape [n_samples]
            Target values.

        Returns
        -------
        y : array-like of shape [n_samples]
        """
        y = column_or_1d(y, warn=True)
        self.classes_, y = np.unique(y, return_inverse=True)
        return y
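
The key line in that method is the call to np.unique with return_inverse=True. A minimal sketch of what it produces, assuming NumPy is installed and using made-up labels:

```python
import numpy as np

labels = np.array(["dog", "cat", "panda", "cat", "dog"])

# np.unique returns the sorted unique labels and, with return_inverse=True,
# the index of each original label within that sorted array.
classes, encoded = np.unique(labels, return_inverse=True)
print(classes.tolist(), encoded.tolist())
```

The sorted unique values play the role of le.classes_, and the index array is the integer encoding: here "cat" becomes 0, "dog" becomes 1, and "panda" becomes 2.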

Finally, train_test_split() splits the data into training data and test data; this function is also predefined in sklearn.
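
Conceptually, the split shuffles the samples and cuts off a fraction for testing. The sketch below is a simplified stand-in written for this note, not scikit-learn's implementation; `simple_train_test_split` is a hypothetical name, and it will not reproduce the exact split that train_test_split returns for the same seed.

```python
import random

def simple_train_test_split(items, test_size, seed):
    """Shuffle a copy of items and split off a test_size fraction as the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    num_test = int(round(len(shuffled) * test_size))
    return shuffled[num_test:], shuffled[:num_test]

train_items, test_items = simple_train_test_split(range(3000), test_size=0.25, seed=42)
print(len(train_items), len(test_items))
```

With 3000 samples and test_size=0.25 this gives 2250 training samples and 750 test samples, and fixing the seed makes the split repeatable, which is the role random_state=42 plays in knn.py.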

Lines 48-54: train the k-NN classifier and evaluate it

# train and evaluate a k-NN classifier on the raw pixel intensities
print("[INFO] evaluating k-NN classifier...")
model = KNeighborsClassifier(n_neighbors=args["neighbors"],
    n_jobs=args["jobs"])
model.fit(trainX, trainY)
print(classification_report(testY, model.predict(testX),
    target_names=le.classes_))

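As you can see, this uses the predefined k-NN class from the sklearn package; the program only configures a few parameters and then prints the evaluation results.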
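
classification_report prints per-class precision, recall, F1-score, and support. To see what the first two numbers mean, they can be computed by hand for one class. The helper `precision_recall` and the labels below are made up for illustration.

```python
def precision_recall(true_labels, predicted_labels, target):
    """Return (precision, recall) for one class, given parallel label lists."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == target and p == target)
    predicted = sum(1 for _, p in pairs if p == target)
    actual = sum(1 for t, _ in pairs if t == target)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall

# Made-up labels for illustration only.
y_true = ["cat", "cat", "dog", "dog", "panda", "panda"]
y_pred = ["cat", "dog", "dog", "dog", "panda", "cat"]
cat_precision, cat_recall = precision_recall(y_true, y_pred, "cat")
print(cat_precision, cat_recall)
```

Precision is the share of "cat" predictions that were right, and recall is the share of real cats that were found; in this toy data both come out to one half.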

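Summary
As we can see, the code we have to write ourselves for this k-NN classifier is mostly dataset preparation, such as image loading and image preprocessing. The actual k-NN computation uses predefined functions from sklearn. Fitting the model, reporting test results, and so on are all handled by ready-made, highly configurable functions.

Going forward I can gradually learn more of what sklearn provides, and also branch out to other frameworks such as PyTorch, Caffe, and TensorFlow. The basic workflow should be similar in all of them: these libraries already implement the core of many algorithms, so at the beginning one rarely needs to hand-code the algorithms or network architectures themselves.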
