MLA Ch. 2: kNN code

2020-03-08  Mandy今天也沉迷学习

Python example

kNN.py:

from numpy import array, tile
import matplotlib.pyplot as plt
import operator


def createDataSet():
    # Each row of group is one sample with two features;
    # labels[i] and names[i] describe row i (one label per row).
    # 'R' and 'A' are the two classes (romance / action movies in the book's example).
    group = array([[3, 104], [2, 100], [1, 81], [101, 10], [99, 5], [98, 2]])
    names = ["CaliMan", "HNRD", "BW", "KL", "RS3000", "AII"]
    labels = ['R', 'R', 'R', 'A', 'A', 'A']
    return group, labels, names


def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    # Euclidean distance from inX to every row of dataSet
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5
    sortedDistIndices = distances.argsort()  # sample indices, nearest first
    classCount = {}
    # majority vote among the k nearest neighbors
    for i in range(k):
        voteIlabel = labels[sortedDistIndices[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # sort (label, count) pairs by their second element (the vote count), descending
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]


def scatter(group, names):
    # plot every sample and annotate each point with its name
    x = group[:, 0]
    y = group[:, 1]
    plt.scatter(x, y)
    for i in range(len(x)):
        plt.annotate(names[i], xy=(x[i], y[i]), xytext=(x[i] + 1, y[i] + 1))
    plt.show()


group, labels, names = createDataSet()
KNN = classify0([18, 90], group, labels, 3)
print(KNN)  # predicted class of the unknown sample [18, 90]
scatter(group, names)

Output:

[Figure: result]
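You can redo the distance arithmetic by hand to see why the script prints R. The sketch below recomputes what classify0 does for the query [18, 90] with k = 3. It uses only the standard library, and `math.dist` needs Python 3.8+. It is an illustration, not part of the original kNN.py.

```python
import math

# The six training points and their labels, copied from createDataSet()
group = [(3, 104), (2, 100), (1, 81), (101, 10), (99, 5), (98, 2)]
labels = ['R', 'R', 'R', 'A', 'A', 'A']
query = (18, 90)

# Euclidean distance from the query to each training point
distances = [math.dist(query, p) for p in group]
for p, lab, d in zip(group, labels, distances):
    print(p, lab, round(d, 1))

# Indices of the k = 3 nearest points, nearest first, and their labels
nearest = sorted(range(len(group)), key=lambda i: distances[i])[:3]
print([labels[i] for i in nearest])
```

The three nearest neighbors are HNRD (about 18.9), BW (about 19.2) and CaliMan (about 20.5). All three are labeled R, so the vote is unanimous and classify0 returns 'R'. The action samples are all more than 115 away.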

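How do we test a classifier?

  1. Hide the known answers from the classifier and let it make its own predictions.
  2. Compute the classifier's error rate: error rate = total number of mistakes / total number of test samples.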
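The two steps above can be sketched as a simple hold-out test. This is a plain-Python illustration, not the book's test code:

- `knn_classify` and `error_rate` are hypothetical helper names.
- Holding out rows 0 and 3 of the movie data is an arbitrary choice, made only to have something to test on.
- With a training set this tiny, the resulting number says nothing about real-world accuracy.

```python
from collections import Counter
import math


def knn_classify(x, train_X, train_y, k):
    """Majority vote among the k training points nearest to x (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def error_rate(test_X, test_y, train_X, train_y, k):
    """Predict each test sample without looking at its label, then count mistakes."""
    errors = 0
    for x, true_label in zip(test_X, test_y):
        predicted = knn_classify(x, train_X, train_y, k)  # true_label stays hidden here
        if predicted != true_label:
            errors += 1
    return errors / len(test_X)


# Toy split of the movie data: rows 0 and 3 are held out as the test set
X = [(3, 104), (2, 100), (1, 81), (101, 10), (99, 5), (98, 2)]
y = ['R', 'R', 'R', 'A', 'A', 'A']
test_idx = [0, 3]
train_idx = [i for i in range(len(X)) if i not in test_idx]

rate = error_rate([X[i] for i in test_idx], [y[i] for i in test_idx],
                  [X[i] for i in train_idx], [y[i] for i in train_idx], k=3)
print(rate)  # 0.0: both held-out samples are classified correctly
```

Keeping the test samples out of the training set matters. If a test point were also in the training data, its own copy would always be its nearest neighbor, and the error rate would look better than it really is.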

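Example: dating-website data

The people Hellen dates fall into three groups: people she doesn't like, people she likes a little, and people she likes a lot.
On weekdays (Monday to Friday) she prefers to meet the people she likes a little; on weekends she prefers the people she likes a lot.
She wants future matches to fit her preferences better.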

Preparation

The data include:

I don't have the data at hand; for the detailed procedure, see P24-P

Testing

Example: handwriting recognition system

*binary image
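For kNN, each binary image has to become a single feature vector so the same distance formula applies. The sketch below makes a few assumptions:

- Each image is given as lines of '0'/'1' characters. The file format is not described above, so this is an assumption.
- `image_to_vector` is a hypothetical helper name.
- The 4x4 images are made-up toy inputs; real digit images are larger.

```python
import math


def image_to_vector(lines):
    """Flatten a binary image, given as rows of '0'/'1' characters, into one list of ints."""
    vector = []
    for row in lines:
        vector.extend(int(ch) for ch in row.strip())
    return vector


# Two hypothetical 4x4 binary images of vertical strokes
vec = image_to_vector(["0110", "0110", "0110", "0110"])
other = image_to_vector(["0100", "0100", "0100", "0100"])

print(len(vec), vec)
dist = math.dist(vec, other)  # Euclidean distance between the flattened images
print(dist)
```

With 0/1 pixels, each squared difference is 1 exactly where two pixels differ, so the squared Euclidean distance is simply the count of differing pixels. Here one pixel per row differs, 4 in total, giving a distance of sqrt(4) = 2. Once images are vectors like this, classify0 can be used on them unchanged.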
