NumPy Essentials, Day 6
26 Extract a given column of a NumPy array into a new array
Example:
Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)
Take the fifth column of every row and build a new array from it:
array([b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa',
b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa',
b'Iris-setosa', b'Iris-setosa'], dtype='|S18')
Solution:
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)
species = np.array([row[4] for row in iris_1d])
species[:10]
Output
array([b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa',
b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa',
b'Iris-setosa', b'Iris-setosa'], dtype='|S18')
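The data stored in iris_1d looks like this (shown here as loaded with encoding=None, so the species names appear as str rather than bytes):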
array([(5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
(4.9, 3. , 1.4, 0.2, 'Iris-setosa'),
(4.7, 3.2, 1.3, 0.2, 'Iris-setosa'),
(4.6, 3.1, 1.5, 0.2, 'Iris-setosa'),
(5. , 3.6, 1.4, 0.2, 'Iris-setosa'),
(5.4, 3.9, 1.7, 0.4, 'Iris-setosa'),
(4.6, 3.4, 1.4, 0.3, 'Iris-setosa'),
(5. , 3.4, 1.5, 0.2, 'Iris-setosa'),
(4.4, 2.9, 1.4, 0.2, 'Iris-setosa'),
(4.9, 3.1, 1.5, 0.1, 'Iris-setosa')],
dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<U15')])
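The data is 1D: there is only a single pair of [], so it reads as [element 1, element 2, ..., element 10], except that each element is a tuple-like record. We can iterate over these records with a for loop.
The key part of np.array([row[4] for row in iris_1d]) is the expression inside it:
[row[4] for row in iris_1d] is a list comprehension that builds a new list.
It takes the fifth element (index 4) of every row and collects the results into that list: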
[b'Iris-setosa',
b'Iris-setosa',
b'Iris-setosa',
......
b'Iris-virginica',
b'Iris-virginica',
b'Iris-virginica']
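As a sketch of the same idea that runs without downloading anything, the block below builds a small structured array by hand and extracts the fifth column two ways: with the list comprehension used above, and by field name. The array `sample` and its three rows are illustrative stand-ins for iris_1d, and the field names f0 to f4 are assumed to match the default names shown in the dtype above.

```python
import numpy as np

# Hypothetical stand-in for a few rows of iris_1d (same dtype layout as shown above)
sample = np.array(
    [(5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
     (4.9, 3.0, 1.4, 0.2, 'Iris-setosa'),
     (7.0, 3.2, 4.7, 1.4, 'Iris-versicolor')],
    dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<U15')],
)

# The array is 1D: one record per row, five fields per record
print(sample.shape)

# Approach used in the text: index each record by position
species_loop = np.array([row[4] for row in sample])

# Alternative: select the field by name, with no Python-level loop
species_field = sample['f4']

print(species_loop)
print(species_field)
```

Selecting by field name avoids the explicit loop, but it only works when the array is structured, as it is when genfromtxt is called with dtype=None on mixed-type columns.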
27 Convert the 1D iris data into 2D iris data
Example:
Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)
Expected output
array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2]])
Solution:
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None, encoding=None)
iris_2d = np.array([row.tolist()[:4] for row in iris_1d])
iris_2d[:4]
Output
array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2]])
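Key operation: iris_2d = np.array([row.tolist()[:4] for row in iris_1d])
[row.tolist()[:4] for row in iris_1d] is again a list comprehension. Each row is first passed through tolist(), which turns the NumPy record into a plain Python tuple, and then its first four elements are selected. The result is a list of the form [(element 1, element 2, element 3, element 4), (element 1, element 2, element 3, element 4), ...], which np.array then converts into a 2D array.
The same result can also be obtained with iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3]).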
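The two routes described above can be compared side by side on a small in-memory sample. This is only a sketch: `sample_csv` is a hypothetical three-row excerpt in the same comma-separated layout as iris.data, read through io.StringIO so that no network access is needed.

```python
import io
import numpy as np

# Hypothetical three-row excerpt in the same layout as iris.data
sample_csv = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
)

# Route 1: load as a 1D structured array, then keep the first four fields of each row
rows_1d = np.genfromtxt(io.StringIO(sample_csv), delimiter=',', dtype=None, encoding=None)
via_comprehension = np.array([row.tolist()[:4] for row in rows_1d])

# Route 2: let genfromtxt read only the four numeric columns
via_usecols = np.genfromtxt(io.StringIO(sample_csv), delimiter=',',
                            dtype='float', usecols=[0, 1, 2, 3])

print(rows_1d.shape)            # 1D: one record per row
print(via_comprehension.shape)  # 2D: rows x 4
print(via_usecols.shape)        # 2D: rows x 4
```

Route 2 skips the text column at load time, so it is usually the simpler choice when only the numeric columns are needed.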
28 Compute the mean, median, and standard deviation of an array
Example:
Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
Compute the mean, median, and standard deviation of sepallength.
Solution:
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
mu, med, sd = np.mean(sepallength), np.median(sepallength), np.std(sepallength)
print(mu, med, sd)
Output
5.843333333333334 5.8 0.8253012917851409
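One point worth checking in the solution above is that np.std returns the standard deviation, not the variance, and that it divides by N by default (ddof=0). The sketch below shows both quantities on a short hand-written array; the five values in `data` are illustrative, not the full sepallength column.

```python
import numpy as np

# Illustrative values only (not the full iris column)
data = np.array([5.1, 4.9, 4.7, 4.6, 5.0])

mu = np.mean(data)
med = np.median(data)

# Population statistics: divide by N (NumPy's default, ddof=0)
sd_population = np.std(data)
var_population = np.var(data)

# Sample statistics: divide by N - 1
sd_sample = np.std(data, ddof=1)

print(mu, med)
print(sd_population, var_population, sd_sample)
```

The variance is the square of the standard deviation, so np.var(data) and np.std(data) ** 2 agree up to floating-point rounding.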
29 How to normalize the values in an array
Example:
Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
Normalize the elements of sepallength so that they fall in the range 0 to 1.
For example:
[0.222222 0.166667 0.111111 0.083333 0.194444 ]
Solution:
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
Smax, Smin = sepallength.max(), sepallength.min()
S = (sepallength - Smin)/(Smax - Smin)
print(S[:5])
Output
[0.222222 0.166667 0.111111 0.083333 0.194444 ]
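The same can be done with S = (sepallength - Smin)/sepallength.ptp(), since sepallength.ptp() equals (Smax - Smin). Note that the ndarray.ptp() method was removed in NumPy 2.0; on those versions use the function form np.ptp(sepallength) instead.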
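A minimal sketch of min-max normalization as a reusable function follows. The name `min_max_normalize` is a hypothetical helper, and the choice to raise an error for a constant array (where the range is zero) is an assumption made for this example, not something the exercise specifies. It uses the function form np.ptp, which is available across NumPy versions.

```python
import numpy as np

def min_max_normalize(x):
    """Rescale x linearly so that its minimum maps to 0 and its maximum to 1."""
    x = np.asarray(x, dtype=float)
    value_range = np.ptp(x)  # max - min
    if value_range == 0:
        # Assumption for this sketch: a constant array cannot be rescaled
        raise ValueError("all values are equal; the range is zero")
    return (x - x.min()) / value_range

# Illustrative values: the smallest, one middle, and the largest entry
values = np.array([4.3, 5.1, 7.9])
scaled = min_max_normalize(values)
print(scaled)
```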
30 Compute the softmax score
Example:
Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
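Compute the softmax score of each element of sepallength.
Solution:
The softmax score is defined as follows:
Given an array V, with V_i denoting its i-th element, the softmax score of that element is
$S_i = \frac{e^{V_i}}{\sum_j e^{V_j}}$
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'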
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
def softmax(x):
    """Compute softmax values for each set of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)
print(softmax(sepallength))
Output
[0.00222 0.001817 0.001488 0.001346 0.002008 0.002996 0.001346 0.002008
0.001102 0.001817 0.002996 0.001644 0.001644 0.000997 0.00447 0.004044
0.002996 0.00222 0.004044 0.00222 0.002996 0.00222 0.001346 0.00222
0.001644 0.002008 0.002008 0.002453 0.002453 0.001488 0.001644 0.002996
0.002453 0.003311 0.001817 0.002008 0.003311 0.001817 0.001102 0.00222
0.002008 0.001218 0.001102 0.002008 0.00222 0.001644 0.00222 0.001346
0.002711 0.002008 0.01484 0.008144 0.013428 0.003311 0.009001 0.004044
0.007369 0.001817 0.009947 0.002453 0.002008 0.00494 0.005459 0.006033
0.003659 0.010994 0.003659 0.00447 0.006668 0.003659 0.00494 0.006033
0.007369 0.006033 0.008144 0.009947 0.01215 0.010994 0.005459 0.004044
0.003311 0.003311 0.00447 0.005459 0.002996 0.005459 0.010994 0.007369
0.003659 0.003311 0.003311 0.006033 0.00447 0.002008 0.003659 0.004044
0.004044 0.006668 0.00222 0.004044 0.007369 0.00447 0.016401 0.007369
0.009001 0.02704 0.001817 0.020032 0.010994 0.018126 0.009001 0.008144
0.01215 0.004044 0.00447 0.008144 0.009001 0.029884 0.029884 0.005459
0.013428 0.003659 0.029884 0.007369 0.010994 0.018126 0.006668 0.006033
0.008144 0.018126 0.022139 0.0365 0.008144 0.007369 0.006033 0.029884
0.007369 0.008144 0.005459 0.013428 0.010994 0.013428 0.00447 0.01215
0.010994 0.010994 0.007369 0.009001 0.006668 0.00494 ]
Core part:
def softmax(x):
    """Compute softmax values for each set of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)
This function is the softmax formula written directly in code.
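The direct form works well for values as small as the iris measurements, but np.exp overflows for large inputs. A common variant subtracts the maximum before exponentiating, which leaves the result unchanged mathematically because the common factor cancels in the ratio. The sketch below is one way to write it; `stable_softmax` is a hypothetical name, and the function assumes a 1D input.

```python
import numpy as np

def stable_softmax(x):
    """Softmax of a 1D array, shifted by max(x) so np.exp cannot overflow."""
    x = np.asarray(x, dtype=float)
    shifted = x - np.max(x)      # largest entry becomes 0, all others are negative
    exp_shifted = np.exp(shifted)
    return exp_shifted / exp_shifted.sum()

# Small values: agrees with the direct formula
small = np.array([5.1, 4.9, 4.7])
direct = np.exp(small) / np.sum(np.exp(small))
print(stable_softmax(small), direct)

# Large values: the shifted version stays finite
large = np.array([1000.0, 1001.0, 1002.0])
print(stable_softmax(large))
```

In both cases the scores are positive and sum to 1, and a larger input always receives a larger score.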