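NumPy: Miscellaneous
Understanding axis
The dimensions of a NumPy array are called axes, and the number of axes is called the rank (exposed as the ndim attribute). A one-dimensional array has rank 1, and a two-dimensional array has rank 2.
Stack Overflow series (1): the ambiguity of the axis parameter in Python Pandas and NumPy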
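To make the role of axis concrete, here is a minimal sketch on a made-up 2x3 array (the name `a` and its values are assumptions chosen for illustration). The axis passed to a reduction such as `sum` is the one that gets collapsed.

```python
import numpy as np

# Illustrative 2x3 array; the values are chosen only for this example
a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

rank = a.ndim                    # number of axes
col_sums = a.sum(axis=0)         # collapse axis 0 (rows): one sum per column
row_sums = a.sum(axis=1)         # collapse axis 1 (columns): one sum per row

print(rank, a.shape)
print(col_sums)
print(row_sums)
```

Reading `axis=0` as "the axis that disappears" rather than "operate on rows" tends to avoid the ambiguity mentioned above.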
Selecting array elements
x = np.array([2, 4, 0, 3, 5])
# all elements except the last one
x[:-1]
[2,4,0,3]
x=np.array([[1,2,3],[4,5,6],[7,8,9]])
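# For a 2-D array, the parts before and after the comma select rows and columns; : takes everything, and 0:2 takes columns 0 and 1 but not column 2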
print(x[:,0:2])
[[1 2]
[4 5]
[7 8]]
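Selecting a single column with the form below collapses the result to a one-dimensional array; wrap the index in an extra [] to keep the original two-dimensional form.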
print(x[:,-1])
[3 6 9]
print(x[:,[-1]])
[[3]
[6]
[9]]
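The difference is easiest to see in the shapes. The sketch below also shows a length-1 slice, which is another way (not mentioned in the notes above) to keep the result two-dimensional; the variable names are illustrative.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

last_1d = x[:, -1]       # an integer index drops the column axis
last_list = x[:, [-1]]   # a list index keeps the axis
last_slice = x[:, -1:]   # a length-1 slice also keeps the axis

print(last_1d.shape, last_list.shape, last_slice.shape)
```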
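Sorting
Sorting is ascending by default.
import numpy as np
list1 = [[1,3,2], [3,5,4]]
array = np.array(list1)
array = np.sort(array, axis=1)  # sort ascending along axis 1 (within each row)
#array = np.sort(array, axis=0)  # sort along axis 0 (within each column)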
print(array)
[[1 2 3]
[3 4 5]]
Descending sort:
array = -np.sort(-array, axis=1)  # descending: negate, sort ascending, negate back
[[3 2 1]
[5 4 3]]
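The negation trick only makes sense for signed numeric data. As an alternative sketch (not from the original notes), one can sort ascending and then reverse along the sorted axis, which also works for unsigned integers and strings.

```python
import numpy as np

array = np.array([[1, 3, 2], [3, 5, 4]])

# Sort each row ascending, then reverse the column order to get descending rows
desc = np.sort(array, axis=1)[:, ::-1]
print(desc)
```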
References
Operations, indexing, slicing
http://blog.csdn.net/liangzuojiayi/article/details/51534164
Kinds of matrix multiplication
Dot product
import time
import numpy as np
x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]
### VECTORIZED DOT PRODUCT OF VECTORS ###
tic = time.process_time()
dot = np.dot(x1,x2)
toc = time.process_time()
print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
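np.dot returns the scalar dot product only when both arguments are one-dimensional (plain lists or 1-D arrays). When the arguments are two-dimensional np.array objects, dot performs matrix multiplication instead, so the inner dimensions must match. For example,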
x1 = np.array([[1,2,3]])
x2 = np.array([[1,2,3]])
np.dot(x1,x2)
raises an error:
ValueError: shapes (1,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)
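A minimal sketch of two ways around this error, assuming the goal is the dot product of the two row vectors: either transpose one operand so the shapes align for matrix multiplication, or flatten both to 1-D so that `np.dot` computes the inner product.

```python
import numpy as np

x1 = np.array([[1, 2, 3]])   # shape (1, 3)
x2 = np.array([[1, 2, 3]])   # shape (1, 3)

# (1, 3) times (3, 1): matrix multiplication, result has shape (1, 1)
as_matrix = np.dot(x1, x2.T)

# Flatten to 1-D first: np.dot then returns a scalar inner product
as_scalar = np.dot(x1.ravel(), x2.ravel())

print(as_matrix.shape, as_scalar)
```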
Outer product
x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]
### VECTORIZED OUTER PRODUCT ###
outer = np.outer(x1,x2)
Element-wise multiplication
mul = np.multiply(x1,x2)
General dot product (matrix multiplication)
W = np.random.rand(3,len(x1))
dot = np.dot(W,x1)
As shown, dot can compute both the vector dot product and matrix multiplication.
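The four products above differ mainly in the shape of their result. The following sketch compares them on small made-up inputs (the vectors and the matrix `W` here are illustrative, not the ones used earlier).

```python
import numpy as np

x1 = np.array([1, 2, 3])
x2 = np.array([4, 5, 6])
W = np.arange(6).reshape(2, 3)       # hypothetical 2x3 weight matrix

inner = np.dot(x1, x2)               # scalar: 1*4 + 2*5 + 3*6
outer = np.outer(x1, x2)             # shape (3, 3): outer[i, j] = x1[i] * x2[j]
elementwise = np.multiply(x1, x2)    # shape (3,), same as x1 * x2
matvec = np.dot(W, x1)               # shape (2,), same as W @ x1

print(inner, outer.shape, elementwise, matvec)
```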
Broadcasting
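Broadcasting describes how NumPy handles arithmetic between arrays of different shapes. Subject to certain constraints, the smaller array is "broadcast" across the larger one so that the two have compatible shapes. Broadcasting provides a way to vectorize array operations so that the looping happens in C rather than in Python. It does this without making needless copies of the data, which usually makes it very efficient. In some cases, however, broadcasting is a bad idea because it uses memory inefficiently and slows the computation down.
NumPy operations are usually performed on pairs of arrays, element by element. In the simplest case the two arrays have exactly the same shape, for example: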
>>> a = np.array([1.0, 2.0, 3.0])
>>> b = np.array([2.0, 2.0, 2.0])
>>> a * b
array([ 2., 4., 6.])
NumPy's broadcasting rule relaxes this constraint on array shapes. The simplest case is multiplying an array by a scalar:
>>> a = np.array([1.0, 2.0, 3.0])
>>> b = 2.0
>>> a * b
array([ 2., 4., 6.])
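The two results are the same. We can think of the scalar b as being stretched during the computation into an array with the same shape as a, where every element of the stretched b is a copy of the original scalar. The stretching is only conceptual: NumPy does not actually make these copies, so broadcast operations are as efficient in memory and computation as possible.
The second example is more efficient than the first because broadcasting moves less memory during the multiplication.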
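Beyond the scalar case, broadcasting also applies between arrays whose shapes are compatible axis by axis. The sketch below uses illustrative arrays to show a (3, 1) column combined with a (3,) row, an incompatible pair that raises ValueError, and `np.broadcast_to`, which returns a read-only view rather than a copy.

```python
import numpy as np

col = np.array([[0.0], [10.0], [20.0]])   # shape (3, 1)
row = np.array([1.0, 2.0, 3.0])           # shape (3,)

total = col + row                          # both are broadcast to shape (3, 3)

# Shapes (3,) and (4,) cannot be broadcast together
try:
    np.ones(3) + np.ones(4)
    compatible = True
except ValueError:
    compatible = False

# The stretched array is a view: stride 0 along the repeated axis, no data copied
view = np.broadcast_to(row, (3, 3))

print(total.shape, compatible, view.strides)
```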
exp
An element-wise operation; like other NumPy ufuncs, it follows the broadcasting rules.
x = np.array([1,2,3])
np.exp(x)
sum
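sum is often combined with broadcasting: with keepdims=True the row sums keep shape (n, 1), so the division below broadcasts across each row.
def softmax(x):
    x_exp = np.exp(x)
    x_sum = np.sum(x_exp, axis=1, keepdims=True)  # row sums, shape (n, 1)
    s = x_exp / x_sum  # (n, m) / (n, 1) broadcasts along each row
    return s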
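As a usage sketch, the variant below assumes a 2-D input with one sample per row and subtracts each row's maximum before exponentiating. That shift is a common numerical-stability trick and is an addition here, not part of the notes above; the function name `softmax_stable` is hypothetical.

```python
import numpy as np

def softmax_stable(x):
    # Subtracting the row maximum leaves the result unchanged but avoids overflow in exp
    shifted = x - np.max(x, axis=1, keepdims=True)
    x_exp = np.exp(shifted)
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)

scores = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1000.0, 1000.0]])   # second row would overflow a naive exp
probs = softmax_stable(scores)

print(probs.sum(axis=1))   # each row sums to 1
```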
matrix
Converting an array to a matrix
s = np.array([5,5,0,0,0,5])
np.matrix(s)
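A short round-trip sketch: `np.matrix` always produces a 2-D object, so a length-6 array becomes a 1x6 matrix, and `np.asarray` converts it back to a regular ndarray. The NumPy documentation recommends regular arrays over `np.matrix` for new code; the variable names below are illustrative.

```python
import numpy as np

s = np.array([5, 5, 0, 0, 0, 5])

m = np.matrix(s)          # always 2-D: shape (1, 6)
back = np.asarray(m)      # regular ndarray, still shape (1, 6)
flat = back.ravel()       # back to shape (6,)

print(m.shape, back.shape, flat.shape)
```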
Loading data
loadtxt
numpy.loadtxt(fname, dtype=float, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0)
Parameters
fname : file, str, or pathlib.Path
File, filename, or generator to read. If the filename extension is .gz or .bz2, the file is first decompressed. Note that generators should return byte strings for Python 3k.
dtype : data-type, optional
Data-type of the resulting array; default: float. If this is a structured data-type, the resulting array will be 1-dimensional, and each row will be interpreted as an element of the array. In this case, the number of columns used must match the number of fields in the data-type.
comments : str or sequence, optional
The characters or list of characters used to indicate the start of a comment; default: '#'.
delimiter : str, optional
The string used to separate values. By default, this is any whitespace.
converters : dict, optional
A dictionary mapping column number to a function that will convert that column to a float. E.g., if column 0 is a date string: converters = {0: datestr2num}. Converters can also be used to provide a default value for missing data (but see also genfromtxt): converters = {3: lambda s: float(s.strip() or 0)}. Default: None.
skiprows : int, optional
Skip the first skiprows lines; default: 0.
usecols : int or sequence, optional
Which columns to read, with 0 being the first. For example, usecols = (1,4,5) will extract the 2nd, 5th and 6th columns. The default, None, results in all columns being read.
New in version 1.11.0.
Also, when a single column has to be read it is possible to use an integer instead of a tuple. E.g. usecols = 3 reads the fourth column the same way as usecols = (3,) would.
unpack : bool, optional
If True, the returned array is transposed, so that arguments may be unpacked using x, y, z = loadtxt(...). When used with a structured data-type, arrays are returned for each field. Default is False.
ndmin : int, optional
The returned array will have at least ndmin dimensions. Otherwise mono-dimensional axes will be squeezed. Legal values: 0 (default), 1 or 2.
New in version 1.6.0.
Returns
out : ndarray
Data read from the text file.
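A minimal sketch of several of these parameters working together. It reads from an in-memory `io.StringIO` buffer instead of a file, and the column layout (id, height, weight) is made up for illustration.

```python
import io
import numpy as np

# In-memory stand-in for a text file; the first line is skipped as a comment
text = io.StringIO(
    "# id,height,weight\n"
    "1,170.5,65.0\n"
    "2,182.0,80.5\n"
)

# usecols picks columns 1 and 2; unpack=True returns one array per column
height, weight = np.loadtxt(text, delimiter=",", usecols=(1, 2), unpack=True)

print(height, weight)
```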
genfromtxt
import numpy as np
nfl = np.genfromtxt("data.csv", delimiter=",")
# dtype='U75' reads every value as a Unicode string of at most 75 characters
world_alcohol = np.genfromtxt('world_alcohol.csv', dtype='U75', skip_header=1, delimiter=',')
data = np.genfromtxt('/Users/david/david/code/00project/carthage/scripts/adult.data', delimiter=', ', dtype=str)
# take the column at index 14 (the 15th column)
labels = data[:,14]
# keep every column except the last one
data = data[:,:-1]
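Unlike loadtxt, genfromtxt tolerates missing fields. The sketch below reads a small made-up CSV from an in-memory buffer (the header names and values are assumptions) and then splits off the last column as labels, following the same pattern as above.

```python
import io
import numpy as np

# Made-up CSV with a header row and one empty field in the second data row
text = io.StringIO("a,b,c\n1,2,3\n4,,6\n")

data = np.genfromtxt(text, delimiter=",", skip_header=1)

labels = data[:, -1]      # last column
features = data[:, :-1]   # every column except the last

print(data)
```

With the default float dtype, the empty field is read as nan rather than raising an error.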
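Converting a matrix to an array
# indices that sort y_score in descending order (stable mergesort order, then reversed)
np.argsort(y_score, kind="mergesort")[::-1]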
Matrix of random numbers
import numpy as np
numpy_matrix = np.random.randint(10, size=[5,2])  # integers in [0, 10), shape (5, 2)
'''
array([[1, 0],
       [8, 4],
       [0, 5],
       [2, 9],
       [9, 9]])
'''
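The output above changes on every run. If reproducible values are needed, one option (an addition here, not part of the original notes) is a seeded `np.random.default_rng` generator; the seed value 0 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng = np.random.default_rng(seed=0)
m = rng.integers(0, 10, size=(5, 2))    # integers in [0, 10), shape (5, 2)

rng_again = np.random.default_rng(seed=0)
m_again = rng_again.integers(0, 10, size=(5, 2))

print(m.shape, np.array_equal(m, m_again))
```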
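Getting the indices of the sorted order
import numpy as np
dd = np.asmatrix([4,5,1])  # np.asmatrix is the full name of np.mat (the np.mat alias was removed in NumPy 2.0)
dd1 = dd.argsort()
print(dd)
print(dd1)  # dd1 is matrix([[2, 0, 1]])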
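The same idea on a plain ndarray, as a small sketch: the indices returned by `argsort` can be used to reorder the array, and sorting the negated values gives the descending order. The array reuses the values above; the variable names are illustrative.

```python
import numpy as np

a = np.array([4, 5, 1])

order = np.argsort(a)          # indices that sort a ascending
sorted_a = a[order]            # fancy indexing applies the order
desc_order = np.argsort(-a)    # indices that sort a descending

print(order, sorted_a, desc_order)
```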
squeeze
Removes single-dimensional entries from the shape of an array, i.e. drops every axis whose length is 1.
x = np.array([[[0], [1], [2]]])
np.squeeze(x)
array([0, 1, 2])
If the array already has shape (1,1), the result is a zero-dimensional array that behaves like a scalar.
cost = np.array([[1]])
cost = np.squeeze(cost)
The result is 1, and the shape of cost becomes ().
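squeeze also accepts an `axis` argument to remove only selected length-1 axes, and `.item()` converts a zero-dimensional result into a plain Python number. A minimal sketch with an illustrative array:

```python
import numpy as np

x = np.zeros((1, 3, 1))

all_squeezed = np.squeeze(x)          # every length-1 axis removed: shape (3,)
first_only = np.squeeze(x, axis=0)    # only axis 0 removed: shape (3, 1)

# Squeezing an axis whose length is not 1 raises ValueError
try:
    np.squeeze(x, axis=1)
    raised = False
except ValueError:
    raised = True

cost = np.array([[1]])
value = np.squeeze(cost).item()       # plain Python int

print(all_squeezed.shape, first_only.shape, raised, value)
```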
Selecting the rows and columns that match a condition
Given data such as
1,1,1,0,0,0
0,1,1,1,1,0
1,0,0,1,1,0
0,0,0,1,1,0
With the first column as y_train and the remaining columns as x_train, we want the rows of x_train for which y_train is 1.
pos_rows = (y_train == 1)
x_train[pos_rows,:]
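Put together as a self-contained sketch, using the four rows listed above (the names `data` and `x_pos` are introduced here for illustration):

```python
import numpy as np

data = np.array([
    [1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 0],
])

y_train = data[:, 0]          # first column
x_train = data[:, 1:]         # all remaining columns

pos_rows = (y_train == 1)     # boolean mask, one entry per row
x_pos = x_train[pos_rows, :]  # keep only the rows where the mask is True

print(pos_rows)
print(x_pos)
```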
Another example
vector = np.array([5, 10, 15, 20])
vector == 10
[False, True, False, False]
matrix = np.array([
[5, 10, 15],
[20, 25, 30],
[35, 40, 45]
])
matrix == 25
[
[False, False, False],
[False, True, False],
[False, False, False]
]
For example, to find the row whose second column is 25:
matrix = np.array([
[5, 10, 15],
[20, 25, 30],
[35, 40, 45]
])
second_column_25 = (matrix[:,1] == 25)
# equivalent to print(matrix[second_column_25])
print(matrix[second_column_25, :])
[
[20, 25, 30]
]
Comparisons with multiple conditions
vector = np.array([5, 10, 15, 20])
equal_to_ten_and_five = (vector == 10) & (vector == 5)
[False, False, False, False]
vector = np.array([5, 10, 15, 20])
equal_to_ten_or_five = (vector == 10) | (vector == 5)
[True, True, False, False]
Values can also be changed based on the result of a comparison
vector = np.array([5, 10, 15, 20])
equal_to_ten_or_five = (vector == 10) | (vector == 5)
vector[equal_to_ten_or_five] = 50
print(vector)
Every element where the mask is True becomes 50
[50, 50, 15, 20]
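Assignment through a mask modifies the array in place. As an alternative sketch (not from the original notes), `np.isin` builds the same mask for a list of values and `np.where` returns a new array, leaving the original untouched.

```python
import numpy as np

vector = np.array([5, 10, 15, 20])

mask = np.isin(vector, [5, 10])        # same mask as (vector == 10) | (vector == 5)
replaced = np.where(mask, 50, vector)  # 50 where the mask is True, original value elsewhere

print(mask)
print(replaced)
print(vector)                          # unchanged
```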