NumPy: Miscellaneous

2019-02-13 davidic

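Understanding axis

The dimensions of a NumPy array are called axes, and the number of axes is the rank (available as the ndim attribute). A one-dimensional array has rank 1, and a two-dimensional array has rank 2.

Stack Overflow series (1): the ambiguity of the axis parameter in Python Pandas and NumPy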
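To make the axis idea concrete, here is a minimal sketch (the array values are made up for illustration) showing how ndim and shape relate, and how a reduction such as sum collapses the axis it is given:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

rank = a.ndim      # number of axes: 2
shape = a.shape    # (2, 3): axis 0 has length 2, axis 1 has length 3

# axis=0 collapses the rows, leaving one value per column
col_sums = a.sum(axis=0)   # [5 7 9]
# axis=1 collapses the columns, leaving one value per row
row_sums = a.sum(axis=1)   # [ 6 15]

print(rank, shape, col_sums, row_sums)
```

A useful way to remember it: the axis passed to a reduction is the one that disappears from the result's shape.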

Selecting array elements

x = np.array([2,  4,  0,  3,  5])
# everything except the last element
x[:-1]

[2,4,0,3]

x=np.array([[1,2,3],[4,5,6],[7,8,9]])
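# 2-D array: the parts before and after the comma select rows and columns; ':' takes everything, '0:2' takes columns 0 and 1 (column 2 is excluded)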
print(x[:,0:2])

[[1 2]
 [4 5]
 [7 8]]

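Selecting a single column with an integer index, as in the first form below, returns a one-dimensional array. Wrap the index in [] to keep the original two-dimensional layout.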

print(x[:,-1])

[3 6 9]

print(x[:,[-1]])

[[3]
 [6]
 [9]]
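A short sketch comparing the shapes of the two forms above, plus a slice-based alternative (x[:, -1:], not in the original notes) that also keeps the result two-dimensional:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_1d = x[:, -1]       # integer index: shape (3,)
col_list = x[:, [-1]]   # list index: shape (3, 1), returns a copy
col_slice = x[:, -1:]   # slice: shape (3, 1), returns a view of x

print(col_1d.shape, col_list.shape, col_slice.shape)
```

The list form copies the data, while the slice form shares memory with x, so writing into col_slice also changes x.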

Sorting

NumPy tutorial: sorting, searching, and counting

Sorting is ascending by default.

import numpy as np
list1 = [[1,3,2], [3,5,4]]
array = np.array(list1)
array = np.sort(array, axis=1)   # sort along axis 1 (within each row), ascending
#array = np.sort(array, axis=0)   # sort along axis 0 (within each column)
print(array)
[[1 2 3]
 [3 4 5]]

Descending sort:

array = -np.sort(-array, axis=1)   # descending
[[3 2 1]
 [5 4 3]]
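The negation trick only works for signed numeric data. As an alternative sketch (not from the original notes), sorting ascending and then reversing the sorted axis gives the same result and also works for unsigned integers and strings:

```python
import numpy as np

array = np.array([[1, 3, 2], [3, 5, 4]])

# Sort each row ascending, then reverse the order along axis 1
descending = np.sort(array, axis=1)[:, ::-1]

# Same result as the negation approach for signed numbers
negated = -np.sort(-array, axis=1)

print(descending)
```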

References

[1] ndarray methods and attributes in NumPy

Arithmetic, indexing, slicing

http://blog.csdn.net/liangzuojiayi/article/details/51534164

Kinds of matrix multiplication

Dot product

$a \cdot b = a_1b_1 + a_2b_2 + \cdots + a_nb_n$

import time
import numpy as np

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]
### VECTORIZED DOT PRODUCT OF VECTORS ###
tic = time.process_time()
dot = np.dot(x1,x2)
toc = time.process_time()
print("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

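np.dot computes a dot product only when both arguments are one-dimensional (plain lists or 1-D arrays). With two-dimensional arrays, dot performs matrix multiplication, so the inner dimensions must match. For two arrays of shape (1, 3), that is,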

x1 = np.array([[1,2,3]])
x2 = np.array([[1,2,3]])
np.dot(x1,x2)

this raises an error:

ValueError: shapes (1,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)
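The error comes from the shapes, not from using np.array. A minimal sketch of three ways to make the shapes line up, depending on which product is wanted:

```python
import numpy as np

x1 = np.array([[1, 2, 3]])   # shape (1, 3)
x2 = np.array([[1, 2, 3]])   # shape (1, 3)

# (1, 3) times (3, 1): a 1x1 matrix holding the dot product
inner = np.dot(x1, x2.T)

# Flatten both to 1-D first to get a plain scalar dot product
flat = np.dot(x1.ravel(), x2.ravel())

# (3, 1) times (1, 3): the 3x3 outer product
outer = np.dot(x1.T, x2)

print(inner, flat, outer.shape)
```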

Outer product

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]
### VECTORIZED OUTER PRODUCT ###
outer = np.outer(x1,x2)

Element-wise multiplication

mul = np.multiply(x1,x2)

General dot product (matrix multiplication)

W = np.random.rand(3,len(x1))
dot = np.dot(W,x1)

As this shows, dot can compute both a dot product and a matrix product.
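For reference, the @ operator (np.matmul) gives the same results as np.dot for these 1-D and 2-D cases. A small sketch, using a seeded random generator (an assumption added here so the run is repeatable):

```python
import numpy as np

x1 = np.array([9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0])
x2 = np.array([9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0])

rng = np.random.default_rng(0)
W = rng.random((3, x1.size))   # shape (3, 15)

scalar = x1 @ x2               # 1-D @ 1-D: dot product
via_dot = np.dot(W, x1)        # (3, 15) times (15,): shape (3,)
via_at = W @ x1                # same computation with the operator

print(scalar, via_dot.shape, np.allclose(via_dot, via_at))
```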

Broadcasting

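Broadcasting describes how NumPy handles arithmetic between arrays of different shapes. The smaller array is "broadcast" across the larger one so that the two have compatible shapes. Broadcasting provides a way to vectorize array operations so that the looping happens in C rather than in Python. It does this without copying data needlessly and is usually very efficient. In some cases, however, broadcasting is a bad idea because it wastes memory and slows the computation down.

NumPy operations are usually carried out on pairs of arrays, element by element. In the simplest case the two arrays have exactly the same shape, for example: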

>>> a = np.array([1.0, 2.0, 3.0])
>>> b = np.array([2.0, 2.0, 2.0])
>>> a * b
array([ 2.,  4.,  6.])

NumPy's broadcasting rule relaxes this constraint on array shapes. The simplest case is multiplying an array by a scalar:

>>> a = np.array([1.0, 2.0, 3.0])
>>> b = 2.0
>>> a * b
array([ 2.,  4.,  6.])

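The two results are the same. We can think of the scalar b as being stretched during the computation into an array with the same shape as a, where every element of the stretched b is a copy of the original scalar. The stretching is only a conceptual analogy. NumPy does not actually make these copies, so broadcast operations are as efficient as possible in both memory and computation.

The second example is more efficient than the first because broadcasting moves less memory around during the multiplication.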
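Broadcasting is not limited to scalars. As a small sketch with made-up values, a column of shape (3, 1) and a row of shape (3,) combine into a (3, 3) result, and np.broadcast_to shows that the "stretched" array is a view with a zero stride rather than a real copy:

```python
import numpy as np

a = np.array([[0.0], [10.0], [20.0]])   # shape (3, 1)
b = np.array([1.0, 2.0, 3.0])           # shape (3,)

# Shapes (3, 1) and (3,) broadcast to (3, 3)
c = a + b

# A read-only view of b repeated along a new first axis; no data is copied
stretched = np.broadcast_to(b, (3, 3))

print(c)
print(stretched.strides)   # the first stride is 0: every row reuses the same memory
```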

exp

An element-wise operation: exp is applied to every entry of the array.

x = np.array([1,2,3])
np.exp(x)

sum

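sum reduces along an axis. With keepdims=True the reduced axis is kept with length 1, so the result broadcasts against the original array:

def softmax(x):
    x_exp = np.exp(x)
    x_sum = np.sum(x_exp, axis=1, keepdims=True)   # shape (n, 1)
    s = x_exp/x_sum                                # broadcasts (n, m) / (n, 1)
    return s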
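A runnable sketch of a row-wise softmax in the same style. Subtracting the row maximum before exponentiating is an addition not in the original notes; it leaves the result unchanged but avoids overflow for large inputs:

```python
import numpy as np

def softmax(x):
    # Shift each row by its maximum (assumed extra step for numerical stability)
    shifted = x - np.max(x, axis=1, keepdims=True)
    x_exp = np.exp(shifted)
    # keepdims=True keeps shape (n, 1), so the division broadcasts across columns
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    return x_exp / x_sum

probs = softmax(np.array([[1.0, 2.0, 3.0],
                          [1000.0, 1000.0, 1000.0]]))
print(probs)
print(probs.sum(axis=1))   # each row sums to 1
```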

matrix

Converting an array to a matrix

s = np.array([5,5,0,0,0,5])
np.matrix(s)

Loading data

loadtxt

numpy.loadtxt(fname, dtype=float, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0)

Parameters

fname : file, str, or pathlib.Path

File, filename, or generator to read. If the filename extension is .gz or .bz2, the file is first decompressed. Note that generators should return byte strings for Python 3k.

dtype : data-type, optional

Data-type of the resulting array; default: float. If this is a structured data-type, the resulting array will be 1-dimensional, and each row will be interpreted as an element of the array. In this case, the number of columns used must match the number of fields in the data-type.

comments : str or sequence, optional

The characters or list of characters used to indicate the start of a comment; default: ‘#’.

delimiter : str, optional

The string used to separate values. By default, this is any whitespace.

converters : dict, optional

A dictionary mapping column number to a function that will convert that column to a float. E.g., if column 0 is a date string: converters = {0: datestr2num}. Converters can also be used to provide a default value for missing data (but see also genfromtxt): converters = {3: lambda s: float(s.strip() or 0)}. Default: None.

skiprows : int, optional

Skip the first skiprows lines; default: 0.

usecols : int or sequence, optional

Which columns to read, with 0 being the first. For example, usecols = (1,4,5) will extract the 2nd, 5th and 6th columns. The default, None, results in all columns being read.

New in version 1.11.0.

Also, when a single column has to be read it is possible to use an integer instead of a tuple. E.g., usecols = 3 reads the fourth column the same way as usecols = (3,) would.

unpack : bool, optional

If True, the returned array is transposed, so that arguments may be unpacked using x, y, z = loadtxt(...). When used with a structured data-type, arrays are returned for each field. Default is False.

ndmin : int, optional

The returned array will have at least ndmin dimensions. Otherwise mono-dimensional axes will be squeezed. Legal values: 0 (default), 1 or 2.

New in version 1.6.0.

Returns

out : ndarray

Data read from the text file.
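A minimal sketch of a few of these parameters in use. The data is an in-memory string invented for illustration (loadtxt accepts any file-like object), so no file on disk is needed:

```python
import io
import numpy as np

# Hypothetical CSV content; the first line is skipped because it starts with '#'
raw = "# id,height,weight\n1,1.70,65.0\n2,1.82,80.5\n"

# Read everything as floats: shape (2, 3)
data = np.loadtxt(io.StringIO(raw), delimiter=",")

# Read only columns 1 and 2, transposed so each column unpacks into its own array
height, weight = np.loadtxt(io.StringIO(raw), delimiter=",",
                            usecols=(1, 2), unpack=True)

print(data.shape, height, weight)
```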

genfromtxt

import numpy as np
nfl = np.genfromtxt("data.csv", delimiter=",")

# U75 reads each value as a Unicode string of up to 75 characters
world_alcohol = np.genfromtxt('world_alcohol.csv', dtype='U75', skip_header=1, delimiter=',')

data = np.genfromtxt('/Users/david/david/code/00project/carthage/scripts/adult.data', delimiter=', ', dtype=str)
# take the column at index 14 as the labels
labels = data[:,14]
# keep every column except the last one
data = data[:,:-1]
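The same pattern can be tried without the files above. This sketch uses a small in-memory table with invented rows and column names, reads it as strings, then splits off the label column and converts one feature to integers:

```python
import io
import numpy as np

# Hypothetical data; the real adult.data file is not reproduced here
raw = "name,age,label\nann,39,<=50K\nbob,50,>50K\n"

# Read every field as a string of up to 20 characters, skipping the header row
table = np.genfromtxt(io.StringIO(raw), delimiter=",", dtype="U20", skip_header=1)

labels = table[:, -1]       # last column
features = table[:, :-1]    # every column except the last
ages = features[:, 1].astype(int)

print(table.shape, labels, ages)
```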

Converting a matrix to an array

np.argsort(y_score, kind="mergesort")[::-1]
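For the conversion this heading refers to, a minimal sketch with made-up values: np.asarray (or the .A attribute) turns a np.matrix into a regular ndarray, and .A1 gives a flattened 1-D array.

```python
import numpy as np

m = np.matrix([[5, 5, 0], [0, 0, 5]])

arr = np.asarray(m)   # 2-D ndarray with the same shape, (2, 3)
same = m.A            # attribute form of the same conversion
flat = m.A1           # flattened 1-D ndarray, shape (6,)

print(type(arr).__name__, arr.shape, flat)
```

Converting first is useful because a np.matrix always stays two-dimensional, so operations such as argsort or indexing return matrices rather than 1-D arrays.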

Matrix of random integers

import numpy as np
numpy_matrix = np.random.randint(10, size=[5,2])

'''
array([[1, 0],
       [8, 4],
       [0, 5],
       [2, 9],
       [9, 9]])
'''

Getting the indices that would sort the data

import numpy as np
dd = np.mat([4,5,1])
dd1 = dd.argsort()
print(dd)
print(dd1)       # [[2 0 1]]
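A small sketch of how the returned indices are typically used, with a plain 1-D array and invented scores: indexing with them reorders the data, and reversing them gives descending order.

```python
import numpy as np

scores = np.array([0.1, 0.4, 0.35, 0.8])

asc = np.argsort(scores)                      # indices for ascending order
desc = np.argsort(scores, kind="mergesort")[::-1]   # reversed: descending order

sorted_desc = scores[desc]                    # data reordered from largest to smallest

print(asc, desc, sorted_desc)
```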

squeeze

Removes single-dimensional entries from an array's shape, i.e. drops every axis whose length is 1.

x = np.array([[[0], [1], [2]]])  
np.squeeze(x)

array([0, 1, 2])

If the input is a (1, 1) matrix to begin with, the result is a scalar:

cost = np.array([[1]])
cost = np.squeeze(cost)

The result is 1, and the shape of cost becomes ().
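squeeze also accepts an axis argument to remove only selected length-1 axes. A short sketch with an array of zeros chosen just for its shape:

```python
import numpy as np

x = np.zeros((1, 3, 1))

all_removed = np.squeeze(x)           # shape (3,)
first_only = np.squeeze(x, axis=0)    # shape (3, 1)
last_only = np.squeeze(x, axis=2)     # shape (1, 3)

# A (1, 1) array squeezes to a 0-d array; item() extracts the Python number
cost = np.squeeze(np.array([[1]]))
value = cost.item()

print(all_removed.shape, first_only.shape, last_only.shape, cost.shape, value)
```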

Selecting rows and columns that match a condition

Given data such as:

1,1,1,0,0,0
0,1,1,1,1,0
1,0,0,1,1,0
0,0,0,1,1,0

The first column is y_train and the remaining columns form x_train. We want the rows of x_train for which y_train is 1.

pos_rows = (y_train == 1)
x_train[pos_rows,:]
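Put together with the sample data above, a complete sketch looks like this (the split into y_train and x_train follows the description; the name positives is introduced here for illustration):

```python
import numpy as np

data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 1, 1, 1, 1, 0],
                 [1, 0, 0, 1, 1, 0],
                 [0, 0, 0, 1, 1, 0]])

y_train = data[:, 0]     # first column
x_train = data[:, 1:]    # remaining columns

pos_rows = (y_train == 1)          # boolean mask, one entry per row
positives = x_train[pos_rows, :]   # only the rows where the mask is True

print(pos_rows)
print(positives)
```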

Another example:

vector = np.array([5, 10, 15, 20])
vector == 10

[False, True, False, False]

matrix = np.array([
    [5, 10, 15],
    [20, 25, 30],
    [35, 40, 45]
])
matrix == 25

[
    [False, False, False], 
    [False, True,  False],
    [False, False, False]
]

For example, to find the row whose second column equals 25:

matrix = np.array([
    [5, 10, 15],
    [20, 25, 30],
    [35, 40, 45]
])
second_column_25 = (matrix[:,1] == 25)
# equivalent to print(matrix[second_column_25])
print(matrix[second_column_25, :])

[
    [20, 25, 30]
]

Combining multiple conditions

vector = np.array([5, 10, 15, 20])
equal_to_ten_and_five = (vector == 10) & (vector == 5)

[False, False, False, False]

vector = np.array([5, 10, 15, 20])
equal_to_ten_or_five = (vector == 10) | (vector == 5)

[True, True, False, False]

The comparison result can also be used to change values:

vector = np.array([5, 10, 15, 20])
equal_to_ten_or_five = (vector == 10) | (vector == 5)
vector[equal_to_ten_or_five] = 50
print(vector)

Every element where the mask is True becomes 50:

[50, 50, 15, 20]
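Mask assignment modifies the array in place. When the original values should be kept, np.where builds a new array instead. A brief sketch of that alternative, along with np.isin as a shorter way to write the "equals 10 or 5" test (both are additions, not part of the original notes):

```python
import numpy as np

vector = np.array([5, 10, 15, 20])

mask = (vector == 10) | (vector == 5)
same_mask = np.isin(vector, [5, 10])      # equivalent membership test

# Take 50 where the mask is True, otherwise the original value
replaced = np.where(mask, 50, vector)

print(replaced)   # new array
print(vector)     # original is unchanged
```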