01-IPython&Numpy

2018-12-11 郑元吉

I. Anaconda

1.1 Installing Anaconda

# Detailed steps for installing Anaconda3 on Ubuntu
Reference: https://blog.csdn.net/u012318074/article/details/77074665

1.2 Changing the download mirror

# Add the Tsinghua mirror channels
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/  
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/  
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2/  
conda config --set show_channel_urls yes

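II. Why use Python for data analysis

1. Python's large collection of libraries provides a complete toolset for data analysis and processing.
2. Compared with R, Matlab, and other languages used mainly for data analysis, Python is more general-purpose.

Python is not only a data-processing platform; it also covers uses that those languages and specialized applications do not:
it can be used for scripting
it can work with databases
it can be used to build web applications

3. The set of Python libraries keeps growing, and algorithms are implemented with increasingly innovative approaches.
4. Python interfaces with many other languages, for example the efficient C language.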

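III. What is IPython

IPython is a powerful interactive Python terminal.

1. IPython shell: a feature-rich interactive shell
$ ipython
2. IPython notebook: a web interface for Python that combines text, code, images, and formulas

Starting with IPython 4.0, the notebook is called Jupyter notebook.

IV. What is Jupyter

Jupyter notebook: a web interface for Python that combines text, code, images, and formulas

V. IPython

5.1 Starting

jupyter notebook (older versions: ipython notebook)

5.2 IPython help

1. The help() function: help(len)
2. ? and ??
? shows the documentation of an object
?? also shows the source code when it is available
For example: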
        chr?
        /////////////////////////////////////////////////////////////
        L=[1,2,3]
        L?
        /////////////////////////////////////////////////////////////
        def myFunc(i):
            """
            help test
            """
            return i
        myFunc(10)
        myFunc??
3. Tab completion

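5.3 IPython magic commands

1. Run an external Python file: %run a.py (in the current directory)
    Run a file in another directory: %run /home/nanfengpo/Desktop/bb.py
    Note in particular that once a file has been run with this magic command,
    the functions defined in it can be used in the current session.
2. Timing
    Measure the time of a single run of statement:
    %time statement
    Measure the average run time of statement over many runs:
    %timeit statement
    Use two percent signs to measure the average run time of a multi-line cell:
    %%timeit
    statement1
    statement2
    statement3

    %time is generally used for code that takes a long time
    %timeit is generally used for code that runs quickly
3. List all variables and functions in the current session
    Quick list of the names of all variables and functions in the session:
    %who
    Detailed information about all variables and functions in the session:
    %whos
    Return a list of strings holding the names of all variables and functions in the session:
    %who_ls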
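4. Run Linux commands with !
    Prefix a Linux command with an exclamation mark to run it from within IPython.
    Note that the standard output can be captured: for example, files = !ls stores the output lines as a list of strings.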
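
Outside IPython, the %timeit idea can be approximated with the standard-library timeit module. The following is only a minimal sketch; the statement being timed and the number/repeat values are arbitrary choices made for illustration, not something taken from IPython itself.

```python
import timeit

# Statement to time (an arbitrary example chosen for this sketch)
stmt = "sum(range(1000))"

# Run the statement 1000 times per measurement, and take 5 measurements
number = 1000
runs = timeit.repeat(stmt, number=number, repeat=5)

# Best per-call time in seconds, roughly comparable to what %timeit reports
best_per_call = min(runs) / number
print(f"{best_per_call * 1e6:.2f} µs per loop")
```

The printed value depends on the machine, so no expected output is shown.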

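5.4 Jupyter notebook keyboard shortcuts

• Enter : enter edit mode
• Shift-Enter : run the cell and select the next cell
• Y : change the cell to code
• M : change the cell to markdown
• A : insert a new cell above
• B : insert a new cell below
• D, D (press D twice) : delete the selected cell
• Ctrl-A : select all
• Ctrl-Z : undo
• Ctrl-Enter : run the cell
• Alt-Enter : run the cell and insert a new cell below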

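VI. NumPy

6.1 What is NumPy: Numerical Python

NumPy is an open-source numerical computing extension for Python. It provides:

a powerful N-dimensional array object (ndarray)
a mature library of (broadcasting) functions
tools for integrating C/C++ and Fortran code
useful linear algebra, Fourier transform, and random number functions
NumPy is even more powerful when combined with SciPy, the scientific computing package (which includes sparse matrix support)

6.2 Importing

import numpy as np

Check the version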
np.__version__

6.3 Creating an ndarray

6.3.1 Using np.array()

# One-dimensional
import numpy as np
test = np.array([1,2,3,4,5])
test
////////////////////////////////
# Multi-dimensional
test = np.array([[1,2,3],[4,5,6]])
test

6.3.2 Creating arrays with NumPy routines

1) np.ones(shape, dtype=None, order='C')

np.ones([3,3])
Output:
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

np.ones([3,3],dtype=int)
Output:
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])

2) np.zeros(shape, dtype=float, order='C')

n3 = np.zeros((4,5))
n3
Output:
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

3) np.full(shape, fill_value, dtype=None, order='C')

np.full([3,3],3.14)
Output:
array([[ 3.14,  3.14,  3.14],
       [ 3.14,  3.14,  3.14],
       [ 3.14,  3.14,  3.14]])

4) np.eye(N, M=None, k=0, dtype=float)

np.eye(4)
Output:
array([[ 1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.]])


np.eye(4, 4, 1)
Output:
array([[ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.]])


np.eye(4, 4, -1)
Output:
array([[ 0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.]])

5) np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)

Arithmetic sequence: num evenly spaced values from start to stop (stop included by default)
np.linspace(0,10,5)
Output:
array([  0. ,   2.5,   5. ,   7.5,  10. ])

6) np.arange([start, ]stop, [step, ]dtype=None)

Arithmetic sequence: values from start up to, but not including, stop, in increments of step
np.arange(0,10,2)
Output:
array([0, 2, 4, 6, 8])
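
The difference between the two routines is easy to miss: np.linspace takes the number of points and includes the stop value by default, while np.arange takes a step and excludes the stop value. A small sketch, reusing the same arguments as above (the variable names are just for illustration):

```python
import numpy as np

# linspace: 5 evenly spaced points, stop value (10) included by default
a = np.linspace(0, 10, 5)

# arange: step of 2, stop value (10) excluded
b = np.arange(0, 10, 2)

# linspace with endpoint=False gives the same values as arange, as floats
c = np.linspace(0, 10, 5, endpoint=False)

print(a)  # ends at 10.0
print(b)  # ends at 8
print(c)  # ends at 8.0
```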

7) np.random.randint(low, high=None, size=None, dtype='l')

np.random.randint(0,10,5)
Output:
array([0, 7, 2, 3, 7])

8) np.random.randn(d0, d1, ..., dn)

np.random.randn(10)
# the result is different on every run
Output:
array([-1.74976547,  0.3426804 ,  1.1530358 , -0.25243604,  0.98132079,
        0.51421884,  0.22117967, -1.07004333, -0.18949583,  0.25500144])


//////////////////////////////////////////////////
np.random.seed(100)  # seed the random generator; with a fixed seed the result is the same on every run
np.random.randn(10)

Output:
array([ 0.37332715, -0.2887605 ,  0.04985088, -0.93815832, -0.4087037 ,
        1.13352254,  0.52713526, -0.76014192, -0.97292788,  0.16290446])
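
The effect of the seed can be checked directly. In this small sketch (variable names are illustrative only), reseeding with the same value reproduces the same numbers, while continuing to draw without reseeding does not:

```python
import numpy as np

np.random.seed(100)
first = np.random.randn(3)

np.random.seed(100)            # reset the generator to the same state
second = np.random.randn(3)    # same values as `first`

third = np.random.randn(3)     # drawn without reseeding: different values

print(np.array_equal(first, second))  # True
print(np.array_equal(second, third))  # False
```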

9) np.random.random(size=None)

np.random.random(100)
# the result is different on every run
Output:
array([ 0.01150584,  0.52951883,  0.07689008,  0.72856545,  0.26700953,
        0.38506149,  0.56252666,  0.59974406,  0.38050248,  0.14719008,
        0.6360734 ,  0.27812695,  0.73241298,  0.10904588,  0.57071762,
        0.56808218,  0.33192772,  0.61444518,  0.07289501,  0.86464595,
        0.71140253,  0.3221285 ,  0.92556313,  0.26511829,  0.8487166 ,
        0.38634413,  0.32169243,  0.80473196,  0.92050868,  0.17325157,
        0.63503329,  0.89463233,  0.02796505,  0.04396453,  0.20603116,
        0.77663591,  0.96595455,  0.77823865,  0.90867045,  0.39274922,
        0.89526325,  0.26002297,  0.38606984,  0.69176715,  0.3170825 ,
        0.86994578,  0.35648567,  0.19945661,  0.16109699,  0.58245076,
        0.20239367,  0.7099113 ,  0.41444565,  0.16725785,  0.01170234,
        0.79989105,  0.76490449,  0.25418521,  0.55082581,  0.29550998,
        0.02919009,  0.32737646,  0.29171893,  0.67664205,  0.24447834,
        0.49631976,  0.41136961,  0.82478264,  0.76439988,  0.78829201,
        0.24360075,  0.26151563,  0.51388418,  0.19823452,  0.44097815,
        0.53198973,  0.50187154,  0.72374522,  0.11090765,  0.63469357,
        0.69199977,  0.97093079,  0.35920669,  0.86493051,  0.01984456,
        0.32219702,  0.58608421,  0.26591245,  0.51851213,  0.7896492 ,
        0.04914308,  0.28711285,  0.36225247,  0.21299697,  0.99046025,
        0.11375325,  0.70964612,  0.06599185,  0.47323442,  0.62003386])

//////////////////////////////////////////////////////////////////////////////////////////////////
np.random.random([3,3])
Output:
array([[ 0.37590691,  0.15563239,  0.7754904 ],
       [ 0.40353019,  0.59708594,  0.57000741],
       [ 0.33286511,  0.15678606,  0.58814922]])

10) np.random.normal(loc=0.0, scale=1.0, size=None)

n = np.random.normal(175, scale=5.0, size=50)
n 
Output:
array([177.62703208, 176.50746247, 173.26956915, 162.29355083,
       172.05271936, 177.61948035, 172.52243162, 175.43294252,
       181.14225673, 175.21450574, 179.56055092, 170.883815  ,
       170.91435313, 176.25008762, 176.3347509 , 183.90347049,
       178.91856559, 168.84725605, 176.32881783, 172.77973728,
       173.12257339, 174.75054378, 166.60349541, 171.68263799,
       168.83419713, 174.25085091, 175.66113435, 174.12039025,
       177.22772738, 169.01523024, 175.57587527, 172.89083838,
       179.52153939, 173.70318334, 179.06473552, 176.50099117,
       175.83008746, 174.78059027, 175.58909128, 178.11274357,
       183.45771692, 172.43399789, 179.56800892, 182.14239994,
       176.43701867, 177.37866513, 179.55215095, 174.5389049 ,
       175.48698667, 168.73145269])

6.3.3 ndarray attributes

ndim: number of dimensions
shape: the shape (length along each dimension)
size: total number of elements
dtype: element type
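
A quick way to see the four attributes together is to inspect one small array. The array below is an arbitrary example chosen for this sketch:

```python
import numpy as np

# A 3-dimensional array with 2 * 3 * 4 = 24 elements
x = np.arange(24).reshape(2, 3, 4)

print(x.ndim)   # 3
print(x.shape)  # (2, 3, 4)
print(x.size)   # 24
print(x.dtype)  # an integer type; the exact width depends on the platform
```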

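6.3.4 Basic ndarray operations

1) Indexing

One-dimensional indexing works exactly like a list; multi-dimensional indexing follows the same idea, with one index per dimension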
np.random.seed(1)
x = np.random.randint(10,size=[3,4,5])
print(x[2,0,0])
print(x)

5
[[[5 8 9 5 0]
  [0 1 7 6 9]
  [2 4 5 2 4]
  [2 4 7 7 9]]

 [[1 7 0 6 9]
  [9 7 6 9 1]
  [0 1 8 8 3]
  [9 8 7 3 6]]

 [[5 1 9 3 4]
  [8 1 4 0 3]
  [9 2 0 4 9]
  [2 7 7 9 8]]]

2) Slicing

One-dimensional slicing works exactly like a list; multi-dimensional slicing follows the same idea
np.random.seed(0)
x = np.random.randint(100,size = (10,4))
x

Output:
array([[44, 47, 64, 67],
       [67,  9, 83, 21],
       [36, 87, 70, 88],
       [88, 12, 58, 65],
       [39, 87, 46, 88],
       [81, 37, 25, 77],
       [72,  9, 20, 80],
       [69, 79, 47, 64],
       [82, 99, 88, 49],
       [29, 19, 19, 14]])

Slice:

x[7:10]
Result:
array([[69, 79, 47, 64],
       [82, 99, 88, 49],
       [29, 19, 19, 14]])
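
The example above slices only along the first axis. For multi-dimensional arrays, one slice can be given per axis, separated by commas. A minimal sketch on a small array made up for illustration:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

rows = x[1:3]        # the last two rows
block = x[:2, 1:3]   # first two rows, columns 1 and 2
col = x[:, 0]        # the first column, as a 1-D array

print(block)  # [[1 2]
              #  [5 6]]
print(col)    # [0 4 8]
```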

3) Reshaping

Use reshape; the new shape can be passed as a tuple or as separate integers
x = np.arange(0,16).reshape(4,4)
x

Output:


array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

The shape attribute itself is a tuple:
type(x.shape)

tuple
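
As a small sketch of the calling conventions (variable names are illustrative), reshape gives the same result whether the shape is passed as a tuple or as separate integers, and one dimension may be given as -1 so that NumPy infers it from the total size:

```python
import numpy as np

x = np.arange(16)

a = x.reshape(4, 4)      # shape as separate integers
b = x.reshape((4, 4))    # shape as a tuple: same result
c = x.reshape(2, -1)     # -1 is inferred from the size: 16 / 2 = 8

print(np.array_equal(a, b))  # True
print(c.shape)               # (2, 8)
```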

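4) Concatenation

1. np.concatenate(): points to note
The arrays are passed as one sequence, so they must be wrapped in brackets, e.g. [x, y]
The arrays must have the same number of dimensions
Their shapes must match along every axis except the one being joined
[Key point] By default, arrays are joined along the axis given by the first value of the shape tuple (axis 0)
The direction can be changed with the axis parameter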

x = np.array([1,2,3])
y = np.array([1,5,6,7,3,20])
x
Output: array([1, 2, 3])
y
Output: array([ 1,  5,  6,  7,  3, 20])

z = np.concatenate([x,y])
z
Output: array([ 1,  2,  3,  1,  5,  6,  7,  3, 20])
z.shape
Output: (9,)


/////////////////////////////////////////////////////
# Two-dimensional
x = np.array([[1,2,3],[4,5,6]])
x

array([[1, 2, 3],
       [4, 5, 6]])

x.shape
(2,3)

p = np.concatenate([x,x]).shape
p
(4, 3)

2.axis

import numpy as np
x = np.array([[[1,2,3],[2,2,3],[3,3,3]],[[4,4,4],[5,5,5],[6,6,6]]])
print(x)
print(x.shape)

Output:
[[[1 2 3]
  [2 2 3]
  [3 3 3]]

 [[4 4 4]
  [5 5 5]
  [6 6 6]]]
(2, 3, 3)

////////////////////////////////
w = np.concatenate([x,x],axis = 0)
print(w.shape)
print(w)
Output:
(4, 3, 3)
[[[1 2 3]
  [2 2 3]
  [3 3 3]]

 [[4 4 4]
  [5 5 5]
  [6 6 6]]

 [[1 2 3]
  [2 2 3]
  [3 3 3]]

 [[4 4 4]
  [5 5 5]
  [6 6 6]]]
////////////////////////////////////
w = np.concatenate([x,x],axis = 1)
print(w.shape)
print(w)
Output:
(2, 6, 3)
[[[1 2 3]
  [2 2 3]
  [3 3 3]
  [1 2 3]
  [2 2 3]
  [3 3 3]]

 [[4 4 4]
  [5 5 5]
  [6 6 6]
  [4 4 4]
  [5 5 5]
  [6 6 6]]]
/////////////////////////////////////
w = np.concatenate([x,x],axis = 2)
print(w.shape)
print(w)

Output:
(2, 3, 6)
[[[1 2 3 1 2 3]
  [2 2 3 2 2 3]
  [3 3 3 3 3 3]]

 [[4 4 4 4 4 4]
  [5 5 5 5 5 5]
  [6 6 6 6 6 6]]]

3. np.hstack and np.vstack: horizontal and vertical stacking

x = np.array([[1,1],[2,2],[3,3]])
y = np.array([1,2,3])
print(np.hstack(x))
print(np.vstack(y))

Output:
[1 1 2 2 3 3]
[[1]
 [2]
 [3]]
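
In the example above a single array is passed, so its rows (or elements) are treated as the sequence to stack. The more common usage passes a list of several arrays. A minimal sketch with two small arrays invented for illustration, showing that for 2-D inputs the results match np.concatenate along axis 1 and axis 0:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

h = np.hstack([a, b])   # side by side, shape (2, 4)
v = np.vstack([a, b])   # one on top of the other, shape (4, 2)

print(h)
print(v)
print(np.array_equal(h, np.concatenate([a, b], axis=1)))  # True
print(np.array_equal(v, np.concatenate([a, b], axis=0)))  # True
```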

5) Splitting

1. np.split()

x = np.arange(1,10)
x

Output:
array([1, 2, 3, 4, 5, 6, 7, 8, 9])

x1,x2,x3 = np.split(x,[3,5])  # split before index 3 and before index 5
print(x1,x2,x3)
Output:
[1 2 3] [4 5] [6 7 8 9]

2.np.hsplit()

x = np.arange(16).reshape(4,4)
x
Output:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

print(np.hsplit(x,[2,3]))
Output:
[array([[ 0,  1],[ 4,  5], [ 8,  9], [12, 13]]),
         array([[ 2],  [ 6],  [10], [14]]), 
         array([[ 3],  [ 7],  [11],  [15]])]

3.np.vsplit()

x = np.arange(16).reshape(4,4)
x
Output:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

print(np.vsplit(x,[2,3]))
Output:
[array([[0, 1, 2, 3],
        [4, 5, 6, 7]]), array([[ 8,  9, 10, 11]]), array([[12, 13, 14, 15]])]

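6) Copies

1. Assignment never copies any element of an ndarray.
Operations on the newly assigned name also affect the original object

a = np.array([1,2,3])
b=a
print(a,b)

Output:
[1 2 3] [1 2 3]


b[0]=2
a
Output:
array([2, 2, 3])

2. Use copy() to create a copy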

a = np.array([1,2,3])
b = a.copy()
b
Output:
array([1, 2, 3])

b[0]=3
print(a,b)
Output:
[1 2 3] [3 2 3]
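
The same caution applies to slices: a basic slice of an ndarray is a view onto the original data, not a copy. A small sketch (names chosen for illustration) that uses np.shares_memory to check this:

```python
import numpy as np

a = np.arange(5)          # [0 1 2 3 4]

view = a[1:4]             # a basic slice is a view on the same data
view[0] = 100             # this also changes a[1]

dup = a[1:4].copy()       # an explicit, independent copy
dup[0] = -1               # a is not affected

print(a)                           # [  0 100   2   3   4]
print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, dup))    # False
```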

7) ndarray aggregation

1. Sum: np.sum

import numpy as np
np.random.seed(0)
a = np.random.randint(1000,size = 100)
print(np.sum(a))

b = np.random.randint(1000,size = (3,4,5))
print(np.sum(b))

Output:
52397
32865

Timing the sum
%timeit np.sum(a)

Output:
2.63 µs ± 34.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

l = [1,2,3]
np.sum(l)
Output:
6

2. Maximum and minimum: np.max / np.min

%time max(a)
%time np.max(a)
print(max(a))
print(np.max(a))
print(min(a))
print(np.min(a))

Output:
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 22.4 µs
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 61.3 µs
999
999
9
9

3. Other aggregation functions

Function Name    NaN-safe Version    Description
np.sum           np.nansum           Compute sum of elements
np.prod          np.nanprod          Compute product of elements
np.mean          np.nanmean          Compute mean of elements
np.std           np.nanstd           Compute standard deviation
np.var           np.nanvar           Compute variance
np.min           np.nanmin           Find minimum value
np.max           np.nanmax           Find maximum value
np.argmin        np.nanargmin        Find index of minimum value
np.argmax        np.nanargmax        Find index of maximum value
np.median        np.nanmedian        Compute median of elements
np.percentile    np.nanpercentile    Compute rank-based statistics of elements
np.any           N/A                 Evaluate whether any elements are true
np.all           N/A                 Evaluate whether all elements are true

////////////////////////////////////////////////////
a = np.array([1,2,3,4,np.nan])
print(np.sum(a))
print(np.nansum(a))

Output:
nan
10.0
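
The aggregation functions also accept an axis argument, which collapses only that axis instead of the whole array. A minimal sketch on a small 2-D array made up for illustration:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

total = m.sum()          # over all elements
cols = m.sum(axis=0)     # collapse the rows: one sum per column
rows = m.sum(axis=1)     # collapse the columns: one sum per row

print(total)  # 21
print(cols)   # [5 7 9]
print(rows)   # [ 6 15]
```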

4. Working with a data file

import pandas as pd
data = pd.read_csv('../../data/president_heights.csv')
heights = np.array(data['height(cm)'])
heights

Output:
array([189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175,
       178, 183, 193, 178, 173, 174, 183, 183, 168, 170, 178, 182, 180,
       183, 178, 182, 188, 175, 179, 183, 193, 182, 183, 177, 185, 188,
       188, 182, 185])

1. Mean
    np.mean(heights)
2. Maximum
    np.max(heights)
3. Minimum
    np.min(heights)
4. Standard deviation
    heights.std()

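8) ndarray matrix operations

1. Basic matrix operations
Arithmetic operators:
addition, subtraction, multiplication, and division (all applied element-wise)

a = np.array([[1,2,3],
[4,5,6]])
a
Output:
array([[1, 2, 3],
       [4, 5, 6]])
a+1
Output:
array([[2, 3, 4],
       [5, 6, 7]])
a*2
Output: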
array([[ 2,  4,  6],
       [ 8, 10, 12]])


a+[[1,4,9],[3,3,3]]
Output:
array([[ 2,  6, 12],
       [ 7,  8,  9]])


a*2-2
Output:
array([[ 0,  2,  4],
       [ 6,  8, 10]])

Matrix product

a = np.array([[1,2,3],
[4,5,6]])
a
Output:
array([[1, 2, 3],
       [4, 5, 6]])

b = np.array([[1,1],
[1,1],
[1,1]])
b

Output:
array([[1, 1],
       [1, 1],
       [1, 1]])

np.dot(a,b)
Output:
array([[ 6,  6],
       [15, 15]])
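
For 2-D arrays the @ operator gives the same matrix product as np.dot, whereas * multiplies element by element. A short sketch reusing the same a and b as above:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.ones((3, 2), dtype=int)

p = a @ b      # matrix product, shape (2, 2)
e = a * a      # element-wise product, shape (2, 3)

print(p)  # [[ 6  6]
          #  [15 15]]
print(e)  # [[ 1  4  9]
          #  [16 25 36]]
```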

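2. Broadcasting
[Important] The two rules of ndarray broadcasting
Rule 1: if the arrays have different numbers of dimensions, the shape of the smaller one is padded with 1s on the left
Rule 2: a dimension of length 1 is stretched to match the other array, as if its existing values were repeated

Example 1: m = np.ones((2, 3)), a = np.arange(3); compute m+a

m = np.ones((2,3))
m
Output: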
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

a = np.arange(3)
a
Output:
array([0, 1, 2])

m+a
Output:
array([[ 1.,  2.,  3.],
       [ 1.,  2.,  3.]])

Example 2: a = np.arange(3).reshape((3, 1)), b = np.arange(3); compute a+b

a = np.arange(3).reshape((3,1))
a
Output:
array([[0],
       [1],
       [2]])

b = np.arange(3)
b
Output:
array([0, 1, 2])

a+b
Output:
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])
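
The two rules can be made visible with np.broadcast_to, which returns the stretched version of an array without doing any arithmetic. This is only an illustrative sketch of Example 1; it also shows that shapes which cannot be matched raise a ValueError:

```python
import numpy as np

m = np.ones((2, 3))
a = np.arange(3)

# Rule 1: a has shape (3,), treated as (1, 3)
# Rule 2: the length-1 axis is stretched to length 2, giving (2, 3)
a_stretched = np.broadcast_to(a, m.shape)
print(a_stretched)
# [[0 1 2]
#  [0 1 2]]

# Adding the stretched array gives the same result as broadcasting
print(np.array_equal(m + a, m + a_stretched))  # True

# Shapes (2, 3) and (2,) do not match on the last axis, so this fails
failed = False
try:
    m + np.arange(2)
except ValueError:
    failed = True
print(failed)  # True
```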

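9) Sorting an ndarray

1. Bubble sort

# Compare a[i] with every later element and swap when they are out of order
# (strictly speaking, this variant is an exchange sort, a close relative of bubble sort)
a = [3,1,5,2,4]
for i in range(len(a)-1):
    for j in range(i, len(a)):
        if a[i]>a[j]:
            a[i],a[j]=a[j],a[i]
a

Output:
[1, 2, 3, 4, 5]

Using the NumPy features covered above, perform a selection sort on an ndarray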

import numpy as np
a = np.array([33,2,67,9,88,22,34])
def selectSort(x):
    for i in range(len(x)):
        swap = np.argmin(x[i:])+i
        [x[i],x[swap]] = [x[swap],x[i]]
selectSort(a)
a

Output:
array([ 2,  9, 22, 33, 34, 67, 88])

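2. Quicksort

Both np.sort() and ndarray.sort() can be used (quicksort is the default kind), but they differ:
np.sort() returns a sorted copy and leaves the input unchanged
ndarray.sort() sorts in place, without allocating a new array, and therefore modifies the input

np.sort() leaves the input unchanged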
    np.random.seed(10)
    a = np.random.randint(0,100,10)
    b = np.random.randint(0,100,10)
    print(a,b)
    print(np.sort(a),a)

    Output:
    [ 9 15 64 28 89 93 29  8 73  0] [40 36 16 11 54 88 62 33 72 78]
    [ 0  8  9 15 28 29 64 73 89 93] [ 9 15 64 28 89 93 29  8 73  0]

ndarray.sort() sorts in place without allocating a new array, but modifies the input (and returns None)
    np.random.seed(20)
    a = np.random.randint(0,100,10)
    b = np.random.randint(0,100,10)
    print(a,b)
    print(a.sort(),a)

    Output:
    [99 90 15 95 28 90  9 20 75 22] [71 34 96 40 85 90 26 83 16 62]
    None [ 9 15 20 22 28 75 90 90 95 99]

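3. Partial sorting

np.partition(a,k)
Sometimes we are not interested in all of the data, only in the smallest or largest few values.
np.partition(a,k) places the element that would be at index k of the sorted array into that position, with no larger elements before it and no smaller elements after it; the order within each side is not defined.
When k is positive, the first k elements are the k smallest values
When k is negative, the last |k| elements are the |k| largest values

k positive
    import numpy as np
    a = np.random.randint(0,100,10)
    print(a)
    print(np.partition(a,3))

    Output: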
    [67  0 63 42 30 82 28 63 95 13]
    [ 0 13 28 30 42 82 63 63 95 67]

k negative
    b = np.random.randint(0,100,10)
    print(b)
    print(np.partition(b,-3))

    Output:
    [89 66 11 58 97  7 50 13 87 77]
    [ 7 13 11 50 58 66 77 87 89 97]
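
To actually extract the k smallest or largest values, the partitioned array can be sliced. A minimal sketch using the first sample array printed above (the variable names and k = 3 are chosen for illustration); since the order within each part is not defined, the slices are sorted before printing:

```python
import numpy as np

a = np.array([67, 0, 63, 42, 30, 82, 28, 63, 95, 13])
k = 3

smallest = np.partition(a, k)[:k]     # the k smallest values, in no particular order
largest = np.partition(a, -k)[-k:]    # the k largest values, in no particular order

print(sorted(smallest.tolist()))  # [0, 13, 28]
print(sorted(largest.tolist()))   # [67, 82, 95]
```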

4. Overview of sorting algorithms

https://www.cnblogs.com/onepixel/articles/7674659.html