Python Data Analysis (1.1): IPython and NumPy

2018-09-01 guocx_

Why use Python for data analysis

What is IPython

What is Jupyter

IPython

NumPy

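1. Why use Python for data analysis

Python's large collection of libraries provides a complete toolset for data analysis and processing.
Compared with R, Matlab, and other languages used mainly for data analysis, Python is more versatile.
- Python is not only a data-processing platform; it also covers uses that those languages and specialized applications do not:
  scripting
  working with databases
  developing web applications
The set of Python libraries keeps growing, and their algorithm implementations adopt increasingly innovative approaches.
Python interfaces well with many other languages, such as the efficient C language.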

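2. What is IPython

IPython is a powerful interactive Python terminal.
- IPython shell: a feature-rich interactive shell, started with: $ ipython
- IPython notebook: a web interface for Python that combines text, code, images, and formulas in one document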

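3. What is Jupyter

Jupyter Notebook: the successor to the IPython notebook, a web interface that combines text, code, images, and formulas in one document; it supports Python as well as other languages.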

4. IPython

Get comfortable with IPython's magic commands.

List all magic commands:
lsmagic


Available line magics:
%alias  %alias_magic  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %popd  %pprint  %precision  %profile  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3  %%ruby  %%script  %%sh  %%svg  %%sx  %%system  %%time  %%timeit  %%writefile

Automagic is ON, % prefix IS NOT needed for line magics.
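
Magics such as %timeit only exist inside IPython. As a rough sketch of what %timeit measures, the standard-library timeit module can time a statement from a plain Python script. The statement and the repeat count below are arbitrary choices for illustration.

```python
import timeit

# Time a small statement 1000 times. Inside IPython,
# `%timeit sum(range(1000))` reports a comparable per-call figure.
elapsed = timeit.timeit("sum(range(1000))", number=1000)
per_call = elapsed / 1000
print(f"{per_call * 1e6:.2f} microseconds per call")
```

%timeit additionally picks the repeat count automatically and reports a spread over several runs, which this sketch does not do.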

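5. NumPy

What is NumPy: Numerical Python
NumPy is an open-source numerical computing extension for Python. It provides:
- a powerful N-dimensional array object, ndarray
- a mature library of functions that support broadcasting
- tools for integrating C/C++ and Fortran code
- useful linear algebra, Fourier transform, and random number generation functions
- even more power when combined with SciPy, the scientific computing package that includes sparse matrix operations

Creating arrays with np.array()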

# one-dimensional
import numpy as np
test = np.array([1,2,3,4,5])
test
////////////////////////////////
# multi-dimensional
test = np.array([[1,2,3],[4,5,6]])
test
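
np.array() also infers an element type from its input. The sketch below, using illustrative variable names that are not from the original, shows how a single float upcasts the whole array and how dtype can be set explicitly.

```python
import numpy as np

ints = np.array([1, 2, 3])                 # all integers -> integer dtype
mixed = np.array([1, 2.5, 3])              # one float upcasts every element
forced = np.array([1, 2, 3], dtype=float)  # dtype requested explicitly
nested = np.array([[1, 2, 3], [4, 5, 6]])  # each inner list becomes a row

print(ints.dtype, mixed.dtype, forced.dtype)
print(nested.shape)  # (2, 3)
```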

Creating arrays with NumPy's array-creation routines

np.ones(shape, dtype=None, order='C')
np.ones([3,3])
Output:
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

np.ones([3,3],dtype=int)
Output:
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])

np.full(shape, fill_value, dtype=None, order='C')
np.full([3,3],3.14)
Output:
array([[ 3.14,  3.14,  3.14],
       [ 3.14,  3.14,  3.14],
       [ 3.14,  3.14,  3.14]])

np.eye(N, M=None, k=0, dtype=float)
np.eye(4)
Output:
array([[ 1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.]])

np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
Arithmetic sequence defined by the number of points; stop is included by default
np.linspace(0,10,5)
Output:
array([  0. ,   2.5,   5. ,   7.5,  10. ])


np.arange([start, ]stop, [step, ]dtype=None)
Arithmetic sequence defined by the step; stop is excluded
np.arange(0,10,2)
Output:
array([0, 2, 4, 6, 8])
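
The difference between the two routines is easiest to see side by side. This is a minimal sketch; the endpoint=False call is added here only to show how linspace can be made to match arange.

```python
import numpy as np

a = np.linspace(0, 10, 5)                  # 5 points, stop included
b = np.linspace(0, 10, 5, endpoint=False)  # 5 points, stop excluded
c = np.arange(0, 10, 2)                    # step of 2, stop excluded

print(a)  # values 0, 2.5, 5, 7.5, 10 (floats)
print(b)  # values 0, 2, 4, 6, 8 (floats)
print(c)  # values 0, 2, 4, 6, 8 (integers)
```

linspace is usually the safer choice when the number of points matters, arange when the step matters.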

np.random.randint(low, high=None, size=None, dtype='l')
np.random.randint(0,10,5)
Output:
array([0, 7, 2, 3, 7])

np.random.randn(d0, d1, ..., dn)
np.random.randn(10)
# different on every run
Output:
array([ 0.37332715, -0.2887605 ,  0.04985088, -0.93815832, -0.4087037 ,
        1.13352254,  0.52713526, -0.76014192, -0.97292788,  0.16290446])


//////////////////////////////////////////////////
np.random.seed(100)  # seed the generator; with a fixed seed the result is the same on every run
np.random.randn(10)

Output:
array([-1.74976547,  0.3426804 ,  1.1530358 , -0.25243604,  0.98132079,
        0.51421884,  0.22117967, -1.07004333, -0.18949583,  0.25500144])
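
The effect of the seed can be checked directly. In this sketch (variable names are illustrative), reseeding replays the same sequence, while drawing again without reseeding simply continues it.

```python
import numpy as np

np.random.seed(100)
first = np.random.randn(5)

np.random.seed(100)          # reset to the same seed
second = np.random.randn(5)  # replays the same sequence

third = np.random.randn(5)   # no reseed: the sequence continues

print(np.array_equal(first, second))  # True
print(np.array_equal(first, third))   # False
```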


np.random.random(size=None)
np.random.random(100)
# different on every run
Output:
array([ 0.01150584,  0.52951883,  0.07689008,  0.72856545,  0.26700953,
        0.38506149,  0.56252666,  0.59974406,  0.38050248,  0.14719008,
        0.6360734 ,  0.27812695,  0.73241298,  0.10904588,  0.57071762,
        0.56808218,  0.33192772,  0.61444518,  0.07289501,  0.86464595,
        0.71140253,  0.3221285 ,  0.92556313,  0.26511829,  0.8487166 ,
        0.38634413,  0.32169243,  0.80473196,  0.92050868,  0.17325157,
        0.63503329,  0.89463233,  0.02796505,  0.04396453,  0.20603116,
        0.77663591,  0.96595455,  0.77823865,  0.90867045,  0.39274922,
        0.89526325,  0.26002297,  0.38606984,  0.69176715,  0.3170825 ,
        0.86994578,  0.35648567,  0.19945661,  0.16109699,  0.58245076,
        0.20239367,  0.7099113 ,  0.41444565,  0.16725785,  0.01170234,
        0.79989105,  0.76490449,  0.25418521,  0.55082581,  0.29550998,
        0.02919009,  0.32737646,  0.29171893,  0.67664205,  0.24447834,
        0.49631976,  0.41136961,  0.82478264,  0.76439988,  0.78829201,
        0.24360075,  0.26151563,  0.51388418,  0.19823452,  0.44097815,
        0.53198973,  0.50187154,  0.72374522,  0.11090765,  0.63469357,
        0.69199977,  0.97093079,  0.35920669,  0.86493051,  0.01984456,
        0.32219702,  0.58608421,  0.26591245,  0.51851213,  0.7896492 ,
        0.04914308,  0.28711285,  0.36225247,  0.21299697,  0.99046025,
        0.11375325,  0.70964612,  0.06599185,  0.47323442,  0.62003386])

//////////////////////////////////////////////////////////////////////////////////////////////////
np.random.random([3,3])
Output:
array([[ 0.37590691,  0.15563239,  0.7754904 ],
       [ 0.40353019,  0.59708594,  0.57000741],
       [ 0.33286511,  0.15678606,  0.58814922]])

ndarray attributes
ndim: number of dimensions
shape: the shape (length along each dimension)
size: total number of elements
dtype: element type

np.random.seed(0)
x = np.random.randint(10,size=(3,4,5))
x

array([[[5, 0, 3, 3, 7],
        [9, 3, 5, 2, 4],
        [7, 6, 8, 8, 1],
        [6, 7, 7, 8, 1]],

       [[5, 9, 8, 9, 4],
        [3, 0, 3, 5, 0],
        [2, 3, 8, 1, 3],
        [3, 3, 7, 0, 1]],

       [[9, 9, 0, 4, 7],
        [3, 2, 7, 2, 0],
        [0, 4, 5, 5, 6],
        [8, 4, 1, 4, 9]]])

Number of dimensions: x.ndim
Shape (length along each dimension): x.shape
Total number of elements: x.size
Element type: x.dtype
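
For the array above, the four attributes can be printed as follows. This is a small sketch; the exact integer dtype is left open because it depends on the platform and NumPy version.

```python
import numpy as np

np.random.seed(0)
x = np.random.randint(10, size=(3, 4, 5))

print(x.ndim)   # 3
print(x.shape)  # (3, 4, 5)
print(x.size)   # 60, i.e. 3 * 4 * 5
print(x.dtype)  # an integer type; the width depends on the platform
```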

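Basic ndarray operations
Indexing

In one dimension, indexing works exactly as it does for lists; multiple dimensions follow the same idea, with one index per dimension.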
np.random.seed(1)
x = np.random.randint(10,size=[3,4,5])
print(x[2,0,0])
print(x)
Output:
5
[[[5 8 9 5 0]
  [0 1 7 6 9]
  [2 4 5 2 4]
  [2 4 7 7 9]]

 [[1 7 0 6 9]
  [9 7 6 9 1]
  [0 1 8 8 3]
  [9 8 7 3 6]]

 [[5 1 9 3 4]
  [8 1 4 0 3]
  [9 2 0 4 9]
  [2 7 7 9 8]]]
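
Random values make it hard to verify an index by eye. The sketch below uses np.arange instead (an illustrative choice, not from the original) so that every position has a predictable value.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)  # values 0..23 in order

print(x[1, 2, 3])     # 23: block 1, row 2, column 3
print(x[1][2][3])     # 23: chained list-style indexing reaches the same element
print(x[0, 1])        # a partial index returns a sub-array: 4, 5, 6, 7
print(x[-1, -1, -1])  # 23: negative indices count from the end
```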

Slicing

In one dimension, slicing works exactly as it does for lists; multiple dimensions follow the same idea.
np.random.seed(0)
x = np.random.randint(100,size = (10,4))
x

Output:
array([[44, 47, 64, 67],
       [67,  9, 83, 21],
       [36, 87, 70, 88],
       [88, 12, 58, 65],
       [39, 87, 46, 88],
       [81, 37, 25, 77],
       [72,  9, 20, 80],
       [69, 79, 47, 64],
       [82, 99, 88, 49],
       [29, 19, 19, 14]])

Slice:

x[7:10]
Slice result:
array([[69, 79, 47, 64],
       [82, 99, 88, 49],
       [29, 19, 19, 14]])
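
Slices can also be given per dimension, separated by commas. A minimal sketch on a predictable array (np.arange is used here for illustration rather than the random data above):

```python
import numpy as np

x = np.arange(40).reshape(10, 4)  # row i holds 4*i .. 4*i + 3

last_rows = x[7:10]    # rows 7, 8, 9
second_col = x[:, 1]   # column 1 of every row
block = x[2:4, 1:3]    # rows 2-3, columns 1-2
every_other = x[::2]   # rows 0, 2, 4, 6, 8

print(last_rows.shape)    # (3, 4)
print(second_col)         # 1, 5, 9, ..., 37
print(block)              # rows [9 10] and [13 14]
print(every_other.shape)  # (5, 4)
```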

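Reshaping

Use reshape. The new shape can be passed as a tuple or, as here, as separate integers; the shape attribute of the result is a tuple.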
x = np.arange(0,16).reshape(4,4)
x

Result:

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

The type of x.shape:
type(x.shape)

tuple
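
Both calling styles of reshape give the same result. The sketch below also shows -1, which asks NumPy to infer one dimension, and what happens when the element count does not fit; the shapes are chosen only for illustration.

```python
import numpy as np

x = np.arange(16)

a = x.reshape(4, 4)    # shape as separate integers
b = x.reshape((4, 4))  # shape as a tuple; same result
c = x.reshape(2, -1)   # -1 lets NumPy infer that dimension (8 here)

print(np.array_equal(a, b))  # True
print(c.shape)               # (2, 8)

try:
    x.reshape(3, 5)          # 15 slots cannot hold 16 elements
except ValueError as err:
    print("reshape failed:", err)
```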

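Concatenation
np.concatenate(): points to watch when concatenating:
The arrays are passed as one sequence, so the brackets are required
The arrays must have the same number of dimensions
Their shapes must match along every axis except the one being joined
[Key point] By default, arrays are joined along axis 0, the dimension given by the first value of the shape tuple
The axis parameter changes the direction of concatenation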

x = np.array([1,2,3])
y = np.array([1,5,6,7,3,20])
x
Output: array([1, 2, 3])
y
Output: array([ 1,  5,  6,  7,  3, 20])

z = np.concatenate([x,y])
z
Output: array([ 1,  2,  3,  1,  5,  6,  7,  3, 20])
z.shape
Output: (9,)


/////////////////////////////////////////////////////
# two-dimensional
x = np.array([[1,2,3],[4,5,6]])
x

array([[1, 2, 3],
       [4, 5, 6]])

x.shape
(2, 3)

p = np.concatenate([x,x]).shape
p
(4, 3)

axis

import numpy as np
x = np.array([[[1,2,3],[2,2,3],[3,3,3]],[[4,4,4],[5,5,5],[6,6,6]]])
print(x)
print(x.shape)

Output:
[[[1 2 3]
  [2 2 3]
  [3 3 3]]

 [[4 4 4]
  [5 5 5]
  [6 6 6]]]
(2, 3, 3)

////////////////////////////////
w = np.concatenate([x,x],axis = 0)
print(w.shape)
print(w)
Output:
(4, 3, 3)
[[[1 2 3]
  [2 2 3]
  [3 3 3]]

 [[4 4 4]
  [5 5 5]
  [6 6 6]]

 [[1 2 3]
  [2 2 3]
  [3 3 3]]

 [[4 4 4]
  [5 5 5]
  [6 6 6]]]
////////////////////////////////////
w = np.concatenate([x,x],axis = 1)
print(w.shape)
print(w)
Output:
(2, 6, 3)
[[[1 2 3]
  [2 2 3]
  [3 3 3]
  [1 2 3]
  [2 2 3]
  [3 3 3]]

 [[4 4 4]
  [5 5 5]
  [6 6 6]
  [4 4 4]
  [5 5 5]
  [6 6 6]]]
/////////////////////////////////////
w = np.concatenate([x,x],axis = 2)
print(w.shape)
print(w)

Output:
(2, 3, 6)
[[[1 2 3 1 2 3]
  [2 2 3 2 2 3]
  [3 3 3 3 3 3]]

 [[4 4 4 4 4 4]
  [5 5 5 5 5 5]
  [6 6 6 6 6 6]]]
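
In every case above, only the length along the chosen axis changes. The sketch below joins two arrays of different sizes (shapes picked for illustration) to show that the other axes must agree.

```python
import numpy as np

a = np.ones((2, 3))
b = np.zeros((4, 3))

rows = np.concatenate([a, b], axis=0)  # lengths along axis 0 add up: 2 + 4
print(rows.shape)                      # (6, 3)

try:
    np.concatenate([a, b], axis=1)     # axis 0 lengths differ (2 vs 4)
except ValueError as err:
    print("cannot concatenate:", err)
```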

np.hstack and np.vstack: horizontal and vertical concatenation

x = np.array([[1,1],[2,2],[3,3]])
y = np.array([1,2,3])
print(np.hstack(x))
print(np.vstack(y))

Output:
[1 1 2 2 3 3]
[[1]
 [2]
 [3]]
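
The calls above pass a single array, so its rows (or elements) are what get stacked. The more common use joins two or more arrays; a minimal sketch with two illustrative 2-D arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

h = np.hstack([a, b])  # side by side; for 2-D input same as axis=1
v = np.vstack([a, b])  # top to bottom; same as axis=0

print(h)  # rows [1 2 5 6] and [3 4 7 8]
print(v)  # rows [1 2], [3 4], [5 6], [7 8]
```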

Splitting
np.split()

x = np.arange(1,10)
x

Output:
array([1, 2, 3, 4, 5, 6, 7, 8, 9])

x1,x2,x3 = np.split(x,[3,5])
print(x1,x2,x3)
Output:
[1 2 3] [4 5] [6 7 8 9]

np.hsplit()

x = np.arange(16).reshape(4,4)
x
Output:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
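
A call to np.hsplit on this array might look like the sketch below. The split points [2, 3] are an assumption, chosen to mirror the np.vsplit example that follows.

```python
import numpy as np

x = np.arange(16).reshape(4, 4)

# Split at column indices 2 and 3: columns 0-1, column 2, column 3
left, middle, right = np.hsplit(x, [2, 3])

print(left)    # rows [0 1], [4 5], [8 9], [12 13]
print(middle)  # rows [2], [6], [10], [14]
print(right)   # rows [3], [7], [11], [15]
```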

np.vsplit()

x = np.arange(16).reshape(4,4)
x
Output:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

print(np.vsplit(x,[2,3]))
Output:
[array([[0, 1, 2, 3],
        [4, 5, 6, 7]]), array([[ 8,  9, 10, 11]]), array([[12, 13, 14, 15]])]

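Copies
Assignment never copies the elements of an ndarray; the new name refers to the same array.
Changes made through the newly assigned name therefore also affect the original object.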

a = np.array([1,2,3])
b=a
print(a,b)

Output:
[1 2 3] [1 2 3]

Use the copy() method to create an independent copy

a = np.array([1,2,3])
b = a.copy()
b
Output:
array([1, 2, 3])

b[0]=3
print(a,b)
Output:
[1 2 3] [3 2 3]
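
The sharing behaviour of plain assignment only shows up once an element is modified. The sketch below makes that visible and adds slices, which also share data with the original; np.shares_memory is used here as one way to check.

```python
import numpy as np

a = np.array([1, 2, 3])
b = a          # plain assignment: b is another name for the same array
b[0] = 100
print(a)       # 100, 2, 3: the original changed too

s = a[1:]      # a slice is a view onto the same data
s[0] = 200
print(a)       # 100, 200, 3

c = a.copy()   # an independent copy
c[2] = 300
print(a)       # 100, 200, 3: unchanged by the edit to c

print(np.shares_memory(a, s), np.shares_memory(a, c))  # True False
```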