Machine Learning with Python (Part 2)
2017-07-10
jacksu on Jianshu
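The previous article, "Machine Learning with Python (Part 1)", introduced basic NumPy usage. This one covers SciPy, which is built on top of NumPy.
SciPy is a convenient, easy-to-use Python toolkit designed for science and engineering. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ordinary differential equation solvers, and more.
The main SciPy modules are: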
Vector quantization / Kmeans: scipy.cluster
Physical and mathematical constants: scipy.constants
Fourier transform: scipy.fftpack (legacy; scipy.fft is preferred since SciPy 1.4)
Integration routines: scipy.integrate
Interpolation: scipy.interpolate
Data input and output: scipy.io
Linear algebra routines: scipy.linalg
n-dimensional image package: scipy.ndimage
Orthogonal distance regression: scipy.odr
Optimization: scipy.optimize
Signal processing: scipy.signal
Sparse matrices: scipy.sparse
Spatial data structures and algorithms: scipy.spatial
Special mathematical functions: scipy.special
Statistics: scipy.stats
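Each module in the list is imported as a submodule of `scipy`. The sketch below picks two of them to show the pattern; the specific calls (`constants.c` and `special.gamma`) are chosen here for illustration and are not taken from the article.

```python
from scipy import constants, special

# Speed of light in vacuum, in metres per second (an exact SI value)
speed_of_light = constants.c

# Gamma function: gamma(n) equals (n - 1)! for positive integers
gamma_5 = special.gamma(5)

print(speed_of_light)
print(gamma_5)
```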
Examples of commonly used functions:
import numpy as np
from scipy import linalg
arr = np.array([[1, 2],[3, 4]])
## Determinant and inverse of a matrix
print("Determinant:", linalg.det(arr))
print("Inverse:", linalg.inv(arr))
Determinant: -2.0
Inverse: [[-2.   1. ]
 [ 1.5 -0.5]]
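A quick way to sanity-check an inverse is to multiply it back against the original matrix. The sketch below does that and also solves a small linear system with `linalg.solve`, which is usually preferable to forming the inverse explicitly. The right-hand side `b` is a made-up example.

```python
import numpy as np
from scipy import linalg

A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([5.0, 6.0])  # hypothetical right-hand side

# A matrix times its inverse should give the identity matrix
identity_check = np.allclose(A @ linalg.inv(A), np.eye(2))

# Solve A x = b directly instead of computing inv(A) @ b
x = linalg.solve(A, b)

print(identity_check)
print(x)
```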
## Singular value decomposition
arr = np.arange(9).reshape((3, 3)) + np.diag([1, 0, 1])
uarr, spec, vharr = linalg.svd(arr)
print(spec)
sarr = np.diag(spec)
svd_mat = uarr.dot(sarr).dot(vharr)
print(svd_mat)
np.allclose(arr,svd_mat)
[ 14.88982544 0.45294236 0.29654967]
[[ 1. 1. 2.]
[ 3. 4. 5.]
[ 6. 7. 9.]]
True
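One common use of the decomposition is low-rank approximation: keeping only the largest singular values gives the closest matrix of that rank. The sketch below builds a rank-1 approximation of the same matrix; the choice of `k = 1` and the variable names are assumptions made for illustration.

```python
import numpy as np
from scipy import linalg

arr = np.arange(9).reshape((3, 3)) + np.diag([1, 0, 1])
u, s, vh = linalg.svd(arr)

k = 1  # number of singular values to keep (illustrative choice)

# Scale the first k columns of u by the singular values, then project back
approx = (u[:, :k] * s[:k]) @ vh[:k, :]

# In the spectral norm, the approximation error equals the first
# discarded singular value
error = linalg.norm(arr - approx, 2)

print(error, s[k])
```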
## Fourier transform
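A minimal sketch of a Fourier transform with SciPy, assuming the newer `scipy.fft` interface rather than `scipy.fftpack`. The test signal, the sampling rate, and all variable names are made up for illustration.

```python
import numpy as np
from scipy import fft

sample_rate = 100                      # samples per second (assumed)
t = np.arange(200) / sample_rate       # 2 seconds of samples

# Sum of a 5 Hz sine and a weaker 12 Hz sine
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

# Real-input FFT and the frequency of each output bin
spectrum = fft.rfft(signal)
freqs = fft.rfftfreq(signal.size, d=1 / sample_rate)

# The strongest component should be the 5 Hz sine
peak_freq = freqs[np.argmax(np.abs(spectrum))]
print(peak_freq)
```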
## Optimization
from scipy import optimize
def f(x):
    return x**2 + 10*np.sin(x)
import matplotlib.pyplot as plt
x = np.arange(-10, 10, 0.1)
plt.plot(x, f(x))
plt.show()
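## BFGS depends on the starting point and may return only a local minimum.
## Starting from 0 it converges to the global minimum near x = -1.3064;
## starting from 3 it stops at the local minimum near x = 3.8375.
optimize.fmin_bfgs(f, 0)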
optimize.fmin_bfgs(f, 3)
Optimization terminated successfully.
Current function value: 8.315586
Iterations: 6
Function evaluations: 21
Gradient evaluations: 7
array([ 3.83746709])
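The dependence on the starting point can be checked by running the same optimizer from both starts and comparing the results. The sketch below uses `optimize.minimize` with `method="BFGS"`, which is an assumption here: it is the newer general interface, not the `fmin_bfgs` call used in the article.

```python
import numpy as np
from scipy import optimize

def f(x):
    return x**2 + 10 * np.sin(x)

# Run BFGS from the two starting points used above
starts = [0.0, 3.0]
results = {x0: optimize.minimize(f, x0, method="BFGS") for x0 in starts}

for x0, res in results.items():
    print(f"start={x0}: x={res.x[0]:.4f}, f(x)={float(res.fun):.4f}")
```

The start at 0 ends in the lower of the two minima, while the start at 3 stays in the shallower basin on the right.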
## Global optimization
optimize.basinhopping(f, 0)
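To see basin hopping actually escape a local minimum, the sketch below starts at 3, where plain BFGS gets stuck. The `stepsize=4.0` value is an assumption picked so that the random jumps are wide enough to cross into the other basin; the default step of 0.5 may be too small for this function. The search is random, so intermediate steps vary between runs.

```python
import numpy as np
from scipy import optimize

def f(x):
    return x**2 + 10 * np.sin(x)

# Random jumps followed by local minimization; the lowest minimum found is kept
result = optimize.basinhopping(f, 3, niter=100, stepsize=4.0)

print(result.x)    # location of the best minimum found
print(result.fun)  # function value at that point
```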
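## Finding roots of the function
## fsolve returns only one root per call, the one it reaches from the starting guess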
root = optimize.fsolve(f, 1)
root
array([ 0.])
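Because `fsolve` finds one root per call, other roots need other starting guesses. The plot of `f` crosses zero a second time near x = -2.5, so the sketch below tries that as an additional guess; the value -2.5 is read off the plot and is an assumption of this example.

```python
import numpy as np
from scipy import optimize

def f(x):
    return x**2 + 10 * np.sin(x)

# One call per starting guess; each call returns a 1-element array
guesses = (1.0, -2.5)
roots = [optimize.fsolve(f, x0)[0] for x0 in guesses]

for x0, root in zip(guesses, roots):
    print(f"guess={x0}: root={root:.6f}")
```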
## Curve fitting
xdata = np.linspace(-10, 10, num=20)
ydata = f(xdata) + np.random.randn(xdata.size)
# Assume the data follow the model f2, then estimate a and b
def f2(x, a, b):
    return a*x**2 + b*np.sin(x)
guess = [2, 2]
params, params_covariance = optimize.curve_fit(f2, xdata, ydata, guess)
params
array([ 1.00348624, 10.37354547])
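The second return value of `curve_fit` is the covariance matrix of the estimates, and the square roots of its diagonal give approximate standard errors for `a` and `b`. The sketch below repeats the fit with a seeded random generator so that it is reproducible; the seed and the names `rng` and `stderr` are assumptions of this example.

```python
import numpy as np
from scipy import optimize

def f(x):
    return x**2 + 10 * np.sin(x)

def f2(x, a, b):
    return a * x**2 + b * np.sin(x)

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
xdata = np.linspace(-10, 10, num=20)
ydata = f(xdata) + rng.standard_normal(xdata.size)

params, params_covariance = optimize.curve_fit(f2, xdata, ydata, p0=[2, 2])

# Approximate one-sigma uncertainty of each fitted parameter
stderr = np.sqrt(np.diag(params_covariance))

print("a, b:", params)
print("standard errors:", stderr)
```

The fitted values should land close to the true coefficients 1 and 10, with `b` noticeably less certain than `a`.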
## Statistics
a = np.random.normal(size=1000)
bins = np.arange(-4, 5)
print(bins)
# density=True normalizes the histogram so that it integrates to 1
histogram = np.histogram(a, bins=bins, density=True)[0]
print(histogram)
bins = 0.5*(bins[1:] + bins[:-1])
print(bins)
from scipy import stats
# pdf: probability density function
b = stats.norm.pdf(bins)
print("pdf:",b)
plt.plot(bins, histogram)
plt.plot(bins, b)
plt.show()
loc, std = stats.norm.fit(a)
print("loc: " + str(loc) + ", std: " + str(std))
# Median
np.median(a)
[-4 -3 -2 -1 0 1 2 3 4]
[ 0.001 0.025 0.137 0.339 0.34 0.136 0.02 0.002]
[-3.5 -2.5 -1.5 -0.5 0.5 1.5 2.5 3.5]
pdf: [ 0.00087268 0.0175283 0.1295176 0.35206533 0.35206533 0.1295176
0.0175283 0.00087268]
loc: -0.00549513299797, std: 1.00725628853
-0.0037246310284498475
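For a normal distribution, `stats.norm.fit` returns the maximum-likelihood estimates, which are simply the sample mean and the standard deviation computed with `ddof=0`. The sketch below checks this on seeded data; the seed and the variable names are assumptions of this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
samples = rng.normal(size=1000)

# Maximum-likelihood fit of a normal distribution
loc, scale = stats.norm.fit(samples)

# These should match the fitted parameters
sample_mean = samples.mean()
sample_std = samples.std()  # ddof=0 by default

# Normalized histogram over unit-width bins from -4 to 4
edges = np.arange(-4, 5)
density, _ = np.histogram(samples, bins=edges, density=True)

print(loc, sample_mean)
print(scale, sample_std)
```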