#Python3 Group: Data Mining in Practice, Summary of Chapters 8, 9, and X#

2018-01-28  DrBear_smile

Association

Concepts

Association rules

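Support: equivalent to P(x,y)

  • Confidence: equivalent to P(x,y)/P(x)
  • Confidence is not symmetric: P(x,y)/P(x) != P(y,x)/P(y)

lift({A->B}) = confidence({A->B})/support(B)

  • support(B) is the support of B alone, i.e. P(B)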
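The three measures can be checked by hand on a small data set. The sketch below is only an illustration: the transactions and the helper names (`support`, `confidence`, `lift`) are made up for this example and are not part of apyori.

```python
# Hypothetical toy transactions; item names are made up for illustration.
transactions = [
    {"a", "b"},
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "c"},
    {"a"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """P(antecedent, consequent) / P(antecedent)."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """confidence(A -> B) / support(B)."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))

print(support({"a", "b"}, transactions))        # support of {a, b}
print(confidence({"a"}, {"b"}, transactions))   # confidence of a -> b
print(confidence({"b"}, {"a"}, transactions))   # confidence of b -> a (differs)
print(lift({"a"}, {"b"}, transactions))         # lift of a -> b
```

A lift below 1 suggests that a and b appear together less often than they would if they were independent.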

Computation steps

code:

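pip install apyori

from apyori import apriori

# each transaction is a list of items; define it before calling apriori
transactions = [
['a','b'],['a','b','c'],
...]
# apriori returns a generator of rules, so wrap it in list() to inspect them
results = list(apriori(transactions))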

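Collaborative filtering

Often used to identify what a particular customer is likely to be interested in; the conclusion comes from analyzing which products similar customers are interested in.

Similarity between users or items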

sim(x,y)=\frac{1}{1+d(x,y)}
  • Fixed number of neighbors (K-Neighborhoods)
  • Neighbors based on a similarity threshold (Threshold-based neighborhoods), similar to how K-Means computes
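A minimal sketch of the similarity formula and the two neighborhood types. It assumes Euclidean distance for d(x,y), which the notes do not specify; the ratings and function names are hypothetical.

```python
import math

# Hypothetical ratings: each user rates the same three items.
ratings = {
    "u1": [5.0, 3.0, 4.0],
    "u2": [4.0, 3.0, 5.0],
    "u3": [1.0, 5.0, 2.0],
    "u4": [5.0, 3.0, 3.0],
}

def distance(x, y):
    """Euclidean distance, one possible choice for d(x, y) (an assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def sim(x, y):
    """sim(x, y) = 1 / (1 + d(x, y)); equals 1 for identical vectors."""
    return 1.0 / (1.0 + distance(x, y))

def k_neighbors(target, ratings, k):
    """The k users most similar to target (fixed-size neighborhood)."""
    others = [(u, sim(ratings[target], v)) for u, v in ratings.items() if u != target]
    others.sort(key=lambda pair: pair[1], reverse=True)
    return [u for u, _ in others[:k]]

def threshold_neighbors(target, ratings, threshold):
    """All users whose similarity to target is at least threshold."""
    return [u for u, v in ratings.items()
            if u != target and sim(ratings[target], v) >= threshold]

print(k_neighbors("u1", ratings, 2))
print(threshold_neighbors("u1", ratings, 0.4))
```

The fixed-size version always returns k users, even dissimilar ones; the threshold version may return none at all, so the choice depends on how sparse the data is.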

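Time series analysis

A time series is a sequence of observations taken at evenly spaced time intervals.

Time-Series Decomposition

Non-seasonal time-series decomposition

The moving average is a simple smoothing technique: it slides along the time series term by term, averaging a fixed number of terms each time, to show the long-term change and trend of an indicator.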

{SMA}_n = \frac{x_1+x_2+...+x_n}{n}
{WMA}_n = w_1x_1 + w_2x_2 + ... + w_nx_n
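ts.rolling(window).mean()
# ts: time series data (a pandas Series)
# window: number of observations in each moving window
# (replaces pandas.rolling_mean(ts, window), which was removed from later pandas versions)
ts.rolling(window).aggregate(function)
# ts: time series data
# function: aggregation function applied to each window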
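To make the SMA and WMA formulas concrete, here is a small pure-Python sketch that does not depend on pandas. The function names are made up, and the weights are assumed to sum to 1, as the WMA formula without a normalizing denominator implies.

```python
def simple_moving_average(values, n):
    """SMA of each window of n consecutive values."""
    return [sum(values[i:i + n]) / n for i in range(len(values) - n + 1)]

def weighted_moving_average(values, weights):
    """WMA of each window; weights are assumed to sum to 1."""
    n = len(weights)
    return [
        sum(w * x for w, x in zip(weights, values[i:i + n]))
        for i in range(len(values) - n + 1)
    ]

series = [2.0, 4.0, 6.0, 8.0, 10.0]
print(simple_moving_average(series, 3))
print(weighted_moving_average(series, [0.2, 0.3, 0.5]))
```

With the largest weight on the most recent value, the WMA follows a rising series more closely than the SMA does.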

Seasonal time-series decomposition

A time series has seasonality with period n when its pattern repeats after every n time intervals.

import statsmodels.api as sm
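sm.tsa.seasonal_decompose(ts, period=period)
# ts: time series data
# period: length of one seasonal cycle, in observations
# (this keyword was named freq in older statsmodels releases)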
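What a seasonal decomposition produces can be sketched with the classical additive method: trend from a centered moving average, seasonal part from the average detrended value at each position in the cycle, residual from what is left. This is a simplified illustration, not the statsmodels implementation; it assumes an odd period and a series spanning at least a few full cycles, and `additive_decompose` is a made-up name.

```python
def additive_decompose(values, period):
    """Classical additive decomposition; period is assumed to be odd."""
    half = period // 2
    n = len(values)
    # Trend: centered moving average, undefined (None) near both ends.
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = sum(values[i - half:i + half + 1]) / period
    # Seasonal: mean detrended value per cycle position, shifted to sum to zero.
    buckets = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            buckets[i % period].append(values[i] - trend[i])
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period
    seasonal = [means[i % period] - offset for i in range(n)]
    # Residual: what remains after removing trend and seasonal parts.
    resid = [None if trend[i] is None else values[i] - trend[i] - seasonal[i]
             for i in range(n)]
    return trend, seasonal, resid

# Hypothetical series: linear trend plus a repeating pattern of period 3.
pattern = [1.0, 0.0, -1.0]
values = [10.0 + i + pattern[i % 3] for i in range(12)]
trend, seasonal, resid = additive_decompose(values, 3)
print(trend)
print(seasonal)
print(resid)
```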

Series forecasting

Differencing (integrated)

\Delta f(x_k) = f(x_{k+1})-f(x_k)
AR(p): X_t = c + \sum^p_{i=1}{\varphi_i X_{t-i}}+\varepsilon_t
MA(q): X_t = \mu + \sum^q_{i=1}{\theta_i \varepsilon_{t-i}}+\varepsilon_t
ARMA(p,q): X_t = c + \sum^p_{i=1}{\varphi_i X_{t-i}}+\sum^q_{i=1}{\theta_i \varepsilon_{t-i}}+\varepsilon_t
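The recursions above can be traced by hand with a short sketch. It assumes the noise terms are given and that any term reaching before the start of the series counts as zero; `arma_series` and the coefficients are made up for illustration.

```python
def arma_series(c, phi, theta, eps):
    """Generate X_t = c + sum_i phi_i * X_{t-i} + sum_i theta_i * eps_{t-i} + eps_t.

    Terms that would reach before the start of the series are treated
    as zero (an assumption made to keep the sketch short).
    """
    x = []
    for t in range(len(eps)):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[i] * eps[t - 1 - i] for i in range(len(theta)) if t - 1 - i >= 0)
        x.append(c + ar + ma + eps[t])
    return x

# Hypothetical ARMA(1,1) with fixed noise so the result is reproducible.
eps = [1.0, 0.0, -1.0]
print(arma_series(1.0, [0.5], [0.4], eps))
# An empty theta gives a pure AR model; an empty phi gives a pure MA model.
print(arma_series(1.0, [0.5], [], eps))
print(arma_series(1.0, [], [0.4], eps))
```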
import statsmodels.api as sm
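# test whether the time series is stationary (augmented Dickey-Fuller test)
sm.tsa.stattools.adfuller(ts)
# diff
diff_ts = ts.diff(d)
# d: number of periods to shift when differencing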

sm.tsa.arma_order_select_ic(
diff_ts,
max_ar=max_ar, # largest p to try
max_ma=max_ma, # largest q to try
ic=ic, # criterion for choosing the best order: 'aic', 'bic' or 'hqic'
trend=trend
)

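armaModel = sm.tsa.ARIMA(data, order=(p, 0, q))
# sm.tsa.ARMA(data,(p,q)) in older statsmodels; ARMA was removed in statsmodels 0.13
armaResult = armaModel.fit()
# fit() returns a results object; call predict on it
armaResult.predict(start, end)
# start, end: start and end dates of the forecast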

# revert diff
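Reverting a first-order difference amounts to a cumulative sum starting from a known value. A minimal pure-Python sketch, with made-up function names; forecasts made on the differenced series are turned back into levels the same way, starting from the last observed value.

```python
from itertools import accumulate

def diff(values):
    """First-order forward difference: values[k+1] - values[k]."""
    return [b - a for a, b in zip(values, values[1:])]

def undiff(first_value, diffs):
    """Rebuild the series from its first value and its differences."""
    return list(accumulate([first_value] + list(diffs)))

series = [3, 5, 4, 8]
d = diff(series)
print(d)
print(undiff(series[0], d))

# Hypothetical forecast of the next two differences, turned back into levels.
predicted_diffs = [1, 2]
print(undiff(series[-1], predicted_diffs)[1:])
```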

Model persistence

Persistence: saving transient (in-memory) data to durable storage

# based on sklearn
import joblib
# (sklearn.externals.joblib in older scikit-learn; it was removed in scikit-learn 0.23)
joblib.dump(model, filePath)
model = joblib.load(filePath)
pandas.crosstab(x,X)
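The dump and load round trip can be tried without scikit-learn. The sketch below swaps in the standard-library `pickle` module instead of joblib, purely to stay self-contained; the dictionary stands in for a fitted model and the file name is hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted model; any picklable Python object behaves the same way.
model = {"coef": [0.5, -1.2], "intercept": 0.1}

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "model.pkl")  # hypothetical file name
    # Persist the in-memory object to disk.
    with open(file_path, "wb") as f:
        pickle.dump(model, f)
    # Load it back, as a later process would.
    with open(file_path, "rb") as f:
        restored = pickle.load(f)

print(restored == model)
```

As with joblib, only load files from a trusted source, since unpickling can execute arbitrary code.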