2020 Time Series Analysis (5)
2020-06-08
zidea
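Deep learning has spread across every area of machine learning. For sequential data and spatial data in particular, the two workhorses are recurrent neural networks (RNNs) and convolutional neural networks (CNNs), respectively. Since RNNs are designed for sequential data, here we try using one to forecast a time series.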
import math
import numpy as np
import pandas_datareader as web
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense,LSTM
import matplotlib.pyplot as plt
plt.style.use("fivethirtyeight")
Loading the data
We use web.DataReader to load Apple (AAPL) stock data; the start and end arguments select the period to analyze.
df = web.DataReader("AAPL",data_source='yahoo',start='2012-01-01',end='2019-12-12')
df.head()
[Screenshot: output of df.head()]
The data has 6 columns. I know very little about stocks, so today we only study the closing price, the Close column.
df.shape
(1991, 6)
The data contains 1991 records, each with 6 features.
# visualize the closing price
plt.figure(figsize=(16,8))
plt.title("Close Price History")
plt.plot(df['Close'])
plt.xlabel('Date',fontsize=18)
plt.ylabel('Close price USD($)',fontsize=18)
plt.show()
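[Figure: Close Price History, closing price in USD against date]
Plotting the data shows the trend and distribution of prices over the period under study and gives us a rough feel for the data.
Filter the data, keeping only the closing price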
data = df.filter(['Close'])
dataset = data.values
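Preparing the training data
Separate the training set from the validation set; the first 80% of the rows are used for training.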
training_data_len = math.ceil(len(dataset) * .8)
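Scale the data to the range 0 to 1, which makes it easier for the network to work with. Note that the scaler is fit on the whole series here, including the validation period.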
scaler = MinMaxScaler(feature_range=(0,1))
scaled_data = scaler.fit_transform(dataset)
scaled_data
array([[0.01394549],
[0.01543437],
[0.01852662],
...,
[0.98325872],
[1. ],
[0.99721765]])
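To make the scaling step concrete, here is a minimal sketch of the min-max formula that MinMaxScaler applies per column when feature_range=(0,1). The prices are made-up toy values, not the AAPL data, and the variable names are only for illustration.

```python
import numpy as np

# Toy prices, made up for illustration (not the AAPL data).
prices = np.array([[10.0], [12.0], [15.0], [20.0]])

# Min-max scaling: x_scaled = (x - min) / (max - min), computed per column.
col_min = prices.min(axis=0)
col_max = prices.max(axis=0)
scaled = (prices - col_min) / (col_max - col_min)

# Inverting the scaling recovers the original prices,
# which is what scaler.inverse_transform does for the predictions later on.
restored = scaled * (col_max - col_min) + col_min

print(scaled.ravel())
print(restored.ravel())
```

The smallest price maps to 0 and the largest to 1, so any value outside the fitted range would fall outside [0, 1].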
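Split the training data into samples and labels along the time axis: each sample is a window of 60 consecutive scaled closing prices, and its label is the price that immediately follows the window.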
train_data = scaled_data[0:training_data_len,:]
x_train = []
y_train = []
for i in range(60,len(train_data)):
    x_train.append(train_data[i-60:i,0])
    y_train.append(train_data[i,0])
print(x_train[0],x_train[1])
[0.01394549 0.01543437 0.01852662 0.02147067 0.0210193 0.02203657
0.02157172 0.02079024 0.01972581 0.02302017 0.02599118 0.02507495
0.02005592 0.02484589 0.02013002 0.03781453 0.03644692 0.03823223
0.04209249 0.04443021 0.04423484 0.04351399 0.04658604 0.04947618
0.05275037 0.05803888 0.06914811 0.06931653 0.07550107 0.0801226
0.07217972 0.07523832 0.07517769 0.08375384 0.08253444 0.0847913
0.08884695 0.09110386 0.09760502 0.10234111 0.10370868 0.10418702
0.09608921 0.09413547 0.09442517 0.10203791 0.10418029 0.1087816
0.11962812 0.13409911 0.13139084 0.13139757 0.14186008 0.14513422
0.14280324 0.14067438 0.13845792 0.14582139 0.15087413 0.15298953] [0.01543437 0.01852662 0.02147067 0.0210193 0.02203657 0.02157172
0.02079024 0.01972581 0.02302017 0.02599118 0.02507495 0.02005592
0.02484589 0.02013002 0.03781453 0.03644692 0.03823223 0.04209249
0.04443021 0.04423484 0.04351399 0.04658604 0.04947618 0.05275037
0.05803888 0.06914811 0.06931653 0.07550107 0.0801226 0.07217972
0.07523832 0.07517769 0.08375384 0.08253444 0.0847913 0.08884695
0.09110386 0.09760502 0.10234111 0.10370868 0.10418702 0.09608921
0.09413547 0.09442517 0.10203791 0.10418029 0.1087816 0.11962812
0.13409911 0.13139084 0.13139757 0.14186008 0.14513422 0.14280324
0.14067438 0.13845792 0.14582139 0.15087413 0.15298953 0.14776164]
print(y_train[0],y_train[1])
0.1477616406555458 0.14081585123770513
# convert x_train and y_train to numpy arrays, then reshape x_train to (samples, timesteps, features) for the LSTM
x_train,y_train = np.array(x_train),np.array(y_train)
x_train = np.reshape(x_train,(x_train.shape[0],x_train.shape[1],1))
x_train.shape
(1533, 60, 1)
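The windowing above can be sketched on a tiny series so the shapes are easy to follow. The helper name make_windows is hypothetical (it is not part of the original code), and the window length of 3 is chosen only to keep the output short.

```python
import numpy as np

def make_windows(series, window):
    """Return (x, y): x[k] holds `window` consecutive values, y[k] is the value right after them."""
    x, y = [], []
    for i in range(window, len(series)):
        x.append(series[i - window:i])   # the `window` values before position i
        y.append(series[i])              # the value to predict
    # LSTM layers expect input shaped (samples, timesteps, features)
    x = np.array(x).reshape(-1, window, 1)
    return x, np.array(y)

series = np.arange(10, dtype=float)      # toy series 0.0, 1.0, ..., 9.0
x, y = make_windows(series, window=3)
print(x.shape, y.shape)
print(x[0].ravel(), y[0])
```

A series of length n yields n - window samples, which matches the shape shown above: 1593 training rows minus a window of 60 gives 1533 samples.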
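Building the model
The model uses the LSTM layer provided by Keras: two stacked LSTM layers followed by two Dense layers for the regression. Keras offers many predesigned layers that can be used directly, without tedious configuration. A high-level API like this can also be a drawback, because it keeps us away from how particular layers are implemented and the principles behind them.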
# build the LSTM model
model = Sequential()
model.add(LSTM(50,return_sequences=True,input_shape=(x_train.shape[1],1)))
model.add(LSTM(50,return_sequences=False))
model.add(Dense(25))
model.add(Dense(1))
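One way to look under the high-level API is to count the trainable parameters of each layer by hand. The sketch below assumes the standard LSTM formulation with four gates and a bias per gate (the Keras defaults); the helper names lstm_params and dense_params are hypothetical. The result can be compared against the output of model.summary().

```python
def lstm_params(units, input_dim):
    # Four gates, each with input weights, recurrent weights and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    # One weight per input-output pair plus one bias per output unit.
    return units * input_dim + units

layer_params = [
    lstm_params(50, 1),     # first LSTM: one feature per time step
    lstm_params(50, 50),    # second LSTM: fed by the 50 outputs of the first
    dense_params(25, 50),   # Dense(25)
    dense_params(1, 25),    # Dense(1)
]
total = sum(layer_params)
print(layer_params, total)
```

Most of the parameters sit in the second LSTM layer, because its input is the 50-dimensional output of the first layer rather than a single price.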
Set the optimizer and the loss function
# compile model
model.compile(optimizer='adam',loss='mean_squared_error')
Start training
# training model
model.fit(x_train,y_train,batch_size=1,epochs=1)
Prepare the test set and test data
# testing the data set
test_data = scaled_data[training_data_len-60:,:]
x_test = []
y_test = dataset[training_data_len:,:]
for i in range(60,len(test_data)):
    x_test.append(test_data[i-60:i,0])
# convert the data to a numpy array
x_test = np.array(x_test)
# reshape data
x_test = np.reshape(x_test,(x_test.shape[0],x_test.shape[1],1))
# get the models predicted price values
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions)
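The test slice starts 60 rows before the split so that the first test prediction still has a full window of history. A small sketch with stand-in numbers (20 points, a split at 16 and a window of 5, all made up for illustration) shows that this yields exactly one window per held-out row:

```python
import numpy as np

window = 5                                          # stand-in for the 60-day window
scaled = np.linspace(0.0, 1.0, 20).reshape(-1, 1)   # stand-in for scaled_data
split = 16                                          # stand-in for training_data_len

# Start `window` rows before the split so the first test sample has full history.
test_data = scaled[split - window:, :]
x_test = np.array([test_data[i - window:i, 0]
                   for i in range(window, len(test_data))])
y_test = scaled[split:, :]

print(x_test.shape, y_test.shape)
```

The first test window consists of the last 5 training rows, and its target is the first held-out row.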
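Compute the root mean squared error (RMSE)
# get the root mean squared error (RMSE): square the errors first, then average
rmse = np.sqrt(np.mean((predictions - y_test)**2))
rmse
0.5314061917252277
Note that 0.5314061917252277 was obtained with np.sqrt(np.mean(predictions - y_test)**2), which squares the mean of the errors and therefore gives the absolute value of the mean error, not the RMSE. The expression above yields a value at least as large.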
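Where the square sits relative to the mean changes what is being measured. The sketch below uses two made-up predictions (not the stock data) whose errors cancel out, to compare the absolute mean error with the RMSE:

```python
import numpy as np

# Made-up values for illustration: one prediction is too low, one too high.
predictions = np.array([[1.0], [3.0]])
actual = np.array([[2.0], [2.0]])

errors = predictions - actual

# Squaring after the mean: positive and negative errors cancel first.
abs_mean_error = np.sqrt(np.mean(errors) ** 2)

# Squaring before the mean: every error contributes, which is the RMSE.
rmse = np.sqrt(np.mean(errors ** 2))

print(abs_mean_error, rmse)
```

Here the first quantity is 0 even though both predictions are off by 1, while the RMSE is 1.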
Visualizing the predictions
Next we plot the model's predictions alongside the actual values so that they are easy to compare.
train = data[:training_data_len]
# copy the slice so adding a column does not trigger pandas' SettingWithCopyWarning
valid = data[training_data_len:].copy()
valid['Predictions'] = predictions
# visualize
plt.figure(figsize=(16,8))
plt.title('Model')
plt.xlabel('Date',fontsize=18)
plt.ylabel('Close price USD($)',fontsize=18)
plt.plot(train['Close'])
plt.plot(valid[['Close','Predictions']])
plt.legend(['Train','Val','Predictions'],loc='lower right')
plt.show()
[Figure: Model, closing price of the training data, validation data and predictions]