
2018-06-14 learn lstm


These are my own raw notes.

The best way to learn something is to read and rerun the code, and then write your own code.
-- me

Data

Here we show how to use an LSTM to predict a time series.

The input data is a sine-like wave.


(figure: plot of the sine wave input data)

The data is here: https://raw.githubusercontent.com/jaungiers/LSTM-Neural-Network-for-Time-Series-Prediction/master/sinwave.csv
It looks like this:

0.841470985
0.873736397
0.90255357
0.927808777
0.949402346
0.967249058
0.98127848
0.991435244
0.997679266
0.999985904
0.998346054
0.992766189
0.983268329
0.969889958
0.952683874
0.931717983
0.907075026
0.878852258
0.847161063
0.812126509

The amplitude and frequency of this sine wave are both 1 (giving an angular frequency of 6.28), and I used the function to generate data points over 5001 time periods with a time delta of 0.01.
So the frequency is 1, the time step is 0.01, and 5001 points were sampled.
I checked: one full waveform spans 100 time steps (there are 50 time steps between two successive values closest to 0, about 0.0053), which means the default unit here is 1 "time".
One unit (1 time) is exactly one full waveform.
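
For reference, here is a minimal sketch of how such a wave could be generated with numpy. This is my own reconstruction, not the original author's script, and the published CSV may start at a different phase.

import numpy as np

# Amplitude 1, frequency 1 (angular frequency 2*pi ~= 6.28):
# 5001 points with a time delta of 0.01, so one full cycle spans 100 points.
t = np.arange(5001) * 0.01
wave = np.sin(2 * np.pi * t)
np.savetxt('sinwave.csv', wave, fmt='%.9f')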

Target

We give the LSTM a fixed window of this data and then want it to predict the next N steps in the series.

Input data

First we need to process the data to make it suitable for the LSTM.
Here we use the Keras LSTM. For Keras, the input format is a numpy array of 3 dimensions, (N, W, F):

N is the number of training sequences
W is the length of each sequence (the window size)
F is the number of features of each sequence (a short example of this shape follows)
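
To make the (N, W, F) format concrete, here is a tiny example of my own (not from the original post):

import numpy as np

# Three windows of length four, one feature per time step.
windows = np.array([[0.0, 0.1, 0.2, 0.3],
                    [0.1, 0.2, 0.3, 0.4],
                    [0.2, 0.3, 0.4, 0.5]])
x = windows.reshape(windows.shape[0], windows.shape[1], 1)
print(x.shape)  # (3, 4, 1): N=3 sequences, W=4 steps, F=1 feature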

The window, i.e. one sequence, is 50 steps long, which is roughly half a waveform.
Each sequence slides forward by one step (0.01 time units), so consecutive sequences overlap heavily.

Here I chose to go with a sequence length (read: window size) of 50, which allows the network to get glimpses of the shape of the sin wave at each sequence and hence will hopefully teach itself to build up a pattern of the sequences based on the prior window received. The sequences themselves are sliding windows and hence shift by 1 each time, causing a constant overlap with the prior windows.

import numpy as np

def load_data(filename, seq_len, normalise_window):  # filename, sequence length, normalisation flag
    f = open(filename, 'rb').read()  # read the file as raw bytes
    data = f.decode().split('\n')  # split into one value per line
    data = [float(p) for p in data if p]  # drop blank lines and parse to float
    # (the normalisation branch from the original repo is omitted in these notes,
    # so normalise_window is unused here)

    sequence_length = seq_len + 1  # a window holds seq_len inputs plus 1 target value
    result = []
    for index in range(len(data) - sequence_length):
        # Slide a 51-value window forward one step at a time; with ~5001 values
        # this yields ~4950 windows, each overlapping the previous one in all but
        # one value. E.g. with values 1..10 and a window of 3: (1,2,3), (2,3,4), ...
        result.append(data[index: index + sequence_length])
    result = np.array(result)  # list of windows -> 2-D array, one window per row

    # The window is read as 51 values, but only the first 50 are training inputs;
    # the 51st is the prediction target. So result has N rows and 51 columns,
    # while x_train below has 50 columns.
    row = round(0.9 * result.shape[0])  # shape is (rows, cols); use 90% of the rows for training
    train = result[:int(row), :]  # the first 90% of the rows, all columns
    np.random.shuffle(train)  # shuffle the row order; values inside each row stay put
    x_train = train[:, :-1]  # all columns except the last
    y_train = train[:, -1]  # the last column is the prediction target
    x_test = result[int(row):, :-1]  # test data: the remaining 10% of the rows
    y_test = result[int(row):, -1]

    # Add a feature axis so the shape becomes (N, W, F) with F = 1,
    # the 3-D format the Keras LSTM expects.
    x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
    x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))

    return [x_train, y_train, x_test, y_test]
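
For instance, loading the sine data with a window of 50 would look like this (the shapes are approximate, assuming ~5001 values and the 90/10 split above):

x_train, y_train, x_test, y_test = load_data('sinwave.csv', 50, True)
print(x_train.shape)  # roughly (4456, 50, 1): N windows, W=50 steps, F=1 feature
print(x_test.shape)   # roughly (495, 50, 1)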

A note on rows and columns:

x = np.array([1, 2, 3, 4])
x.shape  # I assumed this was a single row, but the shape is one-dimensional
(4,)
y = np.zeros((3, 8))  # this one really is 3 rows and 8 columns
y.shape
(3, 8)
y
array([[0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.]])

LSTM

Next up we need to actually build the network itself. This is the simple part! At least if you’re using Keras it’s as simple as stacking Lego bricks. I used a network structure of [1, 50, 100, 1] where we have 1 input layer (consisting of a sequence of size 50) which feeds into an LSTM layer with 50 neurons, that in turn feeds into another LSTM layer with 100 neurons which then feeds into a fully connected normal layer of 1 neuron with a linear activation function which will be used to give the prediction of the next time step.
An LSTM is simple: with Keras it is like stacking Lego bricks. I used a [1, 50, 100, 1] structure:
1 is the input layer (one feature per time step);
50 is the length of one sequence/window, matched by 50 neurons in the first LSTM layer;
that feeds into another LSTM layer with 100 neurons;
then a fully connected output layer with just 1 neuron
and a linear activation function,
which predicts the next time step.
(In practice, though, the first 49 values of a window get no predictions of their own.)
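
For readers on a newer Keras, the same [1, 50, 100, 1] stack in Keras 2 argument names would look roughly like this (my own sketch; the article's code below uses the Keras 1.x names input_dim/output_dim):

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

model = Sequential()
model.add(LSTM(50, input_shape=(50, 1), return_sequences=True))  # first LSTM layer over (W=50, F=1) input
model.add(Dropout(0.2))
model.add(LSTM(100, return_sequences=False))  # second LSTM layer, final output only
model.add(Dropout(0.2))
model.add(Dense(1, activation='linear'))  # single linear output neuron
model.compile(loss='mse', optimizer='rmsprop')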

import time
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, Activation

def build_model(layers):  # build the model; layers = [input_dim, lstm1_units, lstm2_units, output_dim]
    model = Sequential()

    model.add(LSTM(  # first LSTM layer: layers[0] input features, layers[1] units, returns the full sequence
        input_dim=layers[0],
        output_dim=layers[1],
        return_sequences=True))
    model.add(Dropout(0.2))  # drop 20% of the activations

    model.add(LSTM(  # second LSTM layer
        layers[2],  # with layers[2] units
        return_sequences=False))  # return only the final output
    model.add(Dropout(0.2))  # drop 20% again

    model.add(Dense(  # fully connected output layer
        output_dim=layers[3]))  # with layers[3] output neurons
    model.add(Activation("linear"))  # topped with a linear activation

    start = time.time()
    model.compile(loss="mse", optimizer="rmsprop")  # MSE loss, RMSprop optimiser
    print("> Compilation Time : ", time.time() - start)  # report how long the compile took
    return model  # return the compiled model

Train

Now we start training, and a single epoch is enough. An LSTM differs from a traditional network: instead of training many times over many samples, one epoch already cycles through every sequence window. If the data had little structure you would need many epochs, but this is a very simple waveform and easy to train.

(1) batch size: deep learning generally trains with SGD, where each step trains on batch_size samples drawn from the training set;

(2) iteration: 1 iteration means training once on batch_size samples;

(3) epoch: 1 epoch means training once on every sample in the training set.
For example, if the training set has 1000 samples and batch_size = 10, then:

training over the whole sample set takes:

100 iterations, i.e. 1 epoch.
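
The same arithmetic as a tiny sketch of mine:

num_samples, batch_size = 1000, 10
iterations_per_epoch = num_samples // batch_size
print(iterations_per_epoch)  # 100 iterations = 1 epoch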

We put everything needed for the run into a single script.

epochs = 1  # number of epochs
seq_len = 50  # sequence length

print('> Loading data... ')

X_train, y_train, X_test, y_test = lstm.load_data('sp500.csv', seq_len, True)  # load the data (use 'sinwave.csv' to reproduce the sine example above)

print('> Data Loaded. Compiling...')

model = lstm.build_model([1, 50, 100, 1])  # lstm here is this article's own module, which wraps the standard Keras LSTM

model.fit(
    X_train,
    y_train,
    batch_size=512,  # rows of training data fed into each gradient update
    nb_epoch=epochs,  # Keras 1.x name for the number of epochs
    validation_split=0.05)  # hold out the last 5% of the training data for validation
# we now have a model that can make predictions

predicted = lstm.predict_point_by_point(model, X_test)  # a purpose-written wrapper around the standard predict; defined below

This predicts one point at a time. Note that the raw test data is really the tail of the whole series: each row holds 50 numbers, the 51st is the value being predicted, and the row sequences overlap one another.

import numpy as np
from numpy import newaxis

def predict_point_by_point(model, data):  # takes the model and the test data
    # Predict each timestep given the last sequence of true data,
    # in effect only predicting 1 step ahead each time
    predicted = model.predict(data)  # input has shape (n, 50, 1); one prediction per row
    predicted = np.reshape(predicted, (predicted.size,))  # flatten to a 1-D array
    return predicted

def predict_sequence_full(model, data, window_size):  # model, data, window size; predicts the whole span
    # Shift the window by 1 new prediction each time,
    # re-running the prediction on the new window
    curr_frame = data[0]  # start from the first test window (one row of 50 values)
    predicted = []  # collected predictions
    for i in range(len(data)):  # one prediction per test row
        # newaxis turns the (50, 1) window into the (1, 50, 1) batch predict expects
        predicted.append(model.predict(curr_frame[newaxis, :, :])[0, 0])
        curr_frame = curr_frame[1:]  # slide the window: drop the oldest value...
        curr_frame = np.insert(curr_frame, [window_size - 1], predicted[-1], axis=0)  # ...and append the newest prediction
    return predicted

def predict_sequences_multiple(model, data, window_size, prediction_len):
    # Predict a sequence of 50 steps, then shift the prediction run forward by 50 steps
    prediction_seqs = []
    for i in range(int(len(data) / prediction_len)):
        curr_frame = data[i * prediction_len]  # restart from true data every prediction_len steps
        predicted = []
        for j in range(prediction_len):
            predicted.append(model.predict(curr_frame[newaxis, :, :])[0, 0])
            curr_frame = curr_frame[1:]
            curr_frame = np.insert(curr_frame, [window_size - 1], predicted[-1], axis=0)
        prediction_seqs.append(predicted)
    return prediction_seqs

If you’re observant you’ll have noticed that in our load_data() function above we split the data into train/test sets, as is standard practice for machine learning problems. However, what we need to watch out for here is what we actually want to achieve in the prediction of the time series.

If we were to use the test set as-is, we would be running each window full of true data to predict the next time step. This is fine if we only want to predict one time step ahead; however, if we want to predict more than one step ahead, perhaps hoping to capture emergent trends or functions (e.g. the sin function in this case), then using the full test set would mean predicting the next time step but discarding that prediction when it comes to subsequent time steps, using only the true data at each step.

You can see below the graph of using this approach to predict only one time step ahead at each step in time:
'''
import matplotlib.pyplot as plt

def plot_results(predicted_data, true_data):
    fig = plt.figure(facecolor='white')
    ax = fig.add_subplot(111)
    ax.plot(true_data, label='True Data')
    plt.plot(predicted_data, label='Prediction')
    plt.legend()
    plt.show()
'''
The plotting routine is simple: just plot the true and the predicted data.

'''
def plot_results_multiple(predicted_data, true_data, prediction_len):
    fig = plt.figure(facecolor='white')
    ax = fig.add_subplot(111)
    ax.plot(true_data, label='True Data')
    # Pad the list of predictions to shift it in the graph to its correct start
    for i, data in enumerate(predicted_data):
        padding = [None for p in range(i * prediction_len)]
        plt.plot(padding + data, label='Prediction')
    plt.legend()
    plt.show()
'''
First plot the true data; then plot each predicted sequence with None-padding in front, shifting it to the right offset on the graph.
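
To tie the pieces together, a hypothetical driver (my own sketch, mirroring the run script above; the prediction length of 50 is my choice) could be:

predictions = lstm.predict_sequences_multiple(model, X_test, seq_len, 50)
plot_results_multiple(predictions, y_test, 50)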
