2018-04-25 Week 5

2018-06-14 hobxzzy

This week's task: split the data at the standard ratio of 2 parts training set to 1 part test set, then evaluate the model.

First, split the dataset at roughly that ratio. Source data: 71,532 records; training set: 50,000 records; test set: 20,000 records.

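The split is not applied to the raw dataset itself but at the point where batches are fed to the model. Because the LSTM is trained by cycling over the data repeatedly, the batch index is taken modulo the number of batches in each partition: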

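def next_batch(batch_size, step):
    # Cycle over the training batches (the first 50000 samples).
    idx = step % (50000 // batch_size)
    return vec_lists[idx], tag_lists[idx]

def test_Data(step):
    # Test batches follow the training batches (the next 20000 samples).
    # batch_size is read from the enclosing script.
    idx = 50000 // batch_size + step % (20000 // batch_size)
    return vec_lists[idx], tag_lists[idx]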

After training, the model is evaluated batch by batch on the test set to obtain the accuracy:

if __name__ == "__main__":
    # Assumes `import numpy as np` and `import tensorflow as tf` (TensorFlow 1.x),
    # and that x, y, weights, biases, RNN, lr, batch_size, n_steps, n_inputs
    # and training_iters are defined earlier in the script.
    tag_list = init_Tag()
    vec_list = init_Vec()
    init_Data()

    pred = RNN(x, weights, biases)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=pred))
    train_op = tf.train.AdamOptimizer(lr).minimize(cost)

    # A prediction is correct when its arg-max class matches the one-hot label.
    correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

    # tf.initialize_all_variables() is deprecated; use this instead:
    init = tf.global_variables_initializer()

    accuracies = []  # per-batch test accuracy

    with tf.Session() as sess:
        sess.run(init)
        step = 0
        test_step = 0
        while step * batch_size <= training_iters:
            batch_xs, batch_ys = next_batch(batch_size, step)
            batch_xs = np.array(batch_xs)
            batch_xs = batch_xs.reshape([batch_size, n_steps, n_inputs])
            sess.run([train_op], feed_dict={
                x: batch_xs,
                y: batch_ys,
            })
            # On the last training step, run through the whole test set once.
            if step == training_iters // batch_size:
                while test_step * batch_size < 20000:
                    test_batch_xs, test_batch_ys = test_Data(test_step)
                    test_batch_xs = np.array(test_batch_xs)
                    test_batch_xs = test_batch_xs.reshape([batch_size, n_steps, n_inputs])
                    a = sess.run(accuracy, feed_dict={
                        x: test_batch_xs,
                        y: test_batch_ys,
                    })
                    accuracies.append(a)
                    print(a)
                    test_step += 1
            step += 1

    print(" average num = ")
    print(sum(accuracies) / len(accuracies))
    print(" max num = ")
    print(np.max(accuracies))
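The accuracy computation can be mirrored in plain NumPy to see what is being averaged. This is only an illustrative sketch: the function name `batch_accuracy` and the toy logits and labels are hypothetical and not taken from the original script.

```python
import numpy as np

def batch_accuracy(logits, one_hot_labels):
    """Fraction of rows whose arg-max class matches the one-hot label."""
    predicted = np.argmax(logits, axis=1)
    actual = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(predicted == actual))

# Hypothetical toy batches: 4 samples, 3 classes each.
labels = np.eye(3)[[0, 1, 2, 1]]
logits_a = np.array([[2.0, 0.1, 0.1],
                     [0.2, 1.5, 0.3],
                     [0.1, 0.2, 3.0],
                     [1.0, 0.5, 0.2]])   # last row is misclassified
logits_b = np.array([[2.0, 0.1, 0.1],
                     [0.2, 1.5, 0.3],
                     [0.1, 0.2, 3.0],
                     [0.1, 2.5, 0.2]])   # all rows correct

accuracies = [batch_accuracy(logits_a, labels), batch_accuracy(logits_b, labels)]
mean_accuracy = sum(accuracies) / len(accuracies)
print(accuracies, mean_accuracy)
```

Because every test batch has the same size, the mean of the per-batch accuracies equals the accuracy over the whole test set.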

Final result:

Average accuracy: 95.3%

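We are still not fully satisfied with this result. After a group discussion we found a problem with our word vectors: we had been using vectors produced directly by word2vec's automatic training code, which does not vectorize Chinese text well. We therefore replaced the word-vector library with the Sogou one and regenerated the word vectors for training.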
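Pre-trained vectors are commonly distributed in the word2vec text format, where a header line gives the vocabulary size and dimension and each following line holds a word followed by its components. Below is a minimal sketch of reading such a file. It assumes, and the post does not say this, that the Sogou vectors are in that format; the function name `load_text_vectors` and the sample data are hypothetical.

```python
import io
import numpy as np

def load_text_vectors(handle):
    """Read word2vec-style text vectors from an open text handle."""
    # Header line: "<vocab_size> <dim>" (assumed format).
    _vocab_size, dim = map(int, handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        values = [float(p) for p in parts[1:]]
        vectors[parts[0]] = np.array(values, dtype=np.float32)
    return vectors, dim

# Hypothetical two-word sample standing in for a real vector file.
sample = io.StringIO("2 3\n你好 0.1 0.2 0.3\n世界 0.4 0.5 0.6\n")
vectors, dim = load_text_vectors(sample)
print(dim, len(vectors))
```

For a real file, the same function would be called on a handle opened with the file's actual encoding, for example `open(path, encoding="utf-8")`.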
