Reading notes on "Memory Networks"

2017-11-17 best___me

Paper link: https://arxiv.org/pdf/1410.3916.pdf

A memory network combines inference components with a long-term memory component.

The long-term memory can be read from and written to, and is used for prediction.

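Introduction:

Most neural network models cannot easily read from and write to a long-term memory component, nor combine such a memory tightly with inference. RNNs are one example: their memory (encoded by hidden states and weights) is too small.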

The central idea is to combine the successful learning strategies developed in the machine learning literature for inference with a memory component that can be read and written to.

The model is trained to operate the memory component effectively.

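Memory Network:

A memory network consists of a memory m (an array of objects indexed as mi, where the objects can be vectors or strings) and four components: I, G, O, and R.

I: (input feature map) converts the input into the internal feature representation.

G: (generalization) updates the memories given the new input. It is called generalization because the network gets a chance to compress and generalize its memories for future use.

O: (output feature map) produces a new output given the new input and the current memory state.

R: (response) converts the output into the desired response format, for example a textual response or an action.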

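Given an input x (for example a character, word, or sentence, an image, or an audio signal), the model proceeds as follows:

1. Convert x to the internal feature representation I(x).

2. Update the memories mi given the new input: mi = G(mi, I(x), m), for all i.

3. Compute the output from the input and the memory: o = O(I(x), m).

4. Finally, decode the output features o to give the final response: r = R(o).

These steps run at both training and test time; the only difference is that the parameters are not updated at test time.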
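The four steps above can be sketched as a small skeleton. This is only an illustration: the class `MemoryNetwork` and the helper functions `to_tokens`, `append_slot`, `most_overlapping`, and `join_tokens` are made-up stand-ins, and the word-overlap lookup replaces the learned components described in the paper.

```python
from typing import Callable, List


class MemoryNetwork:
    """Toy skeleton of the I -> G -> O -> R flow; all components are placeholders."""

    def __init__(self, I: Callable, G: Callable, O: Callable, R: Callable):
        self.memory: List = []
        self.I, self.G, self.O, self.R = I, G, O, R

    def step(self, x):
        features = self.I(x)                         # 1. I(x)
        self.memory = self.G(self.memory, features)  # 2. update the memory
        o = self.O(features, self.memory)            # 3. o = O(I(x), m)
        return self.R(o)                             # 4. r = R(o)


# Hypothetical components, chosen only to make the sketch runnable.
def to_tokens(x: str) -> List[str]:
    return x.lower().split()


def append_slot(memory: List, features: List[str]) -> List:
    return memory + [features]


def most_overlapping(features: List[str], memory: List) -> List[str]:
    # Assumption: skip memories identical to the input, so a question
    # does not simply retrieve itself.
    candidates = [m for m in memory if m != features]
    return max(candidates, key=lambda m: len(set(m) & set(features)), default=[])


def join_tokens(o: List[str]) -> str:
    return " ".join(o)


net = MemoryNetwork(to_tokens, append_slot, most_overlapping, join_tokens)
net.step("Joe went to the kitchen")
net.step("Fred went to the garden")
answer = net.step("where is Joe")
```

Note that in this sketch the question is written to memory as well, because step 2 runs for every input.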

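I component: can use standard preprocessing, such as parsing, and convert text into a dense feature vector.

G component: the simplest approach is to store I(x) in a slot in the memory.

H(.) is the function that selects the slot. G updates only the slot indexed by H(x); the rest of the memory is left unchanged.

If the memory is very large, there is no need to operate on all of it. If the memory is full, a forgetting procedure can be used.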
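A minimal sketch of slot selection with forgetting is given below. The class `SlotMemory` and its ring-buffer policy (overwrite the oldest slot once the memory is full) are assumptions made for illustration; the notes do not say which forgetting rule to use.

```python
from typing import Any, List, Optional


class SlotMemory:
    """Fixed-capacity memory in which H picks the slot that G overwrites."""

    def __init__(self, capacity: int):
        self.slots: List[Optional[Any]] = [None] * capacity
        self.written = 0  # total number of writes so far

    def H(self, features: Any) -> int:
        # Assumption: fill empty slots in order, then forget the oldest
        # entry by wrapping around. This rule ignores the content of x.
        return self.written % len(self.slots)

    def G(self, features: Any) -> None:
        # Only slot H(x) changes; every other slot is left untouched.
        self.slots[self.H(features)] = features
        self.written += 1


mem = SlotMemory(capacity=2)
for fact in ["a", "b", "c"]:
    mem.G(fact)
```

After the third write the memory holds "c" and "b": the oldest entry "a" has been forgotten.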

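O and R components: O typically reads from the memory and performs inference. R produces the response. In a QA task, for example, R generates the answer, and R could be an RNN conditioned on the output of O. The paper states a hypothesis:

Our hypothesis is that without conditioning on such memories, such an RNN will perform poorly.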

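Basic model:

I takes input text. Assume it is a sentence: either a statement of a fact or a question to be answered by the system. The text is stored in its original form in the next available memory slot. G only writes this new memory; old memories are not updated.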

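O produces the output features by finding the k memories that best support the input x.

With k=1, O picks the single highest-scoring memory, mo1.

With k=2, the second memory, mo2, is found given the first one already retrieved.

The final output o is [x, mo1, mo2].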
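For reference, the selection rules for k=1 and k=2 can be written out as below. This is a reconstruction from the paper rather than something spelled out in these notes, so it is worth checking against the original; N denotes the number of stored memories and s_O the scoring function used by O.

```latex
o_1 = O_1(x, \mathbf{m}) = \arg\max_{i=1,\dots,N} s_O(x, m_i)

o_2 = O_2(x, \mathbf{m}) = \arg\max_{i=1,\dots,N} s_O([x, m_{o_1}], m_i)
```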

For R, the simplest response is to return mok, the last memory retrieved.
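The k=2 lookup and the simplest R can be sketched as follows. The word-overlap function `overlap_score` is a made-up stand-in for the learned scoring function, and `retrieve_two` is a hypothetical helper name. Excluding the first hit from the second search is an extra assumption of this sketch, which also assumes at least two stored memories.

```python
from typing import Callable, List, Sequence, Tuple

Tokens = List[str]


def overlap_score(query: Sequence[Tokens], memory: Tokens) -> int:
    """Stand-in scorer: count words shared by the query texts and a memory."""
    query_words = {w for text in query for w in text}
    return len(query_words & set(memory))


def retrieve_two(
    x: Tokens,
    memories: List[Tokens],
    score: Callable[[Sequence[Tokens], Tokens], int] = overlap_score,
) -> Tuple[int, int]:
    indices = range(len(memories))
    # k=1: the best memory for x alone.
    o1 = max(indices, key=lambda i: score([x], memories[i]))
    # k=2: the best memory given both x and the first result.
    # Assumption: o1 is excluded so the same slot is not returned twice.
    rest = [i for i in indices if i != o1]
    o2 = max(rest, key=lambda i: score([x, memories[o1]], memories[i]))
    return o1, o2


memories = [
    "joe took milk".split(),
    "joe went office".split(),
    "fred went garden".split(),
]
x = "where is the milk".split()
o1, o2 = retrieve_two(x, memories)
o = [x, memories[o1], memories[o2]]  # output features [x, mo1, mo2]
response = o[-1]                     # simplest R: return the last memory found
```

Here the first lookup matches on "milk", and the second lookup matches on "joe", a word that only enters the query through the first retrieved memory.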

Scoring functions: sO (used by O) and sR (used by R).
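These notes stop before giving the form of the scoring functions. As far as I recall from the paper, both share the embedding form below, each with its own weight matrix; treat this as a reconstruction to verify against the original. Here U is an n x D matrix, D is the number of features, n is the embedding dimension, and \Phi_x, \Phi_y map the original text to D-dimensional feature vectors.

```latex
s(x, y) = \Phi_x(x)^{\top} U^{\top} U \, \Phi_y(y)
```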
