

Memory network (MemNN) & End to End memory network (MemN2N) & Dynamic memory network

Memory networks (MemNN)

Virtual assistants perform reasonably well when answering a single question, but they do poorly in multi-turn conversations. The following example illustrates the challenge we face:

  • Me: Can you find me some restaurants?
  • Assistant: I found a few places within 0.25 mile. The first one is Caffé Opera on Lincoln Street. The second one is …
  • Me: Can you make a reservation at the first restaurant?
  • Assistant: Ok. Let’s make a reservation for the Sushi Tom restaurant on First Street.

Why can't the virtual assistant make a reservation at Caffé Opera as instructed? Because the assistant does not remember our conversation: it simply answers the current question without considering the context of the preceding exchanges. Hence, all it can do is find a restaurant related to the word "First" (a restaurant on First Street). Memory networks address this problem by remembering information that has already been processed.

The description of memory networks (MemNN) below is based on "Memory Networks" by Jason Weston et al.

Consider the following statements and the question "Where is the milk now?":

  1. Joe went to the kitchen.
  2. Fred went to the kitchen.
  3. Joe picked up the milk.
  4. Joe traveled to the office.
  5. Joe left the milk.
  6. Joe went to the bathroom.

Storing information in memory

First, we store the sentences in memory \(m\):

| Memory slot \(m_i\) | Sentence |
|---|---|
| 1 | Joe went to the kitchen. |
| 2 | Fred went to the kitchen. |
| 3 | Joe picked up the milk. |
| 4 | Joe traveled to the office. |
| 5 | Joe left the milk. |
| 6 | Joe went to the bathroom. |

Answering a query

To answer a query \(q\), we first use a scoring function \(s_0\) to find the sentence \(m_{o_1}\) most relevant to \(q\). We then combine \(m_{o_1}\) with \(q\) to form a new query \([q, m_{o_1}]\) and locate the next highest-scoring sentence \(m_{o_2}\). Finally, we form yet another query \([q, m_{o_1}, m_{o_2}]\); this time, instead of using it to look up another sentence, we use a second scoring function to locate a word \(w\). Let's illustrate this process with the example above.

To answer the query \(q\), "where is the milk now?", we compute the first inference as: $$o_1 = \mathop{\arg\max}_{i=1,...,N}s_0(q, m_i)$$ where \(s_0\) is a function that scores the match between an input \(x\) and a memory \(m_i\), and \(o_1\) is the index of the best match in memory \(m\). Here \(m_{o_1}\) is the best matching sentence of the first inference: "Joe left the milk.".
Then, based on \([q: \text{"where is the milk now"},\ m_{o_1}: \text{"Joe left the milk."}]\), we compute the second inference: $$o_2 = \mathop{\arg\max}_{i=1,...,N}s_0([q, m_{o_1}], m_i)$$ where \(m_{o_2}\) is "Joe traveled to the office.".
The combination of the query and the inference results is denoted \(o\): $$o = [q, m_{o_1}, m_{o_2}] = [\text{"where is the milk now"}, \text{"Joe left the milk."}, \text{"Joe traveled to the office."}]$$ The final response \(r\) is generated as: $$r = \mathop{\arg\max}_{w \in W}s_r([q, m_{o_1}, m_{o_2}], w)$$ where \(W\) is the set of all words in the vocabulary and \(s_r\) is another function that scores the match between \([q, m_{o_1}, m_{o_2}]\) and a word \(w\). In our example, the final response \(r\) is "office".
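The two-hop inference above can be summarized in a short sketch. This is a minimal illustration rather than the paper's implementation: `s_o`, `s_r`, `memories`, and `vocabulary` are hypothetical placeholders, assumed to be already-trained scoring functions and pre-encoded inputs.

```python
import numpy as np

def answer_query(q, memories, vocabulary, s_o, s_r):
    """Two-hop MemNN inference: pick two supporting memories, then a word.

    q          -- encoded query
    memories   -- list of encoded sentences m_i
    s_o, s_r   -- trained scoring functions (hypothetical placeholders)
    """
    # First inference: the memory that best matches the query alone.
    o1 = np.argmax([s_o(q, m) for m in memories])
    # Second inference: the memory that best matches [q, m_o1].
    o2 = np.argmax([s_o([q, memories[o1]], m) for m in memories])
    # Final response: the single word that best matches [q, m_o1, m_o2].
    scores = [s_r([q, memories[o1], memories[o2]], w) for w in vocabulary]
    return vocabulary[int(np.argmax(scores))]
```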

Encoding the input

We represent the input text with a bag-of-words encoding. First, we start with a vocabulary of size \(\left|W\right|\).
Encoding the query "where is the milk now" as a bag of words:

| Vocabulary | ... | is | Joe | left | milk | now | office | the | to | traveled | where | ... |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| where is the milk now | ... | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | ... |

"where is the milk now"=(...,1,0,0,1,1,0,1,0,0,1,...)

For better performance, we encode \(q\), \(m_{o_1}\), and \(m_{o_2}\) with three separate word sets, i.e., the word "Joe" in \(q\) is encoded as "Joe_1" while the same word in \(m_{o_1}\) is encoded as "Joe_2":
\(q\): Where is the milk now?
\(m_{o_1}\): Joe left the milk.
\(m_{o_2}\): Joe traveled to the office.
Encoding \(q, m_{o_1}, m_{o_2}\) above:

| ... | Joe_1 | milk_1 | ... | Joe_2 | milk_2 | ... | Joe_3 | milk_3 | ... |
|---|---|---|---|---|---|---|---|---|---|
|  | 0 | 1 |  | 1 | 1 |  | 1 | 0 |  |

Thus, each sentence is transformed into an encoding of size \(3\left|W\right|\).
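A minimal sketch of this three-segment bag-of-words encoding (the `encode` helper and the tiny vocabulary are illustrative assumptions, not from the paper):

```python
import numpy as np

def encode(q_words, m1_words, m2_words, vocab):
    """Encode (q, m_o1, m_o2) into one 3|W| bag-of-words vector.

    Words from q fill slots [0, |W|), words from m_o1 fill
    [|W|, 2|W|), and words from m_o2 fill [2|W|, 3|W|).
    """
    W = len(vocab)
    idx = {w: i for i, w in enumerate(vocab)}
    v = np.zeros(3 * W)
    for seg, words in enumerate((q_words, m1_words, m2_words)):
        for w in words:
            v[seg * W + idx[w]] = 1
    return v

vocab = ["is", "joe", "left", "milk", "now",
         "office", "the", "to", "traveled", "where"]
x = encode("where is the milk now".split(),
           "joe left the milk".split(),
           "joe traveled to the office".split(), vocab)
print(x.shape)  # (30,), i.e. 3|W|
```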

Computing the scoring function

We use an embedding matrix \(U\) (of size \(n \times 3\left|W\right|\)) to project the \(3\left|W\right|\) bag-of-words encoding of a sentence into an embedding of size \(n\). The scoring functions \(s_0\) and \(s_r\) are computed as: $$s_0(x, y) = \Phi_x(x)^TU_0^TU_0\Phi_y(y)$$ $$s_r(x, y) = \Phi_x(x)^TU_r^TU_r\Phi_y(y)$$ where \(U_0\) and \(U_r\) are trained with a margin loss function, and \(\Phi(\cdot)\) converts a sentence into its bag-of-words representation.
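In numpy, this score is just an inner product in the embedded space. A minimal sketch, assuming a randomly initialized \(U_0\) in place of trained parameters:

```python
import numpy as np

n, W = 50, 10                          # embedding size n, vocabulary size |W|
U_0 = np.random.randn(n, 3 * W) * 0.1  # trained in practice, random here

def score(x_bow, y_bow, U):
    """s(x, y) = Phi(x)^T U^T U Phi(y): inner product of the two embeddings."""
    return (U @ x_bow) @ (U @ y_bow)

s = score(np.random.randint(0, 2, 3 * W),
          np.random.randint(0, 2, 3 * W), U_0)
```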

Margin loss function

The parameters in \(U_0\) and \(U_r\) are trained with a margin loss function:
$$\sum\limits_{\overline{f}\not=m_{o_1}}\max(0, \gamma - s_0(x, m_{o_1}) + s_0(x, \overline{f})) + $$ $$\sum\limits_{\overline{f^\prime}\not=m_{o_2}}\max(0, \gamma - s_0(\left[x, m_{o_1}\right], m_{o_2}) + s_0(\left[x, m_{o_1}\right], \overline{f^\prime})) + $$ $$\sum\limits_{\overline{r}\not=r}\max(0, \gamma - s_r(\left[x, m_{o_1}, m_{o_2}\right], r) + s_r(\left[x, m_{o_1}, m_{o_2}\right], \overline{r}))$$ where \(\overline{f}\), \(\overline{f^\prime}\), and \(\overline{r}\) are candidate predictions other than the true labels. That is, a margin loss is added whenever the score of a wrong answer comes within \(\gamma\) of the score of the correct answer.
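As a sketch, the first term of this loss is a sum of hinge terms over negative memories (the function and variable names are illustrative, and in practice negatives are typically sampled rather than fully enumerated):

```python
def first_hop_loss(x, m_o1, memories, s_o, gamma=0.1):
    """Sum of hinge terms: penalize every wrong memory f whose score
    comes within gamma of the true supporting memory m_o1."""
    pos = s_o(x, m_o1)
    return sum(max(0.0, gamma - pos + s_o(x, f))
               for f in memories if f is not m_o1)
```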

Huge memory networks

For systems with a large memory, computing a score against every memory entry is expensive. As an alternative, after the embedding \(U_0\) is computed, we can apply K-means clustering to partition the embedding space into \(K\) clusters. Each input \(x\) is then mapped into its corresponding cluster, and inference is performed only over the memories within that cluster rather than over the whole memory.
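A sketch of this clustered lookup, assuming scikit-learn's KMeans and \(U_0\)-embedded memories (both the library choice and the variable names are assumptions for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical U_0-embedded memory entries, one row per memory.
embedded_memories = np.random.randn(10000, 50)
km = KMeans(n_clusters=100).fit(embedded_memories)

def candidate_indices(x_embedded):
    """Return indices of memories in the same cluster as the input,
    so scoring is restricted to one bucket instead of all N memories."""
    bucket = km.predict(x_embedded.reshape(1, -1))[0]
    return np.flatnonzero(km.labels_ == bucket)
```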

End-to-end memory networks (MemN2N)

The description, as well as the diagrams, of end-to-end memory networks (MemN2N) below is based on "End-To-End Memory Networks" by Sainbayar Sukhbaatar et al.

We start with the query "where is the milk now?", represented as a bag of words of size \(V\) (where \(V\) is the size of the vocabulary). In the simplest case, we use an embedding matrix \(B\) (of size \(d\times V\)) to convert this vector into a \(d\)-dimensional word embedding: $$u = embedding_B(q)$$ For each memory entry \(x_i\), we use another embedding matrix \(A\) (of size \(d\times V\)) to convert it into a \(d\)-dimensional vector \(m_i\): $$m_i = embedding_A(x_i)$$
We compute the match between \(u\) and each memory \(m_i\) by taking their inner product followed by a softmax: $$p_i = softmax(u^Tm_i)$$
We encode each \(x_i\) into a vector \(c_i\) with a third embedding matrix \(C\): $$c_i = embedding_C(x_i)$$ The output is computed as: $$o = \sum\limits_{i}p_ic_i$$
Finally, we multiply the sum of \(o\) and \(u\) by a matrix \(W\) (of size \(V\times d\)) and pass the result through a softmax to predict the final answer: $$\hat{a} = softmax(W(o + u))$$
Here is a diagram summarizing all of the steps:
[Figure: the single-layer MemN2N model, from input embedding to answer prediction]
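A minimal numpy sketch of this single-layer forward pass, with randomly initialized matrices standing in for trained parameters:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

V, d = 20, 30                      # vocabulary size, embedding size
A = np.random.randn(d, V) * 0.1    # memory embedding
B = np.random.randn(d, V) * 0.1    # query embedding
C = np.random.randn(d, V) * 0.1    # output embedding
W = np.random.randn(V, d) * 0.1    # final prediction matrix

def memn2n_forward(q_bow, x_bows):
    u = B @ q_bow                          # u = embedding_B(q)
    m = np.stack([A @ x for x in x_bows])  # m_i = embedding_A(x_i)
    c = np.stack([C @ x for x in x_bows])  # c_i = embedding_C(x_i)
    p = softmax(m @ u)                     # p_i = softmax(u^T m_i)
    o = p @ c                              # o = sum_i p_i * c_i
    return softmax(W @ (o + u))            # a_hat = softmax(W(o + u))
```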

Multiple layers

Language model

Dynamic memory networks
