

Memory network (MemNN) & End to End memory network (MemN2N) & Dynamic memory network

Memory Networks (MemNN)

Virtual assistants do a decent job of answering a single question, but they fall short in multi-turn conversations. The following example illustrates the challenge we face:

  • Me: Can you find me some restaurants?
  • Assistant: I found a few places within 0.25 mile. The first one is Caffé Opera on Lincoln Street. The second one is …
  • Me: Can you make a reservation at the first restaurant?
  • Assistant: Ok. Let’s make a reservation for the Sushi Tom restaurant on First Street.

Why can't the virtual assistant book Caffé Opera as instructed? Because the assistant does not remember our conversation: she simply answers each question without considering the context of the preceding exchange. So all she can do is find a restaurant related to the word "First" (a restaurant on First Street). Memory Networks address this problem by remembering information that has already been processed.

The description of memory networks (MemNN) is based on "Memory Networks" by Jason Weston et al.

Consider the following statements and the question "Where is the milk now?":

  1. Joe went to the kitchen.
  2. Fred went to the kitchen.
  3. Joe picked up the milk.
  4. Joe travelled to the office.
  5. Joe left the milk.
  6. Joe went to the bathroom.

Storing information in memory

First, store the sentences in memory \(m\):

Memory slot \(m_i\)    Sentence
1                      Joe went to the kitchen.
2                      Fred went to the kitchen.
3                      Joe picked up the milk.
4                      Joe travelled to the office.
5                      Joe left the milk.
6                      Joe went to the bathroom.

Answering a query

To answer a query \(q\), we first use a scoring function \(s_0\) to find the sentence \(m_{o_1}\) most relevant to \(q\). We then combine \(m_{o_1}\) with \(q\) to form a new query \([q, m_{o_1}]\) and locate the next highest-scoring sentence \(m_{o_2}\). Finally, we form another query \([q, m_{o_1}, m_{o_2}]\); this time, instead of retrieving another sentence, we use a second scoring function to locate a word \(w\). Let us walk through this process with the example above.

To answer the query \(q\) "where is the milk now?", we compute the first inference as: $$o_1 = \mathop{\arg\max}_{i=1,...,N}s_0(q, m_i)$$ where \(s_0\) is a function that scores the match between an input \(x\) and a memory \(m_i\), and \(o_1\) is the index of the best match in memory \(m\). Here \(m_{o_1}\), the best matching sentence in the first inference, is "Joe left the milk.".
Then, based on \([q: \text{"where is the milk now"}, m_{o_1}: \text{"Joe left the milk."}]\), we compute the second inference: $$o_2 = \mathop{\arg\max}_{i=1,...,N}s_0([q, m_{o_1}], m_i)$$ where \(m_{o_2}\) is "Joe travelled to the office.".
Combining the query and the inference results, denoted \(o\): $$o = [q, m_{o_1}, m_{o_2}] = [\text{"where is the milk now"}, \text{"Joe left the milk."}, \text{"Joe travelled to the office."}]$$ we generate the final response \(r\): $$r = \mathop{\arg\max}_{w \in W}s_r([q, m_{o_1}, m_{o_2}], w)$$ where \(W\) is the set of all words in the dictionary and \(s_r\) is another function that scores the match between \([q, m_{o_1}, m_{o_2}]\) and a word \(w\). In our example, the final response \(r\) is "office".
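To make this two-hop control flow concrete, here is a minimal Python sketch (ours, not from the paper). `s_o` and `s_r` are assumed to be callables implementing \(s_0\) and \(s_r\) (for instance, the embedding dot products defined in the scoring section below), and texts are passed around symbolically:

```python
def answer_query(q, memories, vocab, s_o, s_r):
    """Two-hop MemNN inference.

    q: the query; memories: list of stored sentences; vocab: candidate words.
    s_o(xs, m): score of memory m as support for the list of texts xs.
    s_r(xs, w): score of word w as the response to the list of texts xs.
    """
    # Hop 1: best supporting memory for the query alone.
    o1 = max(range(len(memories)), key=lambda i: s_o([q], memories[i]))
    # Hop 2: best supporting memory given [q, m_o1].
    o2 = max(range(len(memories)), key=lambda i: s_o([q, memories[o1]], memories[i]))
    # Response: the single word that best matches [q, m_o1, m_o2].
    r = max(vocab, key=lambda w: s_r([q, memories[o1], memories[o2]], w))
    return memories[o1], memories[o2], r
```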

Encoding the input

We represent the input text with a bag-of-words (BoW) encoding, starting from a vocabulary of size \(\left|W\right|\).
Encoding the query "where is the milk now" with bag of words:

Vocabulary                 ... is  Joe  left  milk  now  office  the  to  travelled  where ...
"where is the milk now"    ...  1    0     0     1    1       0    1   0          0      1 ...

"where is the milk now"=(...,1,0,0,1,1,0,1,0,0,1,...)

To achieve better performance, we encode \(q\), \(m_{o_1}\), and \(m_{o_2}\) with three separate dictionaries, i.e., the word "Joe" in \(q\) is encoded as "Joe_1" while the same word in \(m_{o_1}\) is encoded as "Joe_2":
\(q\): Where is the milk now?
\(m_{o_1}\): Joe left the milk.
\(m_{o_2}\): Joe travelled to the office.
Encoding the above \(q, m_{o_1}, m_{o_2}\):

...  Joe_1  milk_1  ...  Joe_2  milk_2  ...  Joe_3  milk_3  ...
...      0       1  ...      1       1  ...      1       0  ...

Hence, the combined input \([q, m_{o_1}, m_{o_2}]\) is transformed into an encoding of size \(3\left|W\right|\).
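A minimal sketch of this encoding, assuming a toy ten-word vocabulary; `bow` and `encode` are illustrative helper names, not from the paper:

```python
import numpy as np

# Toy vocabulary (assumed for illustration); |W| is the dictionary size.
VOCAB = ["is", "joe", "left", "milk", "now", "office", "the", "to", "travelled", "where"]
W = len(VOCAB)

def bow(sentence):
    """Binary bag-of-words vector of size |W|."""
    v = np.zeros(W)
    for word in sentence.lower().split():
        word = word.strip(".?,")
        if word in VOCAB:
            v[VOCAB.index(word)] = 1.0
    return v

def encode(q, m_o1, m_o2):
    """Concatenate three per-slot dictionaries into one 3|W| encoding,
    so "Joe" in q, m_o1, and m_o2 lands in three different positions."""
    return np.concatenate([bow(q), bow(m_o1), bow(m_o2)])

x = encode("Where is the milk now?",
           "Joe left the milk.",
           "Joe travelled to the office.")
print(x.shape)  # (30,) i.e. 3|W| with |W| = 10
```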

Computing the scoring functions

We use a word-embedding matrix \(U\) (of size \(n \times 3\left|W\right|\)) to convert the \(3\left|W\right|\) bag-of-words encoding of a sentence into an \(n\)-dimensional embedding. The scoring functions \(s_0\) and \(s_r\) are computed as: $$s_0(x, y) = \Phi_x(x)^TU_0^TU_0\Phi_y(y)$$ $$s_r(x, y) = \Phi_x(x)^TU_r^TU_r\Phi_y(y)$$ where \(U_0\) and \(U_r\) are trained with a margin loss function, and \(\Phi_y(m_i)\) maps the sentence \(m_i\) to its bag-of-words representation.
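A minimal numpy sketch of these scoring functions, assuming the \(3\left|W\right|\)-dimensional bag-of-words vectors from the previous snippet, with randomly initialized matrices standing in for the trained \(U_0\) and \(U_r\):

```python
import numpy as np

n, D = 64, 3 * 10          # embedding size n and D = 3|W| (toy sizes, assumed)
rng = np.random.default_rng(0)
U_0 = rng.normal(scale=0.1, size=(n, D))   # learned in practice via the margin loss
U_r = rng.normal(scale=0.1, size=(n, D))

def score(x_bow, y_bow, U):
    """s(x, y) = Phi(x)^T U^T U Phi(y): embed both bag-of-words
    vectors with U, then take the dot product of the embeddings."""
    return (U @ x_bow) @ (U @ y_bow)

s_0 = lambda x, y: score(x, y, U_0)   # used to select supporting memories
s_r = lambda x, y: score(x, y, U_r)   # used to select the response word
```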

Margin loss function

The parameters in \(U_0\) and \(U_r\) are trained with a margin loss function:
$$\sum\limits_{\overline{f}\not=m_{o_1}}\max(0, \gamma - s_0(x, m_{o_1}) + s_0(x, \overline{f}))\ + $$ $$\sum\limits_{\overline{f^\prime}\not=m_{o_2}}\max(0, \gamma - s_0(\left[x, m_{o_1}\right], m_{o_2}) + s_0(\left[x, m_{o_1}\right], \overline{f^\prime}))\ + $$ $$\sum\limits_{\overline{r}\not=r}\max(0, \gamma - s_r(\left[x, m_{o_1}, m_{o_2}\right], r) + s_r(\left[x, m_{o_1}, m_{o_2}\right], \overline{r}))$$ where \(\overline{f}\), \(\overline{f^\prime}\), and \(\overline{r}\) range over all candidates other than the true labels. In other words, a margin penalty is incurred whenever a wrong candidate's score exceeds the correct candidate's score minus \(\gamma\).

Huge memory networks

For a system with a large memory, computing a score for every memory entry is expensive. An alternative: after the word embedding \(U_0\) is computed, apply K-means clustering to split the embedding space into \(K\) clusters. Each input \(x\) is then mapped into its corresponding cluster, and inference is performed only within that cluster rather than over the entire memory.
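One way to realize this shortlisting step with scikit-learn's KMeans, assuming the memory slots are already embedded (random toy data stands in below, and the granularity \(K = 50\) is an arbitrary choice):

```python
import numpy as np
from sklearn.cluster import KMeans

# Embedded memory slots U_0 Phi(m_i); random toy data stands in here.
rng = np.random.default_rng(0)
memory_emb = rng.normal(size=(10_000, 64))   # N = 10,000 memories, n = 64

kmeans = KMeans(n_clusters=50, n_init=10).fit(memory_emb)

def candidate_indices(x_emb):
    """Return only the memories in the query's cluster, so scoring
    touches roughly N/K slots instead of all N."""
    cluster = kmeans.predict(x_emb[None, :])[0]
    return np.flatnonzero(kmeans.labels_ == cluster)

shortlist = candidate_indices(rng.normal(size=64))
```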

End-to-End Memory Networks (MemN2N)

The description of end-to-end memory networks (MemN2N), as well as the diagrams, is based on "End-To-End Memory Networks" by Sainbayar Sukhbaatar et al.

We start with the query "where is the milk now?", represented as a bag-of-words vector of size \(V\) (where \(V\) is the size of the vocabulary). An embedding matrix \(B\) (\(d\times V\)) converts this vector into a \(d\)-dimensional word embedding: $$u = \text{embedding}_B(q)$$ Each memory entry \(x_i\) is converted into a \(d\)-dimensional vector \(m_i\) by another embedding matrix \(A\) (\(d\times V\)): $$m_i = \text{embedding}_A(x_i)$$
The match between \(u\) and each memory \(m_i\) is computed as their inner product followed by a softmax: $$p_i = \text{softmax}(u^Tm_i)$$
A third embedding matrix \(C\) encodes each \(x_i\) as a vector \(c_i\): $$c_i = \text{embedding}_C(x_i)$$ The output is computed as: $$o = \sum\limits_{i}p_ic_i$$
Finally, a matrix \(W\) (\(V\times d\)) multiplies the sum of \(o\) and \(u\), and the result is passed through a softmax to predict the final answer: $$\hat{a} = \text{softmax}(W(o + u))$$
Here, all the steps are summarized in a single diagram:
[Figure: single-layer MemN2N architecture]
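A minimal numpy sketch of the whole single-layer forward pass, with randomly initialized embeddings and toy shapes standing in for trained parameters:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def memn2n_forward(q_bow, x_bows, A, B, C, W_out):
    """Single-layer MemN2N forward pass, mirroring the equations above.

    q_bow: (V,) query bag-of-words; x_bows: (N, V) memory bag-of-words;
    A, B, C: (d, V) embedding matrices; W_out: (V, d) output matrix.
    """
    u = B @ q_bow                     # u = embedding_B(q)
    m = x_bows @ A.T                  # m_i = embedding_A(x_i), one row per memory
    p = softmax(m @ u)                # p_i = softmax(u^T m_i)
    c = x_bows @ C.T                  # c_i = embedding_C(x_i)
    o = p @ c                         # o = sum_i p_i c_i
    return softmax(W_out @ (o + u))   # a_hat = softmax(W(o + u))

# Toy shapes (assumed): V = 100 words, N = 6 memories, d = 20.
rng = np.random.default_rng(0)
V, N, d = 100, 6, 20
A, B, C = (rng.normal(scale=0.1, size=(d, V)) for _ in range(3))
W_out = rng.normal(scale=0.1, size=(V, d))
a_hat = memn2n_forward(rng.integers(0, 2, V).astype(float),
                       rng.integers(0, 2, (N, V)).astype(float),
                       A, B, C, W_out)
print(a_hat.argmax())   # index of the predicted answer word
```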

Multiple layers

Language model

Dynamic Memory Networks
