[Paper Analysis 2] The Self-Attention Mechanism

Author: PurJoy. Published: 2019-07-08

The paper discussed here is "A Structured Self-Attentive Sentence Embedding". These are my own study notes; discussion and corrections are welcome.

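The self-attention mechanism:

1. The core is two linear transformations, W1 and W2 (written W_s1 and W_s2 in the paper).

2. The transformations are applied in sequence to the encoder's hidden states: W1 first, followed by tanh, then W2, followed by a softmax.

Explanation: r is the number of attention distributions, and the paper recommends using at least 2. d_a is an intermediate hyperparameter (the width between the two transformations) and can be any size.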
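For reference, here is how the two transformations are written in the paper, as far as I can reconstruct them. The symbols n (sentence length) and u (LSTM hidden size per direction) come from the paper rather than from these notes, and H denotes the matrix of bidirectional LSTM hidden states.

```latex
H \in \mathbb{R}^{n \times 2u}, \quad
W_{s1} \in \mathbb{R}^{d_a \times 2u}, \quad
W_{s2} \in \mathbb{R}^{r \times d_a}

A = \mathrm{softmax}\left( W_{s2} \tanh\left( W_{s1} H^{\top} \right) \right) \in \mathbb{R}^{r \times n}

M = A H \in \mathbb{R}^{r \times 2u}
```

The softmax runs along the n token positions, so each of the r rows of A is one attention distribution over the sentence.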

The core of the paper is W1 and W2; once these two are understood, the rest follows.
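To make the roles of W1 and W2 concrete, here is a minimal pure-Python sketch of the two transformations on a toy input. The helper names (`matmul`, `transpose`, `softmax`, `self_attention`) and all the numbers are made up for illustration; it only shows how the shapes flow, not a practical implementation.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(H, W_s1, W_s2):
    """Return (A, M) for H: [n, 2u], W_s1: [d_a, 2u], W_s2: [r, d_a]."""
    projected = matmul(W_s1, transpose(H))                        # [d_a, n]
    hidden = [[math.tanh(v) for v in row] for row in projected]   # [d_a, n]
    scores = matmul(W_s2, hidden)                                 # [r, n]
    A = [softmax(row) for row in scores]                          # each row sums to 1 over the n tokens
    M = matmul(A, H)                                              # [r, 2u]
    return A, M


# Toy sizes (assumed for illustration): n = 3 tokens, 2u = 2, d_a = 2, r = 2
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_s1 = [[0.5, -0.5], [0.25, 0.75]]
W_s2 = [[1.0, 0.0], [0.0, 1.0]]

A, M = self_attention(H, W_s1, W_s2)
print(len(A), len(A[0]))  # 2 3  -> r rows, one weight per token
print(len(M), len(M[0]))  # 2 2  -> r rows, each of width 2u
```

Each row of A is a separate attention distribution over the tokens, and the matching row of M is the corresponding weighted sum of hidden states.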

import torch
import torch.nn as nn
import torch.nn.functional as F


# Input encoder: a bidirectional LSTM. It is straightforward, so no further explanation.
class EncoderRNN(nn.Module):
    def __init__(self, embed_size, hidden_size, vocab_size, gpu):
        super(EncoderRNN, self).__init__()
        self.gpu = gpu
        self.hidden_size = hidden_size
        self.embed_size = embed_size
        self.vocab_size = vocab_size
        self.embed = nn.Embedding(self.vocab_size, self.embed_size)
        self.lstm = nn.LSTM(self.embed_size, self.hidden_size,
                            batch_first=True, bidirectional=True)

    def init_hidden(self, batch_size):
        # First dimension is 2: one layer times two directions.
        h0 = torch.zeros(2, batch_size, self.hidden_size)
        c0 = torch.zeros(2, batch_size, self.hidden_size)
        if self.gpu:
            h0 = h0.cuda()
            c0 = c0.cuda()
        return h0, c0

    def forward(self, sentences):
        # sentences: [batch_size, seq_len] of token ids
        batch_size = sentences.size()[0]
        h0, c0 = self.init_hidden(batch_size)
        embed = self.embed(sentences)
        # output: [batch_size, seq_len, 2H]
        output, _ = self.lstm(embed, (h0, c0))
        return output

# The self-attention code below is my own implementation and is for reference only.
# Unless noted otherwise, parameter names and comments follow the paper.
class SelfAttention(nn.Module):
    def __init__(self, hidden_size, num_class):
        super(SelfAttention, self).__init__()
        self.labels = num_class
        # hidden_size must match the encoder output width, i.e. 2H for the bidirectional LSTM.
        self.hidden_size = hidden_size
        self.attention = nn.Sequential(
            # Corresponds to the paper's weight matrix W_s1; the 10 is d_a
            nn.Linear(self.hidden_size, 10),
            nn.Tanh(),
            # Corresponds to the paper's weight matrix W_s2; the 5 is r
            nn.Linear(10, 5)
        )
        self.output = nn.Linear(self.hidden_size, self.labels)

    def forward(self, encode_output):
        # Compute the self-attention weight matrix A: atte_weight = A = [batch_size, r, seq_len]
        atte_weight = F.softmax(self.attention(encode_output), dim=1).permute(0, 2, 1)
        # Compute the weighted sum of hidden states M:
        # torch.bmm([batch_size, r, seq_len], [batch_size, seq_len, 2H])
        # => [batch_size, r, 2H] => [batch_size, 2H] = output
        output = torch.sum(torch.bmm(atte_weight, encode_output), dim=1)
        # Note: this line is the final fully connected layer plus softmax; strictly speaking,
        # it is no longer part of the self-attention framework itself.
        result = F.softmax(self.output(output), dim=1)
        return result
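As a quick sanity check on the shapes in the comments above, the following sketch runs a random batch through a bidirectional LSTM and the two linear layers. It is self-contained and does not reuse the classes above. The toy sizes and variable names are assumptions for illustration; d_a = 10 and r = 5 are taken from the code comments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy sizes, assumed for illustration only
batch_size, seq_len, embed_size, H = 4, 7, 8, 6
d_a, r = 10, 5  # the values used in the notes above

lstm = nn.LSTM(embed_size, H, batch_first=True, bidirectional=True)
attention = nn.Sequential(
    nn.Linear(2 * H, d_a),  # plays the role of W_s1
    nn.Tanh(),
    nn.Linear(d_a, r),      # plays the role of W_s2
)

x = torch.randn(batch_size, seq_len, embed_size)  # stand-in for embedded tokens
encode_output, _ = lstm(x)                        # [batch_size, seq_len, 2H]

scores = attention(encode_output)                 # [batch_size, seq_len, r]
A = F.softmax(scores, dim=1).permute(0, 2, 1)     # [batch_size, r, seq_len]
M = torch.bmm(A, encode_output)                   # [batch_size, r, 2H]

summed = M.sum(dim=1)                             # [batch_size, 2H], as in the code above
flattened = M.reshape(batch_size, -1)             # [batch_size, r * 2H], keeps every row of M

print(encode_output.shape, A.shape, M.shape)
print(summed.shape, flattened.shape)
```

Two design points are worth noting. `nn.Linear` adds a bias term by default, while the paper's formula has none; passing `bias=False` would match it more closely. Summing over the r rows, as the code above does, merges the attention distributions into a single vector, whereas flattening keeps each one separate at the cost of a wider final layer.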

Reference:

"從三大頂會論文看百變Self-Attention" (The Many Faces of Self-Attention, as Seen in Papers from Three Top Conferences)
