
Time Series Forecasting with LSTM

Author: 臨客 · Published 2020-07-21

Background

Related materials


Understanding LSTM

[Figure: LSTM schematic]

The figure above is the classic LSTM schematic. It contains three gates and one cell state, and the forward propagation of information satisfies:

\begin{array}{l} f_{t}=\sigma\left(W_{i f} x_{t}+b_{i f}+W_{h f} h_{t-1}+b_{h f}\right) \\ i_{t}=\sigma\left(W_{i i} x_{t}+b_{i i}+W_{h i} h_{t-1}+b_{h i}\right) \\ o_{t}=\sigma\left(W_{i o} x_{t}+b_{i o}+W_{h o} h_{t-1}+b_{h o}\right) \\ g_{t}=\tanh \left(W_{i g} x_{t}+b_{i g}+W_{h g} h_{t-1}+b_{h g}\right) \\ c_{t}=f_{t} \odot c_{t-1}+i_{t} \odot g_{t} \\ h_{t}=o_{t} \odot \tanh \left(c_{t}\right) \end{array}

where f, i, o are the forget, input, and output gate coefficients, and g, c, h are the candidate state, the cell state, and the hidden state. The gate coefficients all go through a sigmoid, which restricts them to (0, 1); the candidate state depends on the current input and the previous step's hidden state. The cell state can be viewed as a memory unit: when it is updated, part of the previous memory is forgotten and part of the new information is accepted. The hidden state reads directly from the current cell state, passed through an output gate controlled by the coefficient o_t.
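To make these equations concrete, here is a minimal sketch that computes one LSTM step by hand and checks it against torch.nn.LSTMCell, whose weight_ih/weight_hh tensors stack the gates in the order i, f, g, o:

import torch

torch.manual_seed(0)
input_size, hidden_size = 4, 3
cell = torch.nn.LSTMCell(input_size, hidden_size)

x_t = torch.randn(1, input_size)
h_prev = torch.zeros(1, hidden_size)
c_prev = torch.zeros(1, hidden_size)

# PyTorch stacks the gates along dim 0 in the order i, f, g, o
i_w, f_w, g_w, o_w = cell.weight_ih.chunk(4, 0)   # each [hidden_size, input_size]
i_u, f_u, g_u, o_u = cell.weight_hh.chunk(4, 0)   # each [hidden_size, hidden_size]
i_b, f_b, g_b, o_b = (cell.bias_ih + cell.bias_hh).chunk(4, 0)

i_t = torch.sigmoid(x_t @ i_w.T + h_prev @ i_u.T + i_b)
f_t = torch.sigmoid(x_t @ f_w.T + h_prev @ f_u.T + f_b)
g_t = torch.tanh(x_t @ g_w.T + h_prev @ g_u.T + g_b)
o_t = torch.sigmoid(x_t @ o_w.T + h_prev @ o_u.T + o_b)
c_t = f_t * c_prev + i_t * g_t          # c_t = f * c_{t-1} + i * g
h_t = o_t * torch.tanh(c_t)             # h_t = o * tanh(c_t)

h_ref, c_ref = cell(x_t, (h_prev, c_prev))
print(torch.allclose(h_t, h_ref, atol=1e-6), torch.allclose(c_t, c_ref, atol=1e-6))   # True True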

PyTorch's LSTM inherits from RNNBase; in your own code you usually wrap an nn.LSTM module inside your model class. A typical example:

self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers,
                    batch_first=True, dropout=0.25, bidirectional=False)

The main arguments are: input_size, which can be viewed as the dimension of a single word in a sequence, or the number of features per time step of a time series; hidden_size, the number of hidden units; and num_layers, the depth of the stacked LSTM. Setting bidirectional=True makes the network bidirectional. With batch_first=True, the first dimension of the input X is the batch_size, which will feel very familiar to anyone used to CNNs. Although the official examples do not recommend setting batch_first=True, I have seen many Kaggle grandmasters write it this way [1], so I recommend doing the same.

[Figure: an LSTM network with num_layers=3]

Besides wrapping the LSTM, you also need to write the forward method:

def forward(self, x):
    # device is assumed to be defined elsewhere, e.g. device = torch.device('cuda')
    h_1 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device))
    c_1 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device))
    out, (hn, cn) = self.lstm(x, (h_1, c_1))
    ## take the value of the last layer
    mhn = hn.view(self.num_layers, x.size(0), self.hidden_size)[-1]
    y = self.fc(mhn)
    return y

When batch_first=True, the input X has the shape (batch, len(seq), dim(feature)). h and c can be initialized manually; if they are left out they are automatically initialized to zero. When they are passed in, both have the shape (num_layers * num_directions, batch, hidden_size):

out, (hn, cn) = self.lstm(x, (h_1, c_1))

The LSTM layer returns three tensors. With batch_first=True, out has the shape (batch, len(seq), num_directions * hidden_size), so the output mirrors the input except that the last dimension is expanded to the number of hidden units. out is the last layer's output at every time step, while hn and cn represent the hidden and cell states at the final time step.
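A quick shape check with arbitrary sizes confirms these conventions:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=3, batch_first=True)
x = torch.randn(4, 10, 8)        # (batch, len(seq), dim(feature))
out, (hn, cn) = lstm(x)          # h0/c0 omitted, so they are zero-initialized
print(out.shape)                 # torch.Size([4, 10, 16]) -> (batch, len(seq), num_directions*hidden_size)
print(hn.shape, cn.shape)        # torch.Size([3, 4, 16])  -> (num_layers*num_directions, batch, hidden_size)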

[Figure: multi-layer LSTM schematic]

The LSTM's hidden-state information is usually processed further, for example by attaching Linear layers:

self.fc = nn.Sequential(
    nn.Linear(hidden_size, middle_size),
    nn.SELU(True),
    nn.Dropout(p=drop_out),
    nn.Linear(middle_size, output_size)
)
##torch.nn.Linear(in_features: int, out_features: int, bias: bool = True)

Common time-series features

(1) shift

for i in lags:
    print('Shifting:', i)
    df['lag_' + str(i)] = df['target'].transform(lambda x: x.shift(i))

(2) rolling

for w in windows:
    for i in lags:
        print('Rolling period:', i)
        df['lag_' + str(w) + 'rolling_mean_' + str(i)] = df['target'].transform(lambda x: x.shift(w).rolling(i).mean())

shift and rolling produce some NaN values. Those rows can simply be dropped, or, if windows and lags are large, the missing values can be filled, for example with 0, as in the sketch right below. The values of windows and lags themselves can be chosen by analysing the autocorrelation: the autocorrelation usually shows some periodicity, and that period is a natural choice for windows. The autocorrelation describes how strongly the current value of the series is related to its past values [2].
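For example, either option takes a couple of lines (a sketch that assumes the df and the lag_ column prefix from the snippets above):

# Option 1: drop the rows that contain NaN after shift/rolling
df = df.dropna().reset_index(drop=True)

# Option 2: keep every row and fill the missing lag/rolling features with 0
lag_cols = [c for c in df.columns if c.startswith('lag_')]
df[lag_cols] = df[lag_cols].fillna(0)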

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.stattools import acf, pacf
import plotly_express as px
import plotly.graph_objects as go

def plot_line(acf):
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=list(range(len(acf))), y=acf, mode='lines', name='acf'))
    fig.show()

[Figure: autocorrelation plot]

The autocorrelation plot above clearly shows periodic behaviour; the spacing between the peaks is a perfectly reasonable value for lags and windows. Sometimes the autocorrelation of the raw series is not very informative; in that case compute it on the first-order (or higher-order) differenced series instead.
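As a rough sketch of that idea (assuming the raw series sits in df['target'] as above), the ACF of the first-order difference can be computed and the typical peak spacing read off as a candidate period:

import numpy as np
from statsmodels.tsa.stattools import acf

series = df['target'].diff().dropna()              # first-order difference of the target
nlags = min(100, len(series) // 2)
acf_vals = acf(series, nlags=nlags, fft=True)

# crude peak detection: local maxima of the ACF above a small threshold
peaks = [k for k in range(1, len(acf_vals) - 1)
         if acf_vals[k] > acf_vals[k - 1] and acf_vals[k] > acf_vals[k + 1] and acf_vals[k] > 0.1]
if len(peaks) > 1:
    period = int(np.median(np.diff(peaks)))        # typical spacing between peaks
    print('candidate period for windows/lags:', period)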

Model training

The data generally needs to be converted to tensors with torch.utils.data:

train_set = torch.utils.data.TensorDataset(torch.FloatTensor(X_train), torch.FloatTensor(y_train))
val_set = torch.utils.data.TensorDataset(torch.FloatTensor(X_valid), torch.FloatTensor(y_valid))

To use the model, instantiate it and define the criterion, optimizer, and scheduler:

## define initial model
def initialize_model(learning_rate, input_size, hidden_size, num_layers, middle_size, output_size):
    torch.manual_seed(42)
    model = LSTM(input_size, hidden_size, num_layers, middle_size, output_size)
    model.to(device)
    criterion = torch.nn.MSELoss().to(device)  # mean-squared error for regression
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-5)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=100, factor=0.5,
                                                           min_lr=1e-7, eps=1e-08)
    return model, criterion, optimizer, scheduler

The optimizer receives the model's parameters. Each optimizer update only sees a single minibatch, so optimizer.step() sits inside the batch loop, right after loss.backward(); for why zero_grad() has to be called on every iteration, see [3].

The scheduler, by contrast, is driven by the whole training set, so scheduler.step() is called once per epoch. The model is saved by passing model.state_dict() (the model's parameters) to torch.save; the optimizer state can be stored alongside it via optimizer.state_dict(). Load it back later with torch.load.
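A minimal sketch of this ordering, together with a checkpoint that also keeps the optimizer state (evaluate here is a hypothetical helper that returns the epoch's validation loss; the train_net below saves only model.state_dict()):

for epoch in range(num_epochs):
    model.train()
    for data, target in train_loader:
        data, target = data.cuda(), target.cuda()
        optimizer.zero_grad()                 # clear the gradients left by the previous minibatch
        loss = criterion(model(data), target)
        loss.backward()
        optimizer.step()                      # one parameter update per minibatch
    val_loss = evaluate(model, val_loader)    # hypothetical helper returning this epoch's validation loss
    scheduler.step(val_loss)                  # one scheduler step per epoch (ReduceLROnPlateau needs a metric)

# a checkpoint that keeps the model parameters and the optimizer state
torch.save({'model': model.state_dict(), 'optimizer': optimizer.state_dict()}, 'checkpoint.pt')
state = torch.load('checkpoint.pt')
model.load_state_dict(state['model'])
optimizer.load_state_dict(state['optimizer'])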

from fastprogress import master_bar, progress_bar
import numpy as np

def train_net(train_loader, val_loader, num_epochs, model, criterion, optimizer, scheduler,
              patience, verbose, user_define_score):
    valid_loss_min = np.Inf
    patience = patience
    current_epoch = 0
    stop = False
    num_epochs = num_epochs
    training_loss = []
    for epoch in progress_bar(range(num_epochs)):
        train_loss = []
        train_score = []
        model.train()
        for batch_i, (data, target) in enumerate(train_loader):
            data, target = data.cuda(), target.cuda()
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            train_loss.append(loss.item())
            a = target.data.cpu().numpy()
            b = output.detach().cpu().numpy()
            train_score.append(user_define_score(a, b))
            loss.backward()
            optimizer.step()
        val_loss = []
        val_score = []
        model.eval()                      # turn off dropout for validation
        with torch.no_grad():
            for batch_i, (data, target) in enumerate(val_loader):
                data, target = data.cuda(), target.cuda()
                output = model(data)
                loss = criterion(output, target)
                val_loss.append(loss.item())
                a = target.data.cpu().numpy()
                b = output.detach().cpu().numpy()
                val_score.append(user_define_score(a, b))
        if epoch % 100 == 0 and verbose:
            print(f'Epoch {epoch}, train loss: {np.mean(train_loss):.4f}, valid loss: {np.mean(val_loss):.4f}')
        scheduler.step(np.mean(val_loss))
        valid_loss = np.mean(val_loss)
        if valid_loss <= valid_loss_min:
            torch.save(model.state_dict(), "model.pt")
            valid_loss_min = valid_loss
            current_epoch = 0
        if valid_loss > valid_loss_min:
            current_epoch += 1
            if current_epoch > patience:
                print("stopping training")
                stop = True
                break
        if stop:
            break
    checkpoint = torch.load("model.pt")
    model.load_state_dict(checkpoint)
    return model

The model is usually trained with cross-validation:

def train_net_folds(X, y, test, folds, num_epochs, batch_size, patience, verbose, user_define_score,
                    learning_rate, input_size, hidden_size, num_layers, middle_size, output_size):
    prediction = []
    scores = []
    for fold_n, (train_index, valid_index) in enumerate(folds.split(X, y)):
        X_train, X_valid = X[train_index], X[valid_index]
        y_train, y_valid = y[train_index], y[valid_index]
        train_set = torch.utils.data.TensorDataset(torch.FloatTensor(X_train), torch.FloatTensor(y_train))
        val_set = torch.utils.data.TensorDataset(torch.FloatTensor(X_valid), torch.FloatTensor(y_valid))
        train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=False)
        val_loader = torch.utils.data.DataLoader(val_set, batch_size=batch_size)
        model, criterion, optimizer, scheduler = initialize_model(learning_rate, input_size, hidden_size,
                                                                  num_layers, middle_size, output_size)
        model = train_net(train_loader, val_loader, num_epochs, model, criterion, optimizer,
                          scheduler, patience, verbose, user_define_score)
        model.eval()
        y_pred_valid = []
        for batch_i, (data, target) in enumerate(val_loader):
            data, target = data.cuda(), target.cuda()
            p = model(data)
            pred = p.cpu().detach().numpy()
            y_pred_valid.extend(pred)
        scores.append(user_define_score(y_valid, np.array(y_pred_valid)))
        ## predict
        y_pred = model(test.cuda())
        y_pred = y_pred.detach().cpu().numpy()
        prediction.append(y_pred)
    prediction = np.sum(prediction, axis=0) / len(prediction)
    score = np.mean(scores)
    print('-' * 50)
    print('CV mean score: {0:.4f}, std: {1:.4f}.'.format(np.mean(scores), np.std(scores)))
    print('-' * 50)
    print(prediction.shape)
    return prediction, score
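A hypothetical call could look like the following; every name and size is a placeholder, any splitter with a split(X, y) method works (for example sklearn's TimeSeriesSplit), and my_metric stands for whatever user_define_score you use:

import torch
from sklearn.model_selection import TimeSeriesSplit

folds = TimeSeriesSplit(n_splits=5)
test_tensor = torch.FloatTensor(X_test)      # X_test: [n_test, seq_len, n_features]

prediction, score = train_net_folds(
    X, y, test_tensor, folds,
    num_epochs=1000, batch_size=64, patience=200, verbose=True,
    user_define_score=my_metric,
    learning_rate=1e-3, input_size=X.shape[2], hidden_size=64,
    num_layers=2, middle_size=32, output_size=1)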

Seq2Seq [4]

This section builds a Seq2Seq model with attention on top of LSTMs; the main references are [5] and [6].


The attention mechanism describes how one element of the output sequence relates to all elements of the input sequence. In machine translation, an output word is not equally related to every input word; often only a few input words matter and the rest are irrelevant. The attention mechanism is introduced to give the model this ability.

Encoder

The encoder is a multi-layer LSTM. NLP tasks usually use a bidirectional LSTM, but given the causal nature of time-series data a unidirectional LSTM is used here. Its out is the feature sequence extracted by the stacked layers, i.e. the H fed into the Attention module below.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')   # the same global device used in the snippets above

class Encoder(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, batch_first=True, drop_out=0.25, bidirectional=False):
        super(Encoder, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        ##lstm
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers,
                            batch_first=True, dropout=0.25, bidirectional=False)

    def forward(self, x):
        ## x=[batch_size,seq_len,input_dim]
        h_1 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device))
        c_1 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device))
        out, (hn, cn) = self.lstm(x, (h_1, c_1))
        ## out=[batch_size,seq_len,hidden_size]
        return out, hn, cn

Attention

Here the output H of the encoder's last layer and the hidden state s_{t-1} of the decoder's last layer are used to compute the attention weights \alpha_t. The encoder output H has shape [batch_size, enc_input_dim, enc_hid_dim] (the second dimension is the source sequence length, src_len in the code below); the decoder's last-layer hidden state s_{t-1} has shape [batch_size, 1, dec_hid_dim], where the second dimension of 1 reflects that the decoder emits only one prediction at a time.

The attention weights are computed as:

\begin{array}{c} E_{t}=\tanh \left(\operatorname{attn}\left(s_{t-1}, H\right)\right) \\ \tilde{a}_{t}=v E_{t} \\ a_{t}=\operatorname{softmax}\left(\tilde{a}_{t}\right) \end{array}

Below is a brief walkthrough of the tensor shapes inside the attention module:

(1) The input decoder_hidden has shape [num_layers, batch_size, dec_hid_dim]. To obtain s_{t-1}, first take the last layer's hidden state and then permute the dimensions.

(2) To combine the decoder information s_{t-1} with the encoder information H, decoder_hidden is also repeated src_len times so that it lines up with H.

(3) The concatenated tensor of shape [batch_size, enc_input_dim, enc_hid_dim + dec_hid_dim] is linearly mapped to [batch_size, enc_input_dim, dec_hid_dim].

(4) A second linear map reduces it to [batch_size, enc_input_dim, 1].

(5) A softmax yields the attention weights of shape [batch_size, enc_input_dim].

class Attention(nn.Module):
    def __init__(self, enc_hid_dim, dec_hid_dim):
        super().__init__()
        self.attn = nn.Linear(enc_hid_dim + dec_hid_dim, dec_hid_dim)
        self.v = nn.Linear(dec_hid_dim, 1, bias=False)

    def forward(self, decoder_hidden, encoder_outputs):
        ## encoder_outputs=[batch_size,src_len,enc_hid_dim]
        # decoder_hidden=[num_layers,batch,dec_hid_dim]
        # with num_layers=1, s_{t-1} is simply the hidden state of the last layer,
        # so we take the last hidden layer; it has passed through as many layers as encoder_outputs
        src_len = encoder_outputs.shape[1]
        ##last hidden layer
        decoder_hidden = decoder_hidden[-1, :, :]
        print(decoder_hidden.shape)
        #hidden=[src_len,batch,dec_hid_dim]
        decoder_hidden = decoder_hidden.repeat(src_len, 1, 1)
        #concat hidden and enc_output
        #hidden=[batch,src,dec_hid_dim]
        decoder_hidden = decoder_hidden.permute(1, 0, 2)
        #energy=[batch,src,dec_hid_dim]
        energy = torch.tanh(self.attn(torch.cat((decoder_hidden, encoder_outputs), dim=2)))
        #attention=[batch,src]
        attention = self.v(energy).squeeze(2)
        return F.softmax(attention, dim=1)
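A quick shape trace with random tensors (sizes are arbitrary) matches the walkthrough above:

import torch

enc_hid_dim, dec_hid_dim, num_layers = 16, 16, 2
batch_size, src_len = 4, 10

attn = Attention(enc_hid_dim, dec_hid_dim)
encoder_outputs = torch.randn(batch_size, src_len, enc_hid_dim)
decoder_hidden = torch.randn(num_layers, batch_size, dec_hid_dim)

a = attn(decoder_hidden, encoder_outputs)
print(a.shape)        # torch.Size([4, 10]) -> [batch_size, src_len]
print(a.sum(dim=1))   # every row sums to 1 after the softmax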

Decoder


Each time the decoder makes a prediction it needs the current input y_t, the encoder output H, and the hidden states (h, c). The decoder first runs a multi-layer LSTM and then produces the output through a linear transformation.

The LSTM layer's input has two parts. One part comes from outside: it can be the previous prediction \hat y_{t-1}, or a value y_t that is strongly related to \hat y_t. In NLP tasks y_t can be the already-translated word or the previous prediction, and the first input is a start token; a time-series problem has no ground-truth outputs available at prediction time, so only the previous prediction can be used, and the first input cannot be a start token either. Usually the value one step before \hat y_1, namely \hat y_0, is used, so remember to reserve \hat y_0 when preparing the data.

\begin{array}{c} c=a_{t} H \\ s_{t}=\operatorname{LSTM}\left(y_{t}, c, s_{t-1}\right) \\ \hat{y}_{t}=f\left(y_{t}, c, s_{t}\right) \end{array}

Below is a brief walkthrough of the tensor shapes inside the Decoder:

(1) The input x has shape [batch_size, 1, dec_input_dim], and input_hidden has shape [num_layers, batch_size, dec_hid_dim]. The two go through the attention module to produce the weights a_t of shape [batch_size, enc_input_dim]. H has shape [batch_size, enc_input_dim, enc_hid_dim], so the extra context input c obtained from attention has shape [batch_size, 1, enc_hid_dim].

(2) Concatenating x and c gives [batch_size, 1, dec_input_dim + enc_hid_dim], which goes through the LSTM layer and produces [batch_size, 1, dec_hid_dim].

(3) The input x, the context c, and the LSTM output rnn_output are concatenated and passed through a linear transform to give the output of shape [batch_size, 1, output_dim].

class Decoder(nn.Module):
    ## attention(encoder, decoder), decoder lstm, fc, output
    def __init__(self, attention, dec_input_dim, output_dim, enc_hid_dim, dec_hid_dim, dec_num_layers,
                 batch_first=True, drop_out=0.25, bidirectional=False):
        super(Decoder, self).__init__()
        self.attention = attention
        self.dec_input_dim = dec_input_dim
        self.enc_hid_dim = enc_hid_dim
        self.dec_hid_dim = dec_hid_dim
        self.dec_num_layers = dec_num_layers
        self.output_dim = output_dim
        ##lstm
        self.rnn = nn.LSTM(input_size=enc_hid_dim + dec_input_dim, hidden_size=dec_hid_dim,
                           num_layers=dec_num_layers, batch_first=True, dropout=0.25, bidirectional=False)
        ##linear layer that maps to the final output dim
        self.fc_out = nn.Linear(enc_hid_dim + dec_hid_dim + dec_input_dim, output_dim)

    def forward(self, x, input_hidden, input_cell, enc_output):
        # x=[batch_size,1,dec_input_dim]
        ## enc_output=[batch,src_len,enc_hid_dim]
        #a=[batch,src_len]
        a = self.attention(input_hidden, enc_output)
        #a=[batch,1,src_len]
        a = a.unsqueeze(1)
        #c=aH, c=[batch,1,enc_hid_dim]
        c = torch.bmm(a, enc_output)
        #x = x.unsqueeze(1)
        ##rnn_input=[batch,1,enc_hid_dim+dec_input_dim]
        rnn_input = torch.cat((x, c), dim=2)
        rnn_output, (rnn_hidden, rnn_cell) = self.rnn(rnn_input, (input_hidden, input_cell))
        ##rnn_output=st=[batch,1,dec_hid_dim]
        ## y_pred = fc(x, c, st)
        ##[batch,1,dec_hid_dim+dec_input_dim+enc_hid_dim]=>[batch,1,output_dim]
        y_pred = self.fc_out(torch.cat((x, c, rnn_output), dim=2))
        return y_pred, rnn_hidden, rnn_cell
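A one-step shape trace for the decoder, again with arbitrary sizes and reusing the Attention class from above:

import torch

enc_hid_dim = dec_hid_dim = 16
dec_input_dim = output_dim = 1
num_layers, batch_size, src_len = 2, 4, 10

dec = Decoder(Attention(enc_hid_dim, dec_hid_dim), dec_input_dim, output_dim,
              enc_hid_dim, dec_hid_dim, num_layers)

x = torch.randn(batch_size, 1, dec_input_dim)             # previous prediction \hat y_{t-1}
h = torch.randn(num_layers, batch_size, dec_hid_dim)      # decoder hidden state
c = torch.randn(num_layers, batch_size, dec_hid_dim)      # decoder cell state
enc_out = torch.randn(batch_size, src_len, enc_hid_dim)   # encoder output H

y_pred, h_next, c_next = dec(x, h, c, enc_out)
print(y_pred.shape)   # torch.Size([4, 1, 1]) -> [batch_size, 1, output_dim]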

Seq2Seq

All that remains is to assemble the model. It takes two inputs: the input sequence x and an initial input for the decoder, for which the previous step \hat y_0 is used here. The input first goes through the encoder, which returns the sequence features and the final hidden states; the latter initialize the decoder's hidden state (which forces the encoder and decoder to have the same num_layers and hidden_size). The decoder then predicts in a loop and the results are concatenated into result, a tensor of shape [batch_size, target_len, output_dim].

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder, target_len):
        super(Seq2Seq, self).__init__()
        self.encoder = encoder
        self.decoder = decoder
        ## how many steps to predict
        self.target_len = target_len

    def forward(self, x, prev_y):
        ##x=[batch_size,seq_len,input_dim]
        ##prev_y=[batch_size,1,input_dim]
        encoder_output, encoder_hidden, encoder_cell = self.encoder(x)
        result = torch.FloatTensor([]).cuda()
        #initial decoder input;
        #the decoder hidden state has the same [num_layers,batch,hidden_size] shape
        decoder_input = prev_y
        decoder_hidden_pre, decoder_cell_pre = encoder_hidden, encoder_cell
        ## make predictions
        for i in range(self.target_len):
            ## use the previous decoder hidden/cell states and the previous input to predict
            decoder_output, decoder_hidden, decoder_cell = self.decoder(decoder_input, decoder_hidden_pre,
                                                                        decoder_cell_pre, encoder_output)
            ## update
            decoder_input = decoder_output
            decoder_hidden_pre = decoder_hidden
            decoder_cell_pre = decoder_cell
            #decoder_output=[batch,1,output_dim]
            result = torch.cat((result, decoder_output), dim=1)
        #result=[batch,target_len,output_dim]
        return result

Initializing the model:

Because each prediction is fed back into the network at the next step, dec_input_dim == output_dim.

In addition, the decoder's LSTM must mirror the encoder's (same number of layers and hidden size) so that the encoder's hidden states can initialize the decoder's.

def initialize_model(learning_rate, enc_input_dim, dec_input_dim, output_dim, enc_hid_dim, dec_hid_dim,
                     enc_num_layers, dec_num_layers, target_len):
    torch.manual_seed(42)
    assert dec_input_dim == output_dim
    assert enc_num_layers == dec_num_layers
    assert enc_hid_dim == dec_hid_dim
    encoder = Encoder(enc_input_dim, enc_hid_dim, enc_num_layers)
    attention = Attention(enc_hid_dim, dec_hid_dim)
    decoder = Decoder(attention, dec_input_dim, output_dim, enc_hid_dim, dec_hid_dim, dec_num_layers)
    model = Seq2Seq(encoder, decoder, target_len)
    model.to(device)
    criterion = torch.nn.MSELoss().to(device)  # mean-squared error for regression
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-5)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=100, factor=0.5,
                                                           min_lr=1e-7, eps=1e-08)
    return model, criterion, optimizer, scheduler
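Putting everything together, a minimal smoke test on random data (all sizes are illustrative, and a CUDA device is assumed because Seq2Seq.forward calls .cuda() directly):

import torch

enc_input_dim, dec_input_dim, output_dim = 8, 1, 1
hid_dim, num_layers, target_len = 16, 2, 12

model, criterion, optimizer, scheduler = initialize_model(
    learning_rate=1e-3,
    enc_input_dim=enc_input_dim, dec_input_dim=dec_input_dim, output_dim=output_dim,
    enc_hid_dim=hid_dim, dec_hid_dim=hid_dim,
    enc_num_layers=num_layers, dec_num_layers=num_layers, target_len=target_len)

x = torch.randn(4, 30, enc_input_dim).to(device)       # past window: [batch, seq_len, enc_input_dim]
prev_y = torch.randn(4, 1, dec_input_dim).to(device)   # \hat y_0:    [batch, 1, dec_input_dim]
out = model(x, prev_y)
print(out.shape)   # torch.Size([4, 12, 1]) -> [batch_size, target_len, output_dim]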

References

[1] https://www.kaggle.com/omershect/learning-pytorch-lstm-deep-learning-with-m5-data
[2] https://storage.googleapis.com/kaggle-forum-message-attachments/695157/14589/Day_7.pdf
[3] https://www.zhihu.com/question/303070254
[4] https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html
[5] https://www.kaggle.com/omershect/learning-pytorch-seq2seq-with-m5-data-set
[6] https://wmathor.com/index.php/archives/1451/#comment-624
