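[Deep Learning Frameworks] A Summary of Common PyTorch Code Snippets

Author: 智彥博. Published: 2021-10-08

The best resource for PyTorch is the official documentation. This article collects commonly used PyTorch code snippets, with some fixes on top of reference [1], for quick lookup while working. Reposted from "[深度學習框架] PyTorch 常用程式碼段總結", 極市社群 (cvmart.net).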

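1. Basic configuration

Imports and version queries

import torch
import torch.nn as nn
import torchvision

print(torch.__version__)
print(torch.version.cuda)
print(torch.backends.cudnn.version())
print(torch.cuda.get_device_name(0))

Reproducibility

Full reproducibility cannot be guaranteed across different hardware (CPU, GPU), even with identical random seeds. On the same device, however, results should be reproducible. To achieve this, fix the torch random seed at the start of the program, and fix the numpy random seed as well.

import numpy as np

np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False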
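The seeding calls above can be gathered into one helper. The sketch below is a convenience under stated assumptions, not part of the original snippets: the name `seed_everything` is hypothetical, and it additionally seeds Python's built-in `random` module, which the text does not mention. NumPy and PyTorch are seeded only if they can be imported.

```python
import random


def seed_everything(seed=0):
    """Seed every random number generator that is available (hypothetical helper)."""
    random.seed(seed)  # Python's built-in generator
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


# Re-seeding with the same value reproduces the same random sequence.
seed_everything(0)
first = [random.random() for _ in range(3)]
seed_everything(0)
second = [random.random() for _ in range(3)]
```

Calling the helper once at the top of a training script keeps all seeding in one place.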

GPU settings

If only one GPU is needed:

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

To specify multiple GPUs, for example GPUs 0 and 1:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'

The GPUs can also be set on the command line when launching the program:

CUDA_VISIBLE_DEVICES=0,1 python train.py

Clearing GPU memory

torch.cuda.empty_cache()

The GPU can also be reset from the command line:

nvidia-smi --gpu-reset -i [gpu_id]

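2. Tensor processing

Tensor data types

PyTorch has 9 CPU tensor types and 9 GPU tensor types.

Basic tensor information

tensor = torch.randn(3, 4, 5)
print(tensor.type())  # data type
print(tensor.size())  # shape of the tensor, a torch.Size (a tuple subclass)
print(tensor.dim())   # number of dimensions

Named tensors

Naming tensor dimensions is very useful: dimensions can be indexed or manipulated by name, which greatly improves readability and ease of use and helps prevent mistakes.

# Before PyTorch 1.3, the dimension order had to be tracked in comments
# Tensor[N, C, H, W]
images = torch.randn(32, 3, 56, 56)
images.sum(dim=1)
images.select(dim=1, index=0)

# From PyTorch 1.3 onward
NCHW = ['N', 'C', 'H', 'W']
images = torch.randn(32, 3, 56, 56, names=NCHW)
images.sum('C')
images.select('C', index=0)

# Names can also be set like this
tensor = torch.rand(3, 4, 1, 2, names=('C', 'N', 'H', 'W'))
# align_to conveniently reorders the dimensions
tensor = tensor.align_to('N', 'C', 'H', 'W')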

Data type conversion

# Set the default type; in PyTorch, FloatTensor is much faster than DoubleTensor
torch.set_default_tensor_type(torch.FloatTensor)

# Type conversion
tensor = tensor.cuda()
tensor = tensor.cpu()
tensor = tensor.float()
tensor = tensor.long()

Converting between torch.Tensor and np.ndarray

All CPU tensors except CharTensor can be converted to NumPy format and back.

ndarray = tensor.cpu().numpy()
tensor = torch.from_numpy(ndarray).float()
tensor = torch.from_numpy(ndarray.copy()).float()  # If ndarray has negative stride.

Converting between torch.Tensor and PIL.Image

# PyTorch tensors use [N, C, H, W] order by default with values in [0, 1],
# so the axes must be permuted and the values rescaled
# torch.Tensor -> PIL.Image
image = PIL.Image.fromarray(torch.clamp(tensor * 255, min=0, max=255).byte().permute(1, 2, 0).cpu().numpy())
image = torchvision.transforms.functional.to_pil_image(tensor)  # Equivalent way

# PIL.Image -> torch.Tensor
path = r'./figure.jpg'
tensor = torch.from_numpy(np.asarray(PIL.Image.open(path))).permute(2, 0, 1).float() / 255
tensor = torchvision.transforms.functional.to_tensor(PIL.Image.open(path))  # Equivalent way

Converting between np.ndarray and PIL.Image

image = PIL.Image.fromarray(ndarray.astype(np.uint8))
ndarray = np.asarray(PIL.Image.open(path))

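Extracting the value from a single-element tensor

value = torch.rand(1).item()

Reshaping tensors

# Feeding convolutional features into a fully connected layer usually requires reshaping.
# Unlike torch.view, torch.reshape automatically handles non-contiguous input tensors.
tensor = torch.rand(2, 3, 4)
shape = (6, 4)
tensor = torch.reshape(tensor, shape)

Shuffling

tensor = tensor[torch.randperm(tensor.size(0))]  # shuffle along the first dimension

Horizontal flip

# PyTorch does not support negative-step slicing such as tensor[::-1];
# a horizontal flip can be done with tensor indexing
# (torch.flip(tensor, dims=[3]) gives the same result).
# Assume the tensor has shape [N, D, H, W].
tensor = tensor[:, :, :, torch.arange(tensor.size(3) - 1, -1, -1).long()]

Copying tensors

# Operation               | New/Shared memory | Still in computation graph |
tensor.clone()          # | New               | Yes                        |
tensor.detach()         # | Shared            | No                         |
tensor.detach().clone() # | New               | No                         |

Concatenating tensors

'''
Note the difference between torch.cat and torch.stack: torch.cat concatenates
along an existing dimension, while torch.stack adds a new dimension. For example,
given three 10x5 tensors, torch.cat produces a 30x5 tensor,
whereas torch.stack produces a 3x10x5 tensor.
'''
tensor = torch.cat(list_of_tensors, dim=0)
tensor = torch.stack(list_of_tensors, dim=0)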

Converting integer labels to one-hot encoding

# PyTorch labels start from 0 by default
tensor = torch.tensor([0, 2, 1, 3])
N = tensor.size(0)
num_classes = 4
one_hot = torch.zeros(N, num_classes).long()
one_hot.scatter_(dim=1, index=torch.unsqueeze(tensor, dim=1), src=torch.ones(N, num_classes).long())

Getting non-zero elements

torch.nonzero(tensor)               # index of non-zero elements
torch.nonzero(tensor == 0)          # index of zero elements
torch.nonzero(tensor).size(0)       # number of non-zero elements
torch.nonzero(tensor == 0).size(0)  # number of zero elements

Checking whether two tensors are equal

torch.allclose(tensor1, tensor2)  # float tensor
torch.equal(tensor1, tensor2)     # int tensor

Expanding a tensor

# Expand tensor of shape 64*512 to shape 64*512*7*7.
tensor = torch.rand(64, 512)
torch.reshape(tensor, (64, 512, 1, 1)).expand(64, 512, 7, 7)

Matrix multiplication

# Matrix multiplication: (m*n) * (n*p) -> (m*p).
result = torch.mm(tensor1, tensor2)

# Batch matrix multiplication: (b*m*n) * (b*n*p) -> (b*m*p).
result = torch.bmm(tensor1, tensor2)

# Element-wise multiplication.
result = tensor1 * tensor2

Pairwise Euclidean distances between two sets of data

Using broadcasting:

dist = torch.sqrt(torch.sum((X1[:, None, :] - X2) ** 2, dim=2))
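To see what the broadcast expression computes, here is a plain-Python sketch of the same quantity: entry (i, j) is the Euclidean distance between row i of X1 and row j of X2. The function name `pairwise_dist` and the toy inputs are illustrative assumptions; with tensors, the one-line broadcast version would normally be preferred.

```python
import math


def pairwise_dist(X1, X2):
    """Return a len(X1) x len(X2) nested list of Euclidean distances."""
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))
             for x2 in X2]
            for x1 in X1]


X1 = [[0.0, 0.0], [3.0, 4.0]]               # 2 points in 2-D
X2 = [[0.0, 0.0], [6.0, 8.0], [3.0, 0.0]]   # 3 points in 2-D
dist = pairwise_dist(X1, X2)                # 2 x 3 result
print(dist)  # [[0.0, 10.0, 3.0], [5.0, 5.0, 4.0]]
```

The broadcast form builds an intermediate of shape (len(X1), len(X2), D), so memory use grows with the product of the two set sizes.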

3. Model definition and operations

An example of a simple two-layer convolutional network

# convolutional neural network (2 convolutional layers)
class ConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.fc = nn.Linear(7 * 7 * 32, num_classes)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out

model = ConvNet(num_classes).to(device)

This website can help with computing and visualizing convolutional layers.
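The `7*7*32` input size of the fully connected layer follows from the usual output-size formula for convolution and pooling, out = floor((in + 2*padding - kernel) / stride) + 1. The sketch below walks through it, assuming a 28x28 single-channel input (for example MNIST-sized images), which the original does not state explicitly; `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28                                             # assumed input height/width
size = conv_out(size, kernel=5, stride=1, padding=2)  # layer1 conv -> 28
size = conv_out(size, kernel=2, stride=2)             # layer1 pool -> 14
size = conv_out(size, kernel=5, stride=1, padding=2)  # layer2 conv -> 14
size = conv_out(size, kernel=2, stride=2)             # layer2 pool -> 7
fc_in_features = size * size * 32                     # 7 * 7 * 32 = 1568
```

With a different input resolution, the first argument of the final linear layer would have to change accordingly.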

Bilinear pooling

X = torch.reshape(X, (N, D, H * W))                       # Assume X has shape N*D*H*W
X = torch.bmm(X, torch.transpose(X, 1, 2)) / (H * W)      # Bilinear pooling
assert X.size() == (N, D, D)
X = torch.reshape(X, (N, D * D))
X = torch.sign(X) * torch.sqrt(torch.abs(X) + 1e-5)       # Signed-sqrt normalization
X = torch.nn.functional.normalize(X)                      # L2 normalization

Multi-GPU synchronized BN (batch normalization)

When code runs on multiple GPUs with torch.nn.DataParallel, PyTorch's BN layers by default compute the mean and standard deviation independently on each card's data. Synchronized BN computes the BN mean and standard deviation from the data on all cards together, which alleviates inaccurate estimates when the batch size is small. It is an effective trick for improving performance in tasks such as object detection.

sync_bn = torch.nn.SyncBatchNorm(num_features, eps=1e-05, momentum=0.1, affine=True,
                                 track_running_stats=True)

Converting all BN layers of an existing network to synchronized BN layers

def convertBNtoSyncBN(module, process_group=None):
    '''Recursively replace all BN layers with SyncBN layers.

    Args:
        module [torch.nn.Module]. Network
    '''
    if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
        sync_bn = torch.nn.SyncBatchNorm(module.num_features, module.eps, module.momentum,
                                         module.affine, module.track_running_stats, process_group)
        sync_bn.running_mean = module.running_mean
        sync_bn.running_var = module.running_var
        if module.affine:
            # Copy values through .data: assigning a plain tensor to a parameter attribute raises an error.
            sync_bn.weight.data = module.weight.data.clone().detach()
            sync_bn.bias.data = module.bias.data.clone().detach()
        return sync_bn
    else:
        for name, child_module in module.named_children():
            setattr(module, name, convertBNtoSyncBN(child_module, process_group=process_group))
        return module

BN-style moving average

To implement something like BN's moving average, update the running value with an in-place operation inside the forward function.

class BN(torch.nn.Module):
    def __init__(self):
        ...
        self.register_buffer('running_mean', torch.zeros(num_features))

    def forward(self, X):
        ...
        self.running_mean += momentum * (current - self.running_mean)

Counting the total number of model parameters

num_parameters = sum(torch.numel(parameter) for parameter in model.parameters())
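As a sanity check on the one-liner above, the count can be worked out by hand. The sketch below does so for the two-layer ConvNet defined earlier with num_classes=10; the helper names are hypothetical, and the formulas assume every layer has bias terms and affine BatchNorm (PyTorch's defaults). BatchNorm running statistics are buffers rather than parameters, so they are not counted.

```python
def conv2d_params(c_in, c_out, k):
    """Weights (c_out * c_in * k * k) plus one bias per output channel."""
    return c_out * c_in * k * k + c_out


def batchnorm_params(c):
    """One scale and one shift per channel."""
    return 2 * c


def linear_params(n_in, n_out):
    """Weight matrix plus bias vector."""
    return n_in * n_out + n_out


num_parameters = (
    conv2d_params(1, 16, 5) + batchnorm_params(16)      # layer1: 416 + 32
    + conv2d_params(16, 32, 5) + batchnorm_params(32)   # layer2: 12832 + 64
    + linear_params(7 * 7 * 32, 10)                     # fc: 15690
)
print(num_parameters)  # 29034
```

More than half of the parameters sit in the single fully connected layer, which is typical for small convolutional networks.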

Inspecting the parameters of a network

All current trainable parameters (including those inherited from parent classes) can be inspected through model.state_dict() or model.named_parameters().

params = list(model.named_parameters())
(name, param) = params[28]
print(name)
print(param.grad)
print('-------------------------------------------------')
(name2, param2) = params[29]
print(name2)
print(param2.grad)
print('-------------------------------------------------')
(name1, param1) = params[30]
print(name1)
print(param1.grad)

Model visualization (using pytorchviz)

szagoruyko/pytorchviz

Keras-style model.summary() output of model information (using pytorch-summary)

sksq96/pytorch-summary

Model weight initialization

Note the difference between model.modules() and model.children(): model.modules() recursively iterates over all sub-layers of the model, whereas model.children() only iterates over the layers one level down.

# Common practice for initialization.
for layer in model.modules():
    if isinstance(layer, torch.nn.Conv2d):
        torch.nn.init.kaiming_normal_(layer.weight, mode='fan_out',
                                      nonlinearity='relu')
        if layer.bias is not None:
            torch.nn.init.constant_(layer.bias, val=0.0)
    elif isinstance(layer, torch.nn.BatchNorm2d):
        torch.nn.init.constant_(layer.weight, val=1.0)
        torch.nn.init.constant_(layer.bias, val=0.0)
    elif isinstance(layer, torch.nn.Linear):
        torch.nn.init.xavier_normal_(layer.weight)
        if layer.bias is not None:
            torch.nn.init.constant_(layer.bias, val=0.0)

# Initialization with given tensor.
layer.weight = torch.nn.Parameter(tensor)

Extracting a layer from a model

modules() returns an iterator over all modules in the model and can reach the innermost ones, such as self.layer1.conv1. Its counterparts named_children() and named_modules() return not only an iterator over the modules but also the names of the layers.

# Take the first two layers of the model
new_model = nn.Sequential(*list(model.children())[:2])

# To extract all convolutional layers of the model, do something like this:
conv_model = nn.Sequential()
for layer in model.named_modules():
    if isinstance(layer[1], nn.Conv2d):
        # Module names may not contain '.', so nested names are flattened.
        conv_model.add_module(layer[0].replace('.', '_'), layer[1])

Using a pretrained model for some of the layers

Note that if the saved model was a torch.nn.DataParallel, the current model needs to be one as well.

model.load_state_dict(torch.load('model.pth'), strict=False)

Loading a model saved on GPU onto the CPU

model.load_state_dict(torch.load('model.pth', map_location='cpu'))

4. Data processing

Computing the mean and standard deviation of a dataset

import os
import cv2
import numpy as np
from torch.utils.data import Dataset
from PIL import Image


def compute_mean_and_std(dataset):
    # Takes a PyTorch dataset and returns the per-channel mean and standard deviation.
    # The b/g/r names follow the original code; for RGB PIL images channel 0 is
    # actually R, so the returned tuples are simply in channel order 0, 1, 2.
    mean_r = 0
    mean_g = 0
    mean_b = 0

    for img, _ in dataset:
        img = np.asarray(img)  # change PIL Image to numpy array
        mean_b += np.mean(img[:, :, 0])
        mean_g += np.mean(img[:, :, 1])
        mean_r += np.mean(img[:, :, 2])

    mean_b /= len(dataset)
    mean_g /= len(dataset)
    mean_r /= len(dataset)

    diff_r = 0
    diff_g = 0
    diff_b = 0
    N = 0

    for img, _ in dataset:
        img = np.asarray(img)
        diff_b += np.sum(np.power(img[:, :, 0] - mean_b, 2))
        diff_g += np.sum(np.power(img[:, :, 1] - mean_g, 2))
        diff_r += np.sum(np.power(img[:, :, 2] - mean_r, 2))
        N += np.prod(img[:, :, 0].shape)

    std_b = np.sqrt(diff_b / N)
    std_g = np.sqrt(diff_g / N)
    std_r = np.sqrt(diff_r / N)

    mean = (mean_b.item() / 255.0, mean_g.item() / 255.0, mean_r.item() / 255.0)
    std = (std_b.item() / 255.0, std_g.item() / 255.0, std_r.item() / 255.0)
    return mean, std

Getting basic information about a video

import cv2

video = cv2.VideoCapture(mp4_path)
height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
num_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
fps = int(video.get(cv2.CAP_PROP_FPS))
video.release()

TSN: sampling one frame per segment

K = self._num_segments
if is_train:
    if num_frames > K:
        # Random index for each segment.
        frame_indices = torch.randint(
            high=num_frames // K, size=(K,), dtype=torch.long)
        frame_indices += num_frames // K * torch.arange(K)
    else:
        frame_indices = torch.randint(
            high=num_frames, size=(K - num_frames,), dtype=torch.long)
        frame_indices = torch.sort(torch.cat((
            torch.arange(num_frames), frame_indices)))[0]
else:
    if num_frames > K:
        # Middle index for each segment (integer division keeps the indices integral).
        frame_indices = num_frames // K // 2
        frame_indices += num_frames // K * torch.arange(K)
    else:
        frame_indices = torch.sort(torch.cat((
            torch.arange(num_frames), torch.arange(K - num_frames))))[0]
assert frame_indices.size() == (K,)
return [frame_indices[i] for i in range(K)]
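The sampling rule for the num_frames > K case can be illustrated without tensors: split the video into K equal segments of length num_frames // K, then pick a random offset inside each segment for training, or the middle offset for testing. The function name and the use of Python's `random` module are assumptions made for illustration.

```python
import random


def sample_segment_indices(num_frames, K, is_train, rng=random):
    """Pick one frame index per segment; assumes num_frames > K."""
    seg_len = num_frames // K
    if is_train:
        offsets = [rng.randrange(seg_len) for _ in range(K)]  # random offset per segment
    else:
        offsets = [seg_len // 2] * K                          # middle of each segment
    return [seg_len * i + off for i, off in enumerate(offsets)]


test_indices = sample_segment_indices(num_frames=30, K=3, is_train=False)
train_indices = sample_segment_indices(num_frames=30, K=3, is_train=True)
print(test_indices)  # [5, 15, 25]
```

In test mode the result is deterministic, which keeps evaluation repeatable; in training mode each call draws a different frame from every segment.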

Common preprocessing for training and validation data

Here, ToTensor converts a PIL.Image, or an np.ndarray of shape H×W×D with values in [0, 255], into a torch.Tensor of shape D×H×W with values in [0.0, 1.0].

train_transform = torchvision.transforms.Compose([
    torchvision.transforms.RandomResizedCrop(size=224,
                                             scale=(0.08, 1.0)),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=(0.485, 0.456, 0.406),
                                     std=(0.229, 0.224, 0.225)),
])

val_transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=(0.485, 0.456, 0.406),
                                     std=(0.229, 0.224, 0.225)),
])

5. Model training and testing

Training code for a classification model

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i + 1) % 100 == 0:
            print('Epoch: [{}/{}], Step: [{}/{}], Loss: {}'
                  .format(epoch + 1, num_epochs, i + 1, total_step, loss.item()))

Test code for a classification model

# Test the model
model.eval()  # eval mode (batch norm uses moving mean/variance
              # instead of mini-batch mean/variance)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Test accuracy of the model on the 10000 test images: {} %'
          .format(100 * correct / total))

Custom loss

Write your own loss by subclassing torch.nn.Module.

class MyLoss(torch.nn.Module):
    def __init__(self):
        super(MyLoss, self).__init__()

    def forward(self, x, y):
        loss = torch.mean((x - y) ** 2)
        return loss

Label smoothing

Write a file named label_smoothing.py, import it in the training code, and use LSR in place of the cross-entropy loss. The contents of label_smoothing.py are as follows:

import torch
import torch.nn as nn


class LSR(nn.Module):

    def __init__(self, e=0.1, reduction='mean'):
        super().__init__()
        self.log_softmax = nn.LogSoftmax(dim=1)
        self.e = e
        self.reduction = reduction

    def _one_hot(self, labels, classes, value=1):
        """
        Convert labels to one hot vectors

        Args:
            labels: torch tensor in format [label1, label2, label3, ...]
            classes: int, number of classes
            value: label value in one hot vector, default to 1

        Returns:
            return one hot format labels in shape [batchsize, classes]
        """
        one_hot = torch.zeros(labels.size(0), classes)

        # labels and value_added size must match
        labels = labels.view(labels.size(0), -1)
        value_added = torch.Tensor(labels.size(0), 1).fill_(value)

        value_added = value_added.to(labels.device)
        one_hot = one_hot.to(labels.device)

        one_hot.scatter_add_(1, labels, value_added)
        return one_hot

    def _smooth_label(self, target, length, smooth_factor):
        """Convert targets to one-hot format, and smooth them.

        Args:
            target: target in form with [label1, label2, label_batchsize]
            length: length of one-hot format (number of classes)
            smooth_factor: smooth factor for label smooth

        Returns:
            smoothed labels in one hot format
        """
        one_hot = self._one_hot(target, length, value=1 - smooth_factor)
        one_hot += smooth_factor / (length - 1)
        return one_hot.to(target.device)

    def forward(self, x, target):
        if x.size(0) != target.size(0):
            raise ValueError('Expected input batchsize ({}) to match target batch_size({})'
                             .format(x.size(0), target.size(0)))

        if x.dim() < 2:
            raise ValueError('Expected input tensor to have at least 2 dimensions (got {})'
                             .format(x.dim()))

        if x.dim() != 2:
            raise ValueError('Only 2 dimension tensor are implemented, (got {})'
                             .format(x.size()))

        smoothed_target = self._smooth_label(target, x.size(1), self.e)
        x = self.log_softmax(x)
        loss = torch.sum(- x * smoothed_target, dim=1)

        if self.reduction == 'none':
            return loss
        elif self.reduction == 'sum':
            return torch.sum(loss)
        elif self.reduction == 'mean':
            return torch.mean(loss)
        else:
            raise ValueError('unrecognized option, expect reduction to be one of none, mean, sum')

Alternatively, do label smoothing directly in the training file:

for images, labels in train_loader:
    images, labels = images.cuda(), labels.cuda()
    N = labels.size(0)
    # C is the number of classes.
    smoothed_labels = torch.full(size=(N, C), fill_value=0.1 / (C - 1)).cuda()
    smoothed_labels.scatter_(dim=1, index=torch.unsqueeze(labels, dim=1), value=0.9)

    score = model(images)
    log_prob = torch.nn.functional.log_softmax(score, dim=1)
    loss = -torch.sum(log_prob * smoothed_labels) / N
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
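For intuition, the smoothed target used in the loop above puts 0.9 on the true class and spreads the remaining 0.1 evenly over the other C - 1 classes, so each row still sums to 1. A plain-Python sketch of one such row and the resulting per-sample loss term (the function names are hypothetical, and the scores are made up):

```python
import math


def smoothed_row(label, C, eps=0.1):
    """Target distribution: 1 - eps on the true class, eps / (C - 1) elsewhere."""
    row = [eps / (C - 1)] * C
    row[label] = 1.0 - eps
    return row


def smoothed_cross_entropy(scores, label, eps=0.1):
    """-sum(q * log_softmax(scores)) for a single sample."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    log_prob = [s - log_z for s in scores]
    q = smoothed_row(label, len(scores), eps)
    return -sum(qi * lp for qi, lp in zip(q, log_prob))


target = smoothed_row(label=2, C=5)
loss = smoothed_cross_entropy([1.0, 2.0, 3.0, 0.5, -1.0], label=2)
```

Because every class keeps a non-zero target probability, the loss never reaches zero, which discourages the model from producing extremely confident scores.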

Mixup training

beta_distribution = torch.distributions.beta.Beta(alpha, alpha)
for images, labels in train_loader:
    images, labels = images.cuda(), labels.cuda()

    # Mixup images and labels.
    lambda_ = beta_distribution.sample([]).item()
    index = torch.randperm(images.size(0)).cuda()
    mixed_images = lambda_ * images + (1 - lambda_) * images[index, :]
    label_a, label_b = labels, labels[index]

    # Mixup loss.
    scores = model(mixed_images)
    loss = (lambda_ * loss_function(scores, label_a)
            + (1 - lambda_) * loss_function(scores, label_b))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

L1 regularization

l1_regularization = torch.nn.L1Loss(reduction='sum')
loss = ...  # Standard cross-entropy loss
for param in model.parameters():
    loss += torch.sum(torch.abs(param))
loss.backward()

No weight decay on bias terms

In PyTorch, weight decay is equivalent to L2 regularization.

bias_list = (param for name, param in model.named_parameters() if name[-4:] == 'bias')
others_list = (param for name, param in model.named_parameters() if name[-4:] != 'bias')
# Parameter groups must use the key 'params'.
parameters = [{'params': bias_list, 'weight_decay': 0},
              {'params': others_list}]
optimizer = torch.optim.SGD(parameters, lr=1e-2, momentum=0.9, weight_decay=1e-4)

Gradient clipping

torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=20)
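Norm-based clipping rescales all gradients by a common factor when their combined L2 norm exceeds max_norm, and leaves them unchanged otherwise. The following plain-Python sketch mirrors that arithmetic on nested lists; it illustrates the idea rather than PyTorch's implementation, and `clip_by_global_norm` is a hypothetical name.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale a list of gradient vectors so their global L2 norm is <= max_norm."""
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    scale = min(1.0, max_norm / (total_norm + 1e-6))  # never scale gradients up
    return [[g * scale for g in vec] for vec in grads], total_norm


grads = [[30.0, 0.0], [0.0, 40.0]]  # global norm = 50
clipped, norm_before = clip_by_global_norm(grads, max_norm=20)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
```

Since all gradients share one scale factor, the update direction is preserved and only its length shrinks.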

Getting the current learning rate

# If there is one global learning rate (which is the common case).
lr = next(iter(optimizer.param_groups))['lr']

# If there are multiple learning rates for different layers.
all_lr = []
for param_group in optimizer.param_groups:
    all_lr.append(param_group['lr'])

Alternatively, inside the training code for a batch, the current learning rate is optimizer.param_groups[0]['lr'].

Learning rate decay

# Reduce the learning rate when validation accuracy plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', patience=5, verbose=True)
for t in range(0, 80):
    train(...)
    val(...)
    scheduler.step(val_acc)

# Cosine annealing learning rate.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=80)
# Reduce the learning rate by a factor of 10 at the given epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50, 70], gamma=0.1)
for t in range(0, 80):
    train(...)
    val(...)
    scheduler.step()  # since PyTorch 1.1, call after the epoch's optimizer steps

# Learning rate warmup over 10 epochs.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda t: t / 10)
for t in range(0, 10):
    train(...)
    val(...)
    scheduler.step()  # since PyTorch 1.1, call after the epoch's optimizer steps
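To make the schedules concrete, the closed forms behind two of them can be tabulated by hand. The sketch below assumes a base learning rate of 0.1 (not given in the text) and computes, per epoch, the rate that a MultiStepLR-style rule with milestones [50, 70] and gamma 0.1 yields, and the linear warmup factor t / 10 used with LambdaLR. The function names are hypothetical and nothing here calls into PyTorch.

```python
import bisect


def multistep_lr(base_lr, epoch, milestones, gamma):
    """base_lr * gamma ** (number of milestones already reached)."""
    return base_lr * gamma ** bisect.bisect_right(milestones, epoch)


def warmup_lr(base_lr, epoch, warmup_epochs=10):
    """Linear warmup: the rate grows as epoch / warmup_epochs."""
    return base_lr * epoch / warmup_epochs


base_lr = 0.1  # assumed value for illustration
for epoch in (0, 49, 50, 69, 70):
    print(epoch, multistep_lr(base_lr, epoch, [50, 70], 0.1))
print([warmup_lr(base_lr, t) for t in range(3)])
```

Note that the warmup factor is 0 at epoch 0, so the first epoch runs with a zero learning rate; `lambda t: (t + 1) / 10` is a common variant if that is not intended.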

Chained scheduler updates

Starting from version 1.4, torch.optim.lr_scheduler supports chaining: users can define two schedulers and apply them in turn during training.

import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR, StepLR

model = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler1 = ExponentialLR(optimizer, gamma=0.9)
scheduler2 = StepLR(optimizer, step_size=3, gamma=0.1)
for epoch in range(4):
    print(epoch, scheduler2.get_last_lr()[0])
    optimizer.step()
    scheduler1.step()
    scheduler2.step()
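When both schedulers are stepped every epoch, their multiplicative factors compound. Assuming that behavior, the rate printed at each epoch should equal 0.1 * 0.9**epoch * 0.1**(epoch // 3). The sketch below tabulates this closed form in plain Python; it illustrates the arithmetic and does not call into PyTorch, and `chained_lr` is a hypothetical name.

```python
def chained_lr(epoch, base_lr=0.1, exp_gamma=0.9, step_size=3, step_gamma=0.1):
    """Learning rate after `epoch` combined steps of ExponentialLR and StepLR."""
    return base_lr * exp_gamma ** epoch * step_gamma ** (epoch // step_size)


schedule = [chained_lr(epoch) for epoch in range(4)]
# approximately [0.1, 0.09, 0.081, 0.00729]
```

The sharp drop at epoch 3 comes from the StepLR factor kicking in on top of the steady exponential decay.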

Visualizing model training

PyTorch can use TensorBoard to visualize the training process.

Install and run TensorBoard:

pip install tensorboard
tensorboard --logdir=runs

Use the SummaryWriter class to collect and visualize the relevant data. For easier viewing, use different folders, such as 'Loss/train' and 'Loss/test'.

from torch.utils.tensorboard import SummaryWriter
import numpy as np

writer = SummaryWriter()

for n_iter in range(100):
    writer.add_scalar('Loss/train', np.random.random(), n_iter)
    writer.add_scalar('Loss/test', np.random.random(), n_iter)
    writer.add_scalar('Accuracy/train', np.random.random(), n_iter)
    writer.add_scalar('Accuracy/test', np.random.random(), n_iter)

Saving and loading checkpoints

Note that to be able to resume training, the state of both the model and the optimizer must be saved, along with the current epoch number.

import os
import shutil

start_epoch = 0
best_acc = 0
# Load checkpoint.
if resume:  # resume is an argument: 0 for the first run, 1 when resuming after an interruption
    model_path = os.path.join('model', 'best_checkpoint.pth.tar')
    assert os.path.isfile(model_path)
    checkpoint = torch.load(model_path)
    best_acc = checkpoint['best_acc']
    start_epoch = checkpoint['epoch']
    model.load_state_dict(checkpoint['model'])
    optimizer.load_state_dict(checkpoint['optimizer'])
    print('Load checkpoint at epoch {}.'.format(start_epoch))
    print('Best accuracy so far {}.'.format(best_acc))

# Train the model
for epoch in range(start_epoch, num_epochs):
    ...

    # Test the model
    ...

    # save checkpoint
    is_best = current_acc > best_acc
    best_acc = max(current_acc, best_acc)
    checkpoint = {
        'best_acc': best_acc,
        'epoch': epoch + 1,
        'model': model.state_dict(),
        'optimizer': optimizer.state_dict(),
    }
    model_path = os.path.join('model', 'checkpoint.pth.tar')
    best_model_path = os.path.join('model', 'best_checkpoint.pth.tar')
    torch.save(checkpoint, model_path)
    if is_best:
        shutil.copy(model_path, best_model_path)

Extracting convolutional features from one layer of an ImageNet-pretrained model

import collections

# VGG-16 relu5-3 feature.
model = torchvision.models.vgg16(pretrained=True).features[:-1]
# VGG-16 pool5 feature.
model = torchvision.models.vgg16(pretrained=True).features
# VGG-16 fc7 feature.
model = torchvision.models.vgg16(pretrained=True)
model.classifier = torch.nn.Sequential(*list(model.classifier.children())[:-3])
# ResNet GAP feature.
model = torchvision.models.resnet18(pretrained=True)
model = torch.nn.Sequential(collections.OrderedDict(
    list(model.named_children())[:-1]))

with torch.no_grad():
    model.eval()
    conv_representation = model(image)

Extracting convolutional features from multiple layers of an ImageNet-pretrained model

class FeatureExtractor(torch.nn.Module):
    """Helper class to extract several convolution features from the given
    pre-trained model.

    Attributes:
        _model, torch.nn.Module.
        _layers_to_extract, list or set

    Example:
        >>> model = torchvision.models.resnet152(pretrained=True)
        >>> model = torch.nn.Sequential(collections.OrderedDict(
                list(model.named_children())[:-1]))
        >>> conv_representation = FeatureExtractor(
                pretrained_model=model,
                layers_to_extract={'layer1', 'layer2', 'layer3', 'layer4'})(image)
    """
    def __init__(self, pretrained_model, layers_to_extract):
        torch.nn.Module.__init__(self)
        self._model = pretrained_model
        self._model.eval()
        self._layers_to_extract = set(layers_to_extract)

    def forward(self, x):
        with torch.no_grad():
            conv_representation = []
            for name, layer in self._model.named_children():
                x = layer(x)
                if name in self._layers_to_extract:
                    conv_representation.append(x)
            return conv_representation

Fine-tuning the fully connected layer

model = torchvision.models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(512, 100)  # Replace the last fc layer
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9, weight_decay=1e-4)

Fine-tuning the fully connected layer with a larger learning rate and the convolutional layers with a smaller one

model = torchvision.models.resnet18(pretrained=True)
finetuned_parameters = list(map(id, model.fc.parameters()))
conv_parameters = (p for p in model.parameters() if id(p) not in finetuned_parameters)
parameters = [{'params': conv_parameters, 'lr': 1e-3},
              {'params': model.fc.parameters()}]
optimizer = torch.optim.SGD(parameters, lr=1e-2, momentum=0.9, weight_decay=1e-4)

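6. Other notes

Do not use linear layers that are too large. nn.Linear(m, n) uses O(mn) memory, so an oversized linear layer can easily exceed the available GPU memory.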
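A quick back-of-the-envelope estimate shows how fast this grows. The sketch assumes float32 weights (4 bytes each) and counts only the parameters themselves; gradients and optimizer state would add further multiples. `linear_param_bytes` is a hypothetical helper.

```python
def linear_param_bytes(m, n, bytes_per_element=4):
    """Bytes needed for the weight (m * n) and bias (n) of nn.Linear(m, n)."""
    return (m * n + n) * bytes_per_element


small = linear_param_bytes(512, 100)          # 205,200 bytes, about 0.2 MB
large = linear_param_bytes(100_000, 100_000)  # about 40 GB for the parameters alone
print(small, large / 1e9)
```

Doubling both m and n quadruples the requirement, which is why very wide layers exhaust GPU memory so quickly.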

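Do not use RNNs on sequences that are too long. RNN backpropagation uses the BPTT algorithm, whose memory requirement grows linearly with the length of the input sequence.

Before calling model(x), switch the network state with model.train() or model.eval().

Wrap code blocks that do not need gradients in with torch.no_grad().

The difference between model.eval() and torch.no_grad(): model.eval() switches the network to evaluation mode, in which layers such as BN and dropout compute differently than during training. torch.no_grad() turns off PyTorch's autograd mechanism for tensors, which reduces memory use and speeds up computation; loss.backward() cannot be called on the results.

model.zero_grad() zeroes the gradients of all parameters of the model, whereas optimizer.zero_grad() only zeroes the gradients of the parameters that were passed to that optimizer.

The input to torch.nn.CrossEntropyLoss does not need to go through Softmax. torch.nn.CrossEntropyLoss is equivalent to torch.nn.functional.log_softmax followed by torch.nn.NLLLoss.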
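The equivalence can be checked by hand for a single sample: cross-entropy on raw scores equals the negative log-softmax value at the target index. A plain-Python sketch with made-up scores (the function names are hypothetical and only mirror the arithmetic, not PyTorch's API):

```python
import math


def log_softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def nll(log_probs, target):
    """Negative log-likelihood of the target class."""
    return -log_probs[target]


def cross_entropy(scores, target):
    """Direct form: -log(exp(s_target) / sum_j exp(s_j))."""
    return -math.log(math.exp(scores[target]) / sum(math.exp(s) for s in scores))


scores, target = [2.0, 1.0, 0.1], 0
loss_two_step = nll(log_softmax(scores), target)
loss_direct = cross_entropy(scores, target)
```

Applying Softmax before the loss would therefore normalize the scores twice and distort the gradients.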

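Before loss.backward(), clear the accumulated gradients with optimizer.zero_grad().

In torch.utils.data.DataLoader, set pin_memory=True where possible; for very small datasets such as MNIST, pin_memory=False is actually somewhat faster. The best value of num_workers has to be found experimentally.

Use del to remove intermediate variables as soon as they are no longer needed, to save GPU memory.

In-place operations save GPU memory, for example:

x = torch.nn.functional.relu(x, inplace=True)

Reduce data transfer between CPU and GPU. For example, to record the loss and accuracy of every mini-batch in an epoch, accumulating them on the GPU and transferring them to the CPU once at the end of the epoch is faster than a GPU-to-CPU transfer for every mini-batch.

Half-precision floats via half() give some speedup; how much depends on the GPU model. Watch out for stability problems caused by the reduced numerical precision.

Use assert tensor.size() == (N, D, H, W) frequently as a debugging aid, to make sure tensor dimensions match what you expect.

Apart from the labels y, avoid one-dimensional tensors where possible and use n*1 two-dimensional tensors instead; this avoids some unexpected results from computations on one-dimensional tensors.

Profiling the time spent in each part of the code

with torch.autograd.profiler.profile(enabled=True, use_cuda=False) as profile:
    ...
print(profile)

Or run from the command line:

python -m torch.utils.bottleneck main.py

Use TorchSnooper to debug PyTorch code. While the program runs, it automatically prints the shape, data type, device, and requires-grad status of the tensor produced by each line.

pip install torchsnooper

import torchsnooper

For a function, use the decorator:

@torchsnooper.snoop()

If the code is not a function, activate TorchSnooper with a with statement and put the training loop inside it:

with torchsnooper.snoop():
    ...  # the original code

https://github.com/zasdfgbnm/TorchSnooper

For model interpretability, use the captum library:

https://captum.ai/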

References:

張皓: PyTorch Cookbook (a collection of common code snippets)

PyTorch official documentation and examples

https://pytorch.org/docs/stable/notes/faq.html

https://github.com/szagoruyko/pytorchviz

https://github.com/sksq96/pytorch-summary

Others
