【前置技能】GPGPU實時2D流體的Unity實現

作者：由破曉發表于旅遊時間：2020-01-02

前情提要：

在前兩篇專欄中，分別介紹了流體動力學的基礎背景知識（儘管不太完善）和Jos的一篇論文。但還是不足以開始編寫程式碼，譬如說，梯度、散度等數學計算在Unity中是如何體現的；二維情況下的擴散，是如何迭代……這些如霧中看花，還需要進一步琢磨。

透過《GPU Gems 1》Chapter 38的內容，我們可以將上述細節明確，並在Unity中實現自己的流體。

運算元的有限微分形式

$\nabla$

分別應用於梯度、散度和拉普拉斯運算元中，如下表所示，下標的數字

和

代表笛卡爾座標系中離散的位置，

$\delta x$

和

$\delta y$

分別是

和

方向上的網格間距。

當然，實現中網格間距都是一致的，所以基於這個假設，我們可以將有限微分形式進一步簡化，如拉普拉斯運算元可以被簡化為：

亥姆霍茲分解

在之前的專欄中，我們已經提過，任何向量場

$\bold w$

可以被分解為其他兩個向量場之和：一個無散度向量場

$\bold u$

，一個標量場的梯度

$\nabla p$

：

$\bold u = \bold w -\nabla p\\$

該定理引出一個計算壓力場的方法，將散度運算元應用到兩頭，可簡化成如下形式：

$\nabla \cdot \bold w=\nabla \cdot(\bold u + \nabla p)=\nabla\cdot\bold u+\nabla^2p\\ \nabla \cdot\bold u = 0\\ \nabla\cdot\bold u+\nabla^2p=\nabla \cdot \bold w$

$\nabla^2x=b$

的形式被稱為泊松（Poisson）方程，當

就是拉普拉斯方程的形式，即，拉普拉斯運算元的起源。解流體的泊松方程，我們就有了標量場

，隨後帶入回原式即可計算出無散度的速度場

$\bold u$

。這裡從亥姆霍茲分解中定義一個投射運算子

，該運算子用來把一個向量場投影到其無散度分量上：

$proj(\bold w)=proj(\bold u)+proj(\nabla p)\\$

根據定義，一個無散向量場的投影應當與自身相等，故

$proj(\bold w)=proj(\bold u)=\bold u$

，因此

$proj(\nabla p)=0$

。將

運算元用到N-S方程中：

$proj\frac{\partial\bold u}{\partial t}=\frac{\partial\bold u}{\partial t}\\=proj(-(\bold u\cdot \nabla)\bold u-\frac1\rho\nabla p+v\nabla^2\bold u+\bold F)$

因為

$\bold u$

無散，故導數也無散；同時

$proj(\nabla p)=0$

所以捨去壓力項。現在我們有了如下的式子：

$\frac{\partial\bold u}{\partial t}=proj(-(\bold u\cdot \nabla)\bold u+v\nabla^2\bold u+\bold F)\\$

即，對流、擴散、外力作用。

在實現中，我們會從左到右計算各個分量，每個分量都是一個步驟而不是一個因子，每個步驟取一個場作為輸入，而用一個新的場做為輸出。

對流

流體的物理引數被速度傳遞，被稱為對流（Advect）。如前文所述，使用從下一個步長考慮的方法，計算經過一段時間後恰好位於網格中心的點。令這個物理引數為

（密度、速度、溫度等），那麼方程為：

$q(x,t+\delta t)=q(x-\bold u(x,t)\delta t, t)\\$

擴散

液體的黏性導致液體的速度發生了擴散，該偏微分方程是

$\frac{\partial \bold u}{\partial t}=v\nabla ^ 2\bold u$

。同上所述，我們為了穩定性考慮，免得處理時間步長或者速度過大時系統不穩定，則使用如下形式：

$\bold I$

是單位矩陣。有了速度和壓力的泊松方程，我們可以用雅可比迭代去漸進求解，兩個泊松方程可以重寫為以下離散形式：

此處的

$\alpha$

和

$\beta$

為常數。對於壓力方程式而言，

$\alpha=-(\delta x)^2,\beta = 4$

，

就是速度場的散度；對於黏度擴散方程式而言，

$\alpha=(\delta x)^2/v\delta t,\beta = 4+\alpha$

，

就是速度場。

理論總結

花費了三篇專欄的文章之後，相信諸位已經對流體力學有了一個“淺薄”的瞭解（當然“淺薄”意思是，要正確學習還得看教材），這樣一個理解已經足夠看懂論文的基礎和實現了。下面是虛擬碼，其中

和

別代表速度場和壓力場。

這樣說，還是太籠統了一點，我個人將它劃分為以下步驟：

對流——物理量被速度場搬運

擴散——黏度使得速度場發生了擴散，流體彼此分離

外力

投射——計算速度場的散度

投射——用速度場的散度作為

，解壓力泊松方程求壓力場

投射——從速度場中減去壓力場的梯度

邊界條件——特殊處理網格的邊緣，保證速度場不會“逃逸”

搬運——用計算出的速度場對流體進行搬運

展示

我要重點提及第一個我個人出現了思考疏漏的盲點，這直接導致我寫的第一版程式碼就是一坨shit，因此延期了這麼久——

上述描述的2D流體，是充盈於整個網格的

。

打個淺顯的比方，這個東西就跟咖啡表面的拉花一樣，作為流體輸入的圖案，只是標誌流體變化的視覺化輔助。我卻沒有意識到，花了很多時間去想，cnm，圖案上這地方密度高，那個地方密度低，怎麼計算擴散。狗屎，新年新廢物，絕了。

準備紋理

首先，我們要準備好紋理，因為既然要運用到GPU進行平行計算，輸出到紋理是最方便的。對於上述描述的每個步驟，都至少準備一個輸入紋理和一個輸出紋理。那麼要幾個紋理呢？

七個，其中兩個是標量場壓力場的紋理，三個是向量場速度場的紋理，剩下兩個當然是用作液體資料的輸入輸出了。之所以速度場需要三個，是因為在擴散步驟中，矩陣

也是一個紋理。

對於標量場，只需要一個通道，對於速度場，則需要兩個通道。我這裡使用一個靜態類來封裝所有的RenderTexture的申請和銷燬操作。

static

class

{

public

static

RenderTexture

，

；

public

static

RenderTexture

，

；

public

static

void

Init

（

int

width

，

int

height

）

{

CreateTexture

（

，

width

，

height

）；

CreateTexture

（

，

width

，

height

）；

CreateTexture

（

，

width

，

height

）；

CreateTexture

（

，

width

，

height

）；

CreateTexture

（

，

width

，

height

）；

}

public

static

RenderTexture

CreateTexture

（

int

component

，

int

width

，

int

height

）

{

var

format

RenderTextureFormat

。

ARGBHalf

；

（

component

）

format

RenderTextureFormat

。

RGHalf

；

（

component

）

format

RenderTextureFormat

。

RHalf

；

var

result

new

RenderTexture

（

width

，

height

，

format

）；

result

。

enableRandomWrite

true

；

result

。

Create

（）；

return

result

；

}

public

static

void

Dispose

（）

{

Destroy

（

）；

Destroy

（

）；

Destroy

（

）；

Destroy

（

）；

Destroy

（

）；

}

這裡，液體資料紋理和運算用的紋理相比尺寸要更大，因為這裡要把螢幕看到的內容像後處理一樣攪動，不過要保證Aspect完全一致。

這裡我又犯了第二個錯，幾乎忽視了Aspect的存在，導致最開始怎麼除錯，速度場在X軸和Y軸方向的運動都對不上。對比了Keijiro的實現才發現問題。

［SerializeField］

Size

_resolution

Size

。

_512

；

［SerializeField］

ComputeShader

_compute

；

int

ResolutionX

{

get

{

return

ThreadCountX

；

}

int

ResolutionY

{

get

{

return

ThreadCountY

；

}

int

ThreadCountX

{

get

{

return

（（

int

）

_resolution

）

；

}

int

ThreadCountY

{

get

{

return

（（

int

）

_resolution

Screen

。

height

Screen

。

width

）

；

}

RenderTexture

，

；

private

void

Awake

（）

{

。

Init

（

ResolutionX

，

ResolutionY

）；

。

CreateTexture

（

，

Screen

。

width

，

Screen

。

height

）；

。

CreateTexture

（

，

Screen

。

width

，

Screen

。

height

）；

}

private

void

OnDestroy

（）

{

。

Dispose

（）；

}

然後稍微把框架流程搭建起來：

private

void

Update

（）

{

// common argument

Time

。

deltaTime

；

ResolutionY

；

_compute

。

SetFloat

（

“dx”

，

）；

_compute

。

SetFloat

（

“dt”

，

）；

Advect

（）；

Diffuse

（）；

AddForce

（）；

ComputePressure

（）；

SubtractPressureGradient

（）；

MoveFluidWithVelocityField

（）；

}

剩下的事情，就是按照剛開始描述的流程，開始往裡頭填了。這裡又犯了一個弱智錯誤，之前沒有考慮到aspect，導致x軸上的流動永遠慢於y軸上的流動，參考了大佬的實現後才豁然開朗……

對流

找到格子中心，計算dt前的位置，一氣呵成。

//CSharp code：

private void Advect（）

{

_compute。SetTexture（Kernel。Advect， “U_in”， RT。u1）；

_compute。SetTexture（Kernel。Advect， “U_out”， RT。u2）；

_compute。Dispatch（Kernel。Advect， ThreadCountX， ThreadCountY， 1）；

// remember to swap rt

var t = RT。u2；

RT。u2 = RT。u1；

RT。u1 = t；

}

//compute shader code：

// 自對流，速度被速度場搬運

void Advect（uint2 tid ： SV_DispatchThreadID）

{

uint2 dim；

U_out。GetDimensions（dim。x， dim。y）；

// 找到格子中心

float2 uv = （tid + 0。5） / dim；

// 移動的距離：此處的速度場*時間步長，因為是用uv計算所以要考慮aspect

float2 duv = U_in［tid］ * float2（（float）dim。y / dim。x， 1） * dt；

U_out［tid］ = U_in。SampleLevel（samplerU_in， uv - duv， 0）；

}

擴散

直接按照前文所述，雅可比迭代40次完事兒，順便還免掉了翻轉地址。這裡走了第三個彎路：壓力場和速度場通道數不一樣，要開兩個Kernel來做……天真的我當時以為1個足矣……

//CSharp code：

private void Diffuse（）

{

var alpha = dx * dx / （Viscosity * dt）；

var beta = alpha + 4；

Graphics。CopyTexture（RT。u1， RT。u3）；

_compute。SetFloat（“Alpha”， alpha）；

_compute。SetFloat（“Beta”， beta）；

_compute。SetTexture（Kernel。Jacobi2， “B2_in”， RT。u3）；

for （var i = 0； i < 20； i++）

{

_compute。SetTexture（Kernel。Jacobi2， “X2_in”， RT。u1）；

_compute。SetTexture（Kernel。Jacobi2， “X2_out”， RT。u2）；

_compute。Dispatch（Kernel。Jacobi2， ThreadCountX， ThreadCountY， 1）；

_compute。SetTexture（Kernel。Jacobi2， “X2_in”， RT。u2）；

_compute。SetTexture（Kernel。Jacobi2， “X2_out”， RT。u1）；

_compute。Dispatch（Kernel。Jacobi2， ThreadCountX， ThreadCountY， 1）；

}

// compute shader code：

// Jacobi method with a vector field

［numthreads（8， 8， 1）］

void Jacobi2（uint2 tid ： SV_DispatchThreadID）

{

X2_out［tid］ = （X2_in［tid - int2（1， 0）］ + X2_in［tid + int2（1， 0）］ +

X2_in［tid - int2（0， 1）］ + X2_in［tid + int2（0， 1）］ + Alpha * B2_in［tid］） / Beta；

}

外力

這裡的外力我用了另一個方法去相對直觀的控制力度和影響範圍：

// CSharp code：

Vector2 lastInput = Vector2。zero；

private void AddForce（）

{

// 為了方便以後改3D體渲染，本來用一個Vector4可以表達的資訊拆成兩個

Vector4 dir = Vector4。zero， position = Vector4。zero；

Vector2 curInput = Input。mousePosition；

curInput = curInput - new Vector2（Screen。width， Screen。height） * 0。5f；

curInput = curInput / Screen。height；

// curInput。x in ［-0。5 * aspect， 0。5 * aspect］

// curInput。y in ［-0。5， 0。5］

// 滑鼠按下，計算力的方向

if （Input。GetMouseButton（0））

{

// w is force strength

position = curInput；

dir = curInput - lastInput；

dir。w = ForceAmount * dir。magnitude；

}

lastInput = curInput；

_compute。SetTexture（Kernel。Force， “U_in”， RT。u1）；

_compute。SetTexture（Kernel。Force， “U_out”， RT。u2）；

_compute。SetVector（“ForceDirection”， dir）；

_compute。SetVector（“ForcePosition”， position）；

_compute。SetFloat（“ForceRadius”， ForceRadius）；

_compute。SetTexture（Kernel。Force， “U_in”， RT。u1）；

_compute。SetTexture（Kernel。Force， “U_out”， RT。u2）；

_compute。Dispatch（Kernel。Force， ThreadCountX， ThreadCountY， 1）；

// remember to swap rt

var t = RT。u2；

RT。u2 = RT。u1；

RT。u1 = t；

}

//compute shader code：

［numthreads（8， 8， 1）］

void Force（uint2 tid ： SV_DispatchThreadID）

{

if （ForceDirection。w == 0） return；

uint2 dim；

U_out。GetDimensions（dim。x， dim。y）；

// 計算c值

float2 pos = （tid + 0。5 - dim * 0。5） / dim。y；

// pos。x in ［-0。5 * aspect， 0。5 * aspect］

// pos。y in ［-0。5， 0。5］

float strength = ForceDirection。w * （1 - saturate（distance（ForcePosition。xy， pos） / ForceRadius））；

//ForceDirection。w * exp（distance（ForcePosition。xy， pos） / ForceRadius）； ——keijiro

//exp（-ForceDirection。w * distance（ForcePosition。xy， pos））； ——- GPU Gems 1

U_out［tid］ = U_in［tid］ + strength * ForceDirection。xy；

}

求速度場的散度

值得注意的是，要單獨初始化壓力場，因為程式中對壓力場的應用僅在下一階段使用一次，並沒有翻轉覆寫之類的操作，所以一定要先清空壓力場的資料。

// csharp code

private void ComputePressure（）

{

// 計算速度場的散度，初始化壓力場

_compute。SetTexture（Kernel。Divergence， “U_in”， RT。u1）；

_compute。SetTexture（Kernel。Divergence， “U_out”， RT。u3）；

_compute。SetTexture（Kernel。Divergence， “P_out”， RT。p1）；

_compute。Dispatch（Kernel。Divergence， ThreadCountX， ThreadCountY， 1）；

}

//compute shader code

［numthreads（8， 8， 1）］

void Divergence（uint2 tid ： SV_DispatchThreadID）

{

uint2 dim；

U_out。GetDimensions（dim。x， dim。y）；

U_out［tid］ = （U_in［tid + int2（1， 0）］。x - U_in［tid - int2（1， 0）］。x +

U_in［tid + int2（0， 1）］。y - U_in［tid - int2（0， 1）］。y） * 0。5 * dim。y；

P_out［tid］ = 0；

}

求解壓力場

這裡使用的是單通道版本的雅可比迭代。切莫跟速度擴散時候的那個迭代搞混了……

//csharp code：

private void SubtractPressureGradient（）

{

// 將散度作為b傳入計算，雅可比解泊松方程，求出壓力場

var alpha = -dx * dx；

var beta = 4；

_compute。SetFloat（“Alpha”， alpha）；

_compute。SetFloat（“Beta”， beta）；

_compute。SetTexture（Kernel。Jacobi1， “B1_in”， RT。u3）；

for （var i = 0； i < 20； i++）

{

_compute。SetTexture（Kernel。Jacobi1， “X1_in”， RT。p1）；

_compute。SetTexture（Kernel。Jacobi1， “X1_out”， RT。p2）；

_compute。Dispatch（Kernel。Jacobi1， ThreadCountX， ThreadCountY， 1）；

_compute。SetTexture（Kernel。Jacobi1， “X1_in”， RT。p2）；

_compute。SetTexture（Kernel。Jacobi1， “X1_out”， RT。p1）；

_compute。Dispatch（Kernel。Jacobi1， ThreadCountX， ThreadCountY， 1）；

減去梯度場完成投射

//csharp code

// 速度場減壓力場的梯度

_compute。SetTexture（Kernel。Subtract， “U_in”， RT。u1）；

_compute。SetTexture（Kernel。Subtract， “U_out”， RT。u2）；

_compute。SetTexture（Kernel。Subtract， “P_in”， RT。p1）；

_compute。Dispatch（Kernel。Subtract， ThreadCountX， ThreadCountY， 1）；

// remember to swap rt

var t = RT。u2；

RT。u2 = RT。u1；

RT。u1 = t；

}

//compute shader

［numthreads（8， 8， 1）］

void Subtract（uint2 tid ： SV_DispatchThreadID）

{

// 包含邊界條件

uint2 dim；

U_out。GetDimensions（dim。x， dim。y）；

if （any（tid == 0） || any（tid == dim - 1）） return；

float P1 = P_in［max（tid - int2（1， 0）， 1）］；

float P2 = P_in［min（tid + int2（1， 0）， dim - 2）］；

float P3 = P_in［max（tid - int2（0， 1）， 1）］；

float P4 = P_in［min（tid + int2（0， 1）， dim - 2）］；

float2 u = U_in［tid］ - float2（P2 - P1， P4 - P3） * dim。y * 0。5；

U_out［tid］ = u；

// 為了保證邊界處的法線分量為0，將邊界處的速度反轉

。。。。

}

邊界條件

邊界條件的重點，就是保證流體不逃逸，就是之前專欄提到的五種邊界條件中最簡單常見的那種固態邊界。

固態邊界之所以能保證流體不逃逸，實質上是保證流體在邊界處的法向分量為零，也就是說，只能水平流動。假定

處的網格格子是邊界，那麼為了保證

與

交界處的流體法向分量為零，邊界處網格的速度與

網格的速度（畢竟速度是在網格中心採集的）之和為零。因此我們翻轉速度即可……

// 為了保證邊界處的法線分量為0，將邊界處的速度反轉

if （tid。x == 1） U_out［int2（0， tid。y）］ = -u；

if （tid。y == 1） U_out［int2（tid。x， 0）］ = -u；

if （tid。x == dim。x - 2） U_out［int2（dim。x - 1， tid。y）］ = -u；

if （tid。y == dim。y - 2） U_out［int2（tid。x， dim。y - 1）］ = -u；

用速度場搬運流體

這時候就直接將速度場和流體圖案作為傳給傳統的片元著色器，如同對流那樣操作就完事了：

sampler2D _MainTex；

float4 _MainTex_TexelSize；

sampler2D _VelocityField；

float2 _ForceOrigin；

float _ForceExponent；

half4 frag_advect（v2f_img i）： SV_Target

{

float time = _Time。y；

// 原來不需要專門傳dt進來，直接從系統變數裡拿

float dt = unity_DeltaTime。x；

float aspect = _MainTex_TexelSize。y * _MainTex_TexelSize。z；

float invAspect = 1 / aspect；

float2 velocity = tex2D（_VelocityField， i。uv）。xy；

float2 dist = velocity * dt；

// 要考慮aspect的啊啊啊啊啊啊啊啊啊啊

dist。x *= aspect；

return tex2D（_MainTex， i。uv - dist）；

}

完整程式碼實現，請參考Keijiro大神的實現，我的，害，拿不出手，看看註釋就行了：

效果演示

https：//www。zhihu。com/video/1195884791218905088

https：//www。zhihu。com/video/1195884688303267840

上述兩個影片表示了流體的運動和該情況下速度場的變化。下面展示兩個理論中著重提到的內容：速度場無散和邊界條件。

因為速度場的無散性質而模擬出來的渦流

設定邊界條件後，因為法線分量為零，因此邊緣的物質被迫在水平方向上發生巨大位移形成的變形

總結

本文走過的彎路有三：

對流體的理解發生了本質性的誤會

忘記了Aspect對計算的影響

以為雅可比迭代法對通道數沒有要求

本文簡單的實現了《GPU精粹1》中第三十八章《GPU上的快速流體動力學模擬》，得益於參考Keijiro的實現，幫助糾正了很多弱智的彎路。透過大量的註釋，和對比論文的實現，希望閱讀本文的讀者們不要跟我一樣睿智。

標簽： compute RT TID kernel dim

上一篇:同是意識流代表作家，普魯斯特，喬伊斯，伍爾夫為什麼沒獲諾貝爾文學獎，福克納卻獲得了？

下一篇：東海有多好玩？嚮往的生活就在身邊

【前置技能】GPGPU實時2D流體的Unity實現

猜你喜歡

Unity渲染繪製介面（三）Graphics

強化學習傳說：第一章模仿學習

別再無聊地吹捧了，一起來動手實現 MAE(Masked Autoencoders Are Scalable Vision Learners) 玩玩吧！

Adidas tubular系列到底貴在哪裡？ - 知乎

華碩RT-AX86U 和小米AX6000，兩款WiFi6路由具體差異在哪兒？

【前置技能】GPGPU實時2D流體的Unity實現

猜你喜歡

Unity渲染繪製介面（三）Graphics

強化學習傳說：第一章 模仿學習

別再無聊地吹捧了，一起來動手實現 MAE(Masked Autoencoders Are Scalable Vision Learners) 玩玩吧！

Adidas tubular系列到底貴在哪裡？ - 知乎

華碩RT-AX86U 和小米AX6000，兩款WiFi6路由具體差異在哪兒？

強化學習傳說：第一章模仿學習