Building a Multivariable Neural Network Model with PyTorch and CUDA

Posted in Programming

Updated 2024/11/14, published 2024/11/14, reading time about 15 minutes

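In the real world, most problems that call for artificial intelligence are not simple single-variable problems. One of the main benefits of using PyTorch with CUDA on a many-core GPU is parallel acceleration, which makes it practical to handle complex multivariable systems. We can use a neural network to build a multivariable model, study how the system changes and responds under different conditions, and then look for ways to drive that complex system toward the performance we want.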

Learning Target for the Model

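In many studies, Himmelblau's function is used as a typical example for building a multivariable model. It is a function of two variables with four local minima. The code is shown below; here we use "numpy.meshgrid()" to simplify generating the data.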

import numpy as np
import matplotlib.pyplot as plt
#-----------
def himmFun(x,y):
    return (x**2+y-11)**2+(x+y**2-7)**2
#-----------
points=np.arange(-6,6,0.1)
nPoint=len(points)
x1,x2=np.meshgrid(points,points)
y=himmFun(x1,x2)
#-----------
fig = plt.figure(figsize=(9,4))

ax1 = fig.add_subplot(121, projection='3d')
plt.title('surface')
ax1.view_init(45,30)
ax1.plot_surface(x1,x2,y,cmap='gist_rainbow')

ax2 = fig.add_subplot(122)
plt.title('contour')
ax2.contour(x1,x2,y,100,cmap='gist_rainbow')
plt.show()
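As a quick sanity check on the target function, the sketch below evaluates it at the four commonly quoted local minima of Himmelblau's function, where the value should be numerically zero. The function is restated so the block runs on its own; the name `KNOWN_MINIMA` and the six-decimal coordinates are illustrative and do not come from the original listing.

```python
import numpy as np

def himmFun(x, y):
    # Himmelblau's function, same definition as in the listing above
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

# Approximate coordinates of the four local minima (rounded to 6 decimals)
KNOWN_MINIMA = np.array([
    [ 3.0,       2.0     ],
    [-2.805118,  3.131312],
    [-3.779310, -3.283186],
    [ 3.584428, -1.848126],
])

# Evaluate the function at all four points in one vectorized call
values = himmFun(KNOWN_MINIMA[:, 0], KNOWN_MINIMA[:, 1])
for (px, py), v in zip(KNOWN_MINIMA, values):
    print(f"f({px:9.6f}, {py:9.6f}) = {v:.2e}")
```

All four values should be very close to zero, which also explains why the contour plot shows four separate basins.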

Data Normalization

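In the plots above, the X, Y, and Z variables cover very different ranges. When building a neural network model, we usually want the distributions of the X/Y/Z data to be as close to each other as possible; this process is generally called "normalization". For example, the code below normalizes each variable so that its values are centered on zero and scaled by its range.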

#----------
# normalized data
#----------
x_1=(x1-np.mean(x1))/(np.max(x1)-np.min(x1))
x_2=(x2-np.mean(x2))/(np.max(x2)-np.min(x2))
y_0=(y-np.mean(y))/(np.max(y)-np.min(y))
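The normalization above can be wrapped in a small helper that also returns the statistics it used, so predictions can later be mapped back to the original units. This is a minimal sketch; `mean_normalize`, `denormalize`, and the sample values are hypothetical names and data introduced here for illustration.

```python
import numpy as np

def mean_normalize(a):
    """Center on the mean and scale by the range; also return the stats."""
    center = np.mean(a)
    span = np.max(a) - np.min(a)
    return (a - center) / span, center, span

def denormalize(a_norm, center, span):
    """Invert mean_normalize to recover values in the original units."""
    return a_norm * span + center

raw = np.array([0.0, 50.0, 200.0, 890.0])   # made-up sample values
scaled, center, span = mean_normalize(raw)

print(scaled.mean())                 # close to 0
print(scaled.max() - scaled.min())   # close to 1
print(np.allclose(denormalize(scaled, center, span), raw))
```

Keeping `center` and `span` around is a design choice: without them, the network output stays in normalized units and cannot be compared with the raw function values.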

Defining the Neural Network Class

Next, we define the neural network class, following the same feedforward network structure used previously.

#----------------
# create neural network model
#----------------
import torch
from torch import nn
class classNeural(nn.Module):
    def __init__(self,n_input,n_hidden,n_output):
        super().__init__()
        self.n_input=n_input
        self.n_hidden=n_hidden
        self.n_output=n_output
        #--------
        self.layer1=nn.Linear(n_input,n_hidden)
        self.layer2=nn.Linear(n_hidden,n_output)
        self.active=nn.Sigmoid()
        #--------
    def forward(self,x):
        x=self.active(self.layer1(x))
        return self.layer2(x)
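To confirm the layer sizes before training, it can help to push a dummy batch through the network and count its parameters. The sketch below repeats the class so it runs alone; the dummy batch and the names `model`, `batch`, and `n_params` are illustrative.

```python
import torch
from torch import nn

class classNeural(nn.Module):
    # same two-layer network as above, repeated so this block runs alone
    def __init__(self, n_input, n_hidden, n_output):
        super().__init__()
        self.layer1 = nn.Linear(n_input, n_hidden)
        self.layer2 = nn.Linear(n_hidden, n_output)
        self.active = nn.Sigmoid()

    def forward(self, x):
        return self.layer2(self.active(self.layer1(x)))

model = classNeural(2, 10, 1)
batch = torch.zeros(5, 2)            # 5 dummy samples, 2 input variables
out = model(batch)
n_params = sum(p.numel() for p in model.parameters())

print(out.shape)    # torch.Size([5, 1])
print(n_params)     # 2*10 + 10 + 10*1 + 1 = 41
```

With only 41 trainable parameters the model is small, so fitting a surface with four basins takes many iterations.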

Converting the Training Data

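Himmelblau's function has two independent variables (x1, x2), and each of them is stored as a two-dimensional array produced by "meshgrid()". Before training the network, we therefore flatten these two-dimensional arrays into one dimension with "reshape()", and then convert them to "torch.tensor" so they can be used for training.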

Note that in PyTorch the first dimension of the training data indexes the samples, while the second dimension is the number of variables.

import torch
#--------
device=torch.device('cpu')
if torch.cuda.is_available():
    device=torch.device('cuda')
#--------
nData=len(y_0)**2
x=np.zeros((nData,2))
x[:,0]=x_1.reshape(nData,1)[:,0]
x[:,1]=x_2.reshape(nData,1)[:,0]
y=y_0.reshape(nData,1)
X_train=torch.tensor(x.astype('float32')).to(device)
Y_train=torch.tensor(y.astype('float32')).to(device)
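The same (samples, variables) layout can be built more compactly with `ravel()` and `np.column_stack()`. The sketch below uses a small 4x4 grid so the shapes are easy to read; the names `g1`, `g2`, and `features` are illustrative and not part of the original listing.

```python
import numpy as np

points = np.arange(-1.0, 1.0, 0.5)          # small grid for illustration
g1, g2 = np.meshgrid(points, points)        # each has shape (4, 4)

# ravel() flattens row by row; column_stack puts each variable in its own column
features = np.column_stack([g1.ravel(), g2.ravel()])
print(features.shape)        # (16, 2): 16 samples, 2 variables

# reshaping a column recovers the original grid layout
print(np.array_equal(features[:, 0].reshape(g1.shape), g1))   # True
```

Because flattening and reshaping use the same row-major order, predictions can be reshaped back to the grid for plotting without reordering.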

Training the Neural Network on the GPU

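The training code here is written the same way as the earlier single-variable training program. Because both the training data and the network parameters are placed on the GPU (when CUDA is available; otherwise the code falls back to the CPU), the whole computation runs on the GPU.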

#---------
# training
#---------
torch.manual_seed(13)
neural=classNeural(2,10,1).to(device)
#---------
# set training condition
#---------
loss_fn=nn.MSELoss() # MSE
optimizer=torch.optim.AdamW(neural.parameters(),lr=0.01)
neural.train()
n_epoche=3000
mae_x,mae_y=[],[]
for epoche in range(n_epoche):
    Y_pred=neural(X_train)
    loss=loss_fn(Y_pred,Y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
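To follow convergence, the loss can be recorded at every epoch. The sketch below is a reduced version of the training loop under some assumptions made for speed: a coarser grid (step 0.5), 300 epochs, and an equivalent `nn.Sequential` model instead of the class above. The names `loss_history`, `norm`, `X`, and `Y` are illustrative.

```python
import numpy as np
import torch
from torch import nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# small normalized Himmelblau data set (coarser grid, to keep the run short)
pts = np.arange(-6, 6, 0.5)
g1, g2 = np.meshgrid(pts, pts)
target = (g1**2 + g2 - 11)**2 + (g1 + g2**2 - 7)**2

def norm(a):
    # mean normalization, as in the data normalization step
    return (a - a.mean()) / (a.max() - a.min())

X = torch.tensor(np.column_stack([norm(g1).ravel(), norm(g2).ravel()]),
                 dtype=torch.float32).to(device)
Y = torch.tensor(norm(target).reshape(-1, 1), dtype=torch.float32).to(device)

torch.manual_seed(13)
model = nn.Sequential(nn.Linear(2, 10), nn.Sigmoid(), nn.Linear(10, 1)).to(device)
loss_fn = nn.MSELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=0.01)

loss_history = []
model.train()
for epoch in range(300):
    loss = loss_fn(model(X), Y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    loss_history.append(loss.item())   # .item() copies the scalar to the CPU

print(next(model.parameters()).device)   # shows where the weights live
print(loss_history[0], loss_history[-1])
```

Plotting `loss_history` on a logarithmic axis, for example with `plt.semilogy(loss_history)`, makes separate convergence phases easier to see.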

Outputting Predictions from the Trained Model

Finally, we use the trained neural network to produce predictions. Note that any values to be printed or plotted must first be moved from the GPU to the CPU with "to('cpu')".

#------------
# prediction
#------------
neural.eval()
with torch.inference_mode():
    Y_pred=neural(X_train)
#----------------
# move the results to the CPU and convert to numpy
#----------------
x_train=X_train.to('cpu').numpy()
x1_train=x_train[:,0].reshape(nPoint,nPoint)
x2_train=x_train[:,1].reshape(nPoint,nPoint)

y_train=Y_train.to('cpu').numpy().reshape(nPoint,nPoint)
y_pred=Y_pred.to("cpu").numpy().reshape(nPoint,nPoint)
#----------------
# 3D/contour plots
#----------------
fig = plt.figure(figsize=(9,4))

ax1 = fig.add_subplot(121, projection='3d')
plt.title('surface')
ax1.plot_wireframe(x1_train,x2_train,y_pred,color='blue',rstride=5, cstride=5,label='prediction')
ax1.plot_wireframe(x1_train,x2_train,y_train,color='red',linestyle='dotted',rstride=5, cstride=5,label='target')
plt.legend()

ax2 = fig.add_subplot(122)
plt.title('prediction contour')
ax2.contour(x1_train,x2_train,y_pred,100,cmap='gist_rainbow')
plt.show()
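Outside `torch.inference_mode()`, a tensor that still tracks gradients cannot be converted with `.numpy()` directly; it has to be detached first. The sketch below shows a small helper that handles both the device move and the detach. The name `to_numpy` is hypothetical, introduced here for illustration.

```python
import numpy as np
import torch

def to_numpy(t):
    """Detach from autograd, move to the CPU, and convert to a NumPy array."""
    return t.detach().cpu().numpy()

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
w = torch.ones(3, 1, device=device, requires_grad=True)
pred = 2.0 * w              # still attached to the autograd graph

arr = to_numpy(pred)
print(type(arr), arr.shape)
```

For tensors already on the CPU, `.cpu()` returns the same tensor, so the helper can be used unchanged whether or not CUDA is available.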

Observing Training Convergence of the Multivariable Network

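If we watch how the training error changes while the network is being trained, we can see three distinct convergence phases. Plotting the predictions at different iteration counts shows how the multivariable model evolves, as in the code below.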

runs=[10,1000,1500,3000]
n_Run=len(runs)
for i in range(n_Run):
    n_epoche=runs[i]
    torch.manual_seed(13)
    neural=classNeural(2,10,1).to(device)
    #---------
    # training
    #---------
    loss_fn=nn.MSELoss() # MSE
    optimizer=torch.optim.AdamW(neural.parameters(),lr=0.01)
    neural.train()
    for epoche in range(n_epoche):
        Y_pred=neural(X_train)
        loss=loss_fn(Y_pred,Y_train)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    #----------
    # prediction
    #----------
    neural.eval()
    with torch.inference_mode():
        Y_pred=neural(X_train)
    #----------
    # handle data from CUDA tensor
    #----------    
    x_train=X_train.to('cpu').numpy()
    x1_train=x_train[:,0].reshape(nPoint,nPoint)
    x2_train=x_train[:,1].reshape(nPoint,nPoint)

    y_train=Y_train.to('cpu').numpy().reshape(nPoint,nPoint)
    y_pred=Y_pred.to("cpu").numpy().reshape(nPoint,nPoint)
    #----------
    # plot
    #----------    
    if i==0:
        # create the figure only once so every run is drawn as a row of the same figure
        fig = plt.figure(figsize=(9,4*n_Run))
    frame1=((i+1)*2-1)+20+n_Run*100
    frame2=(i+1)*2+20+n_Run*100
    
    ax1 = fig.add_subplot(frame1, projection='3d')
    plt.title('iteration count:'+str(runs[i]))
    ax1.plot_wireframe(x1_train,x2_train,y_pred,color='blue',rstride=5, cstride=5,label='prediction')
    ax1.plot_wireframe(x1_train,x2_train,y_train,color='red',linestyle='dotted',rstride=5, cstride=5,label='target')
    plt.legend()

    ax2 = fig.add_subplot(frame2)
    plt.title('prediction contour')
    ax2.contour(x1_train,x2_train,y_pred,100,cmap='gist_rainbow')
plt.show()
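The loop above retrains the network from scratch for every iteration count. An alternative, sketched below, is to train once and save the predictions at the iteration counts of interest. With the same seed this should give matching snapshots on the same device, although GPU kernels are not guaranteed to be bit-for-bit reproducible. The checkpoint values, the coarser grid, and the names `checkpoints` and `snapshots` are assumptions made to keep the example short.

```python
import numpy as np
import torch
from torch import nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# coarse normalized Himmelblau grid (illustrative; smaller than the grid above)
pts = np.arange(-6, 6, 0.5)
g1, g2 = np.meshgrid(pts, pts)
target = (g1**2 + g2 - 11)**2 + (g1 + g2**2 - 7)**2

def norm(a):
    return (a - a.mean()) / (a.max() - a.min())

X = torch.tensor(np.column_stack([norm(g1).ravel(), norm(g2).ravel()]),
                 dtype=torch.float32).to(device)
Y = torch.tensor(norm(target).reshape(-1, 1), dtype=torch.float32).to(device)

checkpoints = [10, 50, 100]   # illustrative iteration counts
snapshots = {}                # iteration count -> predicted grid (NumPy array)

torch.manual_seed(13)
model = nn.Sequential(nn.Linear(2, 10), nn.Sigmoid(), nn.Linear(10, 1)).to(device)
loss_fn = nn.MSELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=0.01)

for epoch in range(1, max(checkpoints) + 1):
    model.train()
    loss = loss_fn(model(X), Y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if epoch in checkpoints:
        # store the prediction after exactly `epoch` optimizer steps
        model.eval()
        with torch.inference_mode():
            snapshots[epoch] = model(X).cpu().numpy().reshape(g1.shape)

print(sorted(snapshots))   # [10, 50, 100]
```

Each entry in `snapshots` can then be drawn with `plot_wireframe()` or `contour()` in the same way as in the loop above, at the cost of a single training run.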