How to Professionally Analyze GPU VRAM Usage: A Quick Start with Nsight

ECOE

Published in Information, Programming

Updated 2024/12/26 · Published 2024/11/15 · About a 5-minute read

(Detailed API reference: Advanced Reference)

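GPUs are an indispensable part of AI research and engineering, and VRAM usage is critical in deep learning and high-performance computing. Analyzing and managing VRAM usage effectively helps us make models more efficient and avoid unnecessary out-of-memory (OOM) errors.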

This article gets you started with NVIDIA's Nsight Systems and, together with PyTorch, shows how to analyze GPU VRAM usage in depth.

Step 1: Install the Nsight tools

NVIDIA provides several Nsight tools: Nsight Systems is for system-level performance analysis, while Nsight Compute is for detailed kernel-level performance analysis. Installation steps:

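Check your NVIDIA driver version, and install or update the driver if needed.
Install Nsight Systems: download the version for your operating system from the NVIDIA Nsight website and follow the installation instructions.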
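Before moving on, one quick way to confirm both prerequisites is to check that `nvidia-smi` and the `nsys` CLI are on the PATH and respond. This is a minimal sketch assuming both tools are installed under those default names; the helpers `tool_version` and `check_environment` are hypothetical.

```python
import shutil
import subprocess


def tool_version(cmd):
    """Return the first line of output from `cmd`, or None if the tool is unavailable."""
    if shutil.which(cmd[0]) is None:
        return None  # executable not found on PATH
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=30, check=False)
    except (OSError, subprocess.TimeoutExpired):
        return None
    text = (out.stdout or out.stderr).strip()
    return text.splitlines()[0] if text else None


def check_environment():
    return {
        # Driver version as reported by nvidia-smi (one line per GPU; we keep the first)
        "driver": tool_version(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]
        ),
        # Nsight Systems command-line version string
        "nsys": tool_version(["nsys", "--version"]),
    }


if __name__ == "__main__":
    for name, version in check_environment().items():
        print(f"{name}: {version or 'NOT FOUND'}")
```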

Step 2: Analyze VRAM usage with PyTorch

The usual way to check VRAM usage is through PyTorch's built-in APIs:

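```python
import torch

assert torch.cuda.is_available()
# Memory currently occupied by tensors, as tracked by PyTorch's caching allocator
print(f"Allocated Memory: {torch.cuda.memory_allocated()} bytes")
# Memory reserved (cached) by the caching allocator, including freed-but-cached blocks
print(f"Reserved (Cached) Memory: {torch.cuda.memory_reserved()} bytes")
```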
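A somewhat fuller snapshot also records the peak allocation and the driver's device-wide view from `torch.cuda.mem_get_info`. Comparing the two is one way to see how much memory lives outside PyTorch's caching allocator. This is a sketch: `vram_snapshot` is a hypothetical helper, and `device_used` covers every process on the GPU, not only the current script.

```python
try:
    import torch
except ImportError:  # lets the sketch load (and report nothing) without PyTorch
    torch = None

MiB = 1024 ** 2


def vram_snapshot(device=0):
    """Return VRAM figures in MiB, or None if no CUDA device is visible."""
    if torch is None or not torch.cuda.is_available():
        return None
    free, total = torch.cuda.mem_get_info(device)  # device-wide numbers from the CUDA driver
    reserved = torch.cuda.memory_reserved(device)
    return {
        "allocated": torch.cuda.memory_allocated(device) / MiB,  # live tensors
        "reserved": reserved / MiB,  # held by the caching allocator
        "peak_allocated": torch.cuda.max_memory_allocated(device) / MiB,
        "device_used": (total - free) / MiB,  # everything on the GPU, all processes
        # CUDA context, allocations outside PyTorch, and other processes
        "outside_allocator": (total - free - reserved) / MiB,
    }


if __name__ == "__main__":
    snap = vram_snapshot()
    if snap is None:
        print("No CUDA device available")
    else:
        for name, mib in snap.items():
            print(f"{name:>17}: {mib:10.1f} MiB")
```

A large `outside_allocator` value on an otherwise idle GPU is a hint that the allocator counters alone will understate what the process really uses.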

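However, I have quite often seen this method fail to reflect actual usage (it usually under-reports), typically in more complex scripts. One common reason is that these counters only cover memory managed by PyTorch's caching allocator, so the CUDA context and memory allocated outside the allocator (for example by other libraries or custom extensions) do not show up. For a deeper and more accurate picture, we pair PyTorch with Nsight, starting by marking the code regions of interest with NVTX ranges: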

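```python
import torch
import torch.cuda.profiler as profiler
from torch.cuda import nvtx

model = torch.nn.Linear(100, 10).cuda()
data = torch.randn(1000, 100).cuda()

profiler.start()  # cudaProfilerStart(): required by --capture-range=cudaProfilerApi
nvtx.range_push("Forward Pass")  # open an NVTX range
output = model(data)
nvtx.range_pop()  # close the range
nvtx.range_push("Backward Pass")  # open a second range
output.sum().backward()
nvtx.range_pop()  # close the range
torch.cuda.synchronize()  # wait for queued GPU work before ending the capture
profiler.stop()  # cudaProfilerStop(): ends the capture range
```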

When this code runs, Nsight captures the `Forward Pass` and `Backward Pass` NVTX ranges, which helps pinpoint which part of the code consumes more VRAM or compute.
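In a real training loop it is often more convenient to use the `nvtx.range` context manager instead of pairing `range_push`/`range_pop` by hand, and to record only a few iterations after warm-up so the report stays small. The sketch below assumes the `--capture-range=cudaProfilerApi` option from this article's `nsys` command; the toy model, `train`, and `in_capture_window` are hypothetical.

```python
try:
    import torch
    import torch.cuda.profiler as profiler
    from torch.cuda import nvtx
except ImportError:  # lets the sketch load on machines without PyTorch
    torch = None


def in_capture_window(step, start, length):
    """True for the `length` steps beginning at `start`."""
    return start <= step < start + length


def train(num_steps=20, capture_start=10, capture_len=3):
    """Toy loop that records only steps [capture_start, capture_start + capture_len)."""
    if torch is None or not torch.cuda.is_available():
        return 0
    model = torch.nn.Linear(100, 10).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    captured = 0
    for step in range(num_steps):
        recording = in_capture_window(step, capture_start, capture_len)
        if recording and step == capture_start:
            profiler.start()  # cudaProfilerStart(): nsys begins recording here
        data = torch.randn(1000, 100, device="cuda")
        with nvtx.range(f"step {step}"):
            with nvtx.range("Forward Pass"):
                loss = model(data).sum()
            with nvtx.range("Backward Pass"):
                loss.backward()
            opt.step()
            opt.zero_grad()
        if recording:
            captured += 1
            if not in_capture_window(step + 1, capture_start, capture_len):
                torch.cuda.synchronize()  # let queued GPU work finish inside the range
                profiler.stop()  # cudaProfilerStop(): nsys stops recording
    return captured


if __name__ == "__main__":
    print(f"captured {train()} steps")
```

Skipping the first iterations avoids mixing one-time costs (CUDA context creation, allocator warm-up) into the steady-state picture.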

Next, run:

```shell
nsys profile -w true -t cuda,nvtx,osrt,cudnn,cublas -s cpu --capture-range=cudaProfilerApi --cuda-memory-usage=true --cudabacktrace=true --capture-range-end=stop --force-overwrite=true -x true -o profiling python ...
```
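The same command, split into its individual flags with a short note on each, may be easier to adapt. This is a sketch: `build_nsys_command` is a hypothetical helper and `train.py` a placeholder script. Note that with `--capture-range=cudaProfilerApi`, nsys records nothing until the program calls `cudaProfilerStart` (in PyTorch, `torch.cuda.profiler.start()`).

```python
import shlex


def build_nsys_command(script_args, output="profiling"):
    """Assemble the nsys profile command used in this article, one flag per line."""
    return [
        "nsys", "profile",
        "-w", "true",  # --show-output: show the program's stdout/stderr
        "-t", "cuda,nvtx,osrt,cudnn,cublas",  # --trace: APIs to trace
        "-s", "cpu",  # --sample: periodic CPU sampling
        "--capture-range=cudaProfilerApi",  # record only between cudaProfilerStart/Stop
        "--cuda-memory-usage=true",  # track GPU memory usage (may add noticeable overhead)
        "--cudabacktrace=true",  # CPU backtraces for CUDA API calls; accepted values vary by nsys version
        "--capture-range-end=stop",  # end collection at cudaProfilerStop
        "--force-overwrite=true",  # overwrite an existing report with the same name
        "-x", "true",  # --stop-on-exit: stop profiling when the program exits
        "-o", output,  # report is written to <output>.nsys-rep
        "python", *script_args,
    ]


if __name__ == "__main__":
    print(shlex.join(build_nsys_command(["train.py"])))  # train.py is a placeholder
```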

When the program finishes, it produces a file named profiling.nsys-rep. Then run

```shell
nsys-ui profiling.nsys-rep
```

to open the report in the GUI.

(On Windows, you can install the NVIDIA Nsight Systems application and open the file by dragging it into the window.)
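When no GUI is available (for example on a remote server), `nsys stats` can print summary tables from the same report in the terminal. This minimal sketch assumes the report names `nvtx_sum` (time spent in each NVTX range) and `cuda_api_sum` (CUDA API calls, including `cudaMalloc`/`cudaFree`) used by recent releases; older versions name them differently. The memory-usage timeline itself is still easier to read in the GUI.

```python
import shutil
import subprocess

# ASSUMPTION: report names used by recent Nsight Systems releases; older
# versions use different names, so check `nsys stats --help` for yours.
DEFAULT_REPORTS = ("nvtx_sum", "cuda_api_sum")


def stats_command(report_file, reports=DEFAULT_REPORTS):
    """Build an `nsys stats` call that prints the requested reports as CSV."""
    cmd = ["nsys", "stats", "--format", "csv"]
    for name in reports:
        cmd += ["--report", name]  # one --report flag per summary table
    return cmd + [report_file]


def print_stats(report_file="profiling.nsys-rep"):
    if shutil.which("nsys") is None:
        print("nsys is not on PATH")
        return
    result = subprocess.run(stats_command(report_file), capture_output=True, text=True, check=False)
    print(result.stdout or result.stderr)


if __name__ == "__main__":
    print_stats()
```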

To better analyze how Python functions interact with the GPU, you can also add options such as

`--python-functions-trace=<json_file>` (see `<target-platform-folder>/PythonFunctionsTrace/annotations.json`)

`--python-sampling=true` and `--python-sampling-frequency`
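A hypothetical helper that appends these Python-related flags to an nsys command could look like the sketch below; `my_annotations.json` and `train.py` are placeholders, and the sampling frequency is given in samples per second.

```python
import shlex


def python_trace_flags(annotations_json=None, sampling_hz=None):
    """Hypothetical helper: extra nsys flags for Python-level analysis."""
    flags = []
    if annotations_json:
        # JSON file listing the Python functions to trace; the annotations.json
        # bundled with nsys can serve as an example of the format
        flags.append(f"--python-functions-trace={annotations_json}")
    if sampling_hz is not None:
        flags.append("--python-sampling=true")  # periodically sample Python call stacks
        flags.append(f"--python-sampling-frequency={sampling_hz}")  # samples per second (Hz)
    return flags


if __name__ == "__main__":
    # my_annotations.json and train.py are placeholders
    cmd = ["nsys", "profile", "-t", "cuda,nvtx", "-s", "cpu",
           *python_trace_flags("my_annotations.json", 1000),
           "-o", "profiling", "python", "train.py"]
    print(shlex.join(cmd))
```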

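By combining the Nsight tools with PyTorch's built-in APIs, we can get a complete picture of GPU VRAM usage and use it to optimize deep learning models. From the basic torch.cuda counters, to NVTX range markers, to system-level and kernel-level analysis with Nsight, each step helps us manage GPU resources more professionally.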
