[Python] Using the tracemalloc module to compare the memory usage of two different approaches

螃蟹_crab

Published in [Python][OpenCV] Learning Notes

Updated 2024/09/04. Published 2024/09/04. Reading time: about 1 minute.

This post uses Python's tracemalloc module to compare how much memory two approaches use while they run. The examples below measure each approach in turn.

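How the code works

tracemalloc.start(): starts tracing memory allocations.
tracemalloc.get_traced_memory(): returns the current and peak traced memory usage as a tuple (current, peak), in bytes.
tracemalloc.stop(): stops tracing memory allocations and clears the traces collected so far.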
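The three calls above can be combined into a small measurement helper. This is only a minimal sketch: the function name `measure_allocation` and the list it builds are illustrative assumptions, not part of the measurements in this post.

```python
import tracemalloc

def measure_allocation(n):
    """Return (current, peak) traced bytes while a list of n integers is alive.

    Hypothetical helper for illustration only.
    """
    tracemalloc.start()                        # begin tracing allocations
    data = [i * 2 for i in range(n)]           # the allocation we want to observe
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()                         # stop tracing and clear the traces
    return current, peak

current, peak = measure_allocation(100_000)
print(f"current: {current} B; peak: {peak} B")
```

The peak can never be smaller than the current value, because it is the largest traced total seen since tracing started.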

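Expected results

Store the paths, then read them: every image path is kept in a list, so both the current and the peak memory usage should be relatively high.
Use yield: the paths are produced one at a time instead of being held all at once, so memory usage should be lower.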
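The expectation above can be checked without touching the file system. The sketch below builds the same made-up paths once as a list and once through a generator, then compares the peak traced memory. The path pattern, the function names, and the count of 100,000 are all assumptions chosen for illustration.

```python
import tracemalloc

def paths_as_list(n):
    """Build all n hypothetical paths up front and return them in a list."""
    return [f"/images/img_{i}.png" for i in range(n)]

def paths_as_generator(n):
    """Yield the same n hypothetical paths one at a time."""
    for i in range(n):
        yield f"/images/img_{i}.png"

def peak_bytes(make_iterable, n):
    """Consume the iterable under tracemalloc and return the peak traced bytes."""
    tracemalloc.start()
    for _ in make_iterable(n):
        pass
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

list_peak = peak_bytes(paths_as_list, 100_000)
gen_peak = peak_bytes(paths_as_generator, 100_000)
print(f"list peak: {list_peak} B; generator peak: {gen_peak} B")
```

With this many items the list version has to keep every string alive at once, while the generator version only holds one path at a time, so its peak should be far lower.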

Python version

3.11.3

1. Storing the paths first, then reading them

import os
import tracemalloc

def get_image_paths(directory):
    image_paths = []
    for filename in os.listdir(directory):
        if filename.endswith(('.png', '.jpg', '.jpeg', '.bmp')):
            image_paths.append(os.path.join(directory, filename))
    return image_paths

directory_path = '/path/to/your/images'

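# Start tracing memory allocations
tracemalloc.start()

# Run the store-then-read approach
image_paths = get_image_paths(directory_path)

for _ in image_paths:
    pass

# Read the current and peak traced memory (reported in KB so small values do not round to 0.00)
current, peak = tracemalloc.get_traced_memory()
print(f"Store-then-read approach - current memory: {current / 10**3:.2f} KB; peak memory: {peak / 10**3:.2f} KB")

# Stop tracing memory allocations
tracemalloc.stop()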

2. Using a `yield` generator

import os
import tracemalloc

def process_images_in_directory(directory):
    for filename in os.listdir(directory):
        if filename.endswith(('.png', '.jpg', '.jpeg', '.bmp')):
            img_path = os.path.join(directory, filename)
            yield img_path

directory_path = '/path/to/your/images'

# Start tracing memory allocations
tracemalloc.start()

# Run the yield-based approach
for _ in process_images_in_directory(directory_path):
    pass

# Read the current and peak traced memory (both reported in KB)
current, peak = tracemalloc.get_traced_memory()
print(f"yield approach - current memory: {current / 10**3:.2f} KB; peak memory: {peak / 10**3:.2f} KB")

# Stop tracing memory allocations
tracemalloc.stop()

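The result: the yield-based generator approach reported the higher memory usage.

Possible reasons:

Generator state management: each time a generator yields, it keeps its execution state, including the current position and its local variables. When the total memory involved is very small, this fixed overhead may outweigh what is saved by not storing every path at once.
Memory allocation and release: the way a generator function allocates and frees memory may push the peak slightly higher. Each yield uses little memory by itself, but the bookkeeping for the generator's state may still raise the peak a little.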
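One more factor that can be checked directly: both listings above call os.listdir, which returns the complete list of filenames, so the generator version still holds every filename in memory and only the joined paths are produced lazily. A variant based on os.scandir, which returns an iterator rather than a list, might look like the sketch below. The function name `iter_image_paths` and the throwaway demo directory are assumptions for illustration, not the author's code.

```python
import os
import tempfile

IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.bmp')

def iter_image_paths(directory):
    """Yield image paths lazily; os.scandir hands back entries one at a time."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name.endswith(IMAGE_EXTENSIONS):
                yield entry.path   # same value as os.path.join(directory, entry.name)

# Demo on a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ('a.png', 'b.jpg', 'notes.txt'):
        open(os.path.join(demo_dir, name), 'w').close()
    found = sorted(os.path.basename(p) for p in iter_image_paths(demo_dir))
    print(found)  # ['a.png', 'b.jpg']
```

Whether this changes the measured numbers noticeably depends on how many files the directory holds; for a handful of files the difference is likely to be negligible.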

Why is this different from what the textbook teaches?

This time we test by reading a file instead: a text file containing 2864433 meaningless numbers.

import tracemalloc

def read_entire_file(filename):
    # Read the whole file into a single string
    with open(filename, 'r') as file:
        content = file.read()
    return content

def read_file_line_by_line(filename):
    # Yield the file one line at a time
    with open(filename, 'r') as file:
        for line in file:
            yield line

# Read the entire file in one go (raw string so the backslash is not treated as an escape)
tracemalloc.start()
file_content = read_entire_file(r'D:\sss.txt')
for _ in file_content:
    pass
current, peak = tracemalloc.get_traced_memory()
print(f"Read entire file at once - current memory: {current} B; peak memory: {peak / 10**3:.2f} KB")
tracemalloc.stop()

# Read the file line by line with yield
tracemalloc.start()
for line in read_file_line_by_line(r'D:\sss.txt'):
    pass
current, peak = tracemalloc.get_traced_memory()
print(f"Read file line by line with yield - current memory: {current} B; peak memory: {peak / 10**3:.2f} KB")
tracemalloc.stop()
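For readers without the original file, the same comparison can be reproduced with generated data. The sketch below is a variant under stated assumptions: the temporary file, its 200,000 short lines, and the helper `peak_while_consuming` are all hypothetical stand-ins, and the numbers it prints will not match the author's.

```python
import os
import tempfile
import tracemalloc

def read_entire_file(filename):
    """Read the whole file into a single string."""
    with open(filename, 'r') as file:
        return file.read()

def read_file_line_by_line(filename):
    """Yield the file one line at a time."""
    with open(filename, 'r') as file:
        for line in file:
            yield line

def peak_while_consuming(func, filename):
    """Return the peak traced bytes while iterating over func(filename)."""
    tracemalloc.start()
    for _ in func(filename):
        pass
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

# Hypothetical test data: 200,000 short lines of digits.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    for i in range(200_000):
        tmp.write(f"{i}\n")
    sample_path = tmp.name

try:
    whole_peak = peak_while_consuming(read_entire_file, sample_path)
    lines_peak = peak_while_consuming(read_file_line_by_line, sample_path)
    print(f"read(): {whole_peak / 10**3:.2f} KB; yield: {lines_peak / 10**3:.2f} KB")
finally:
    os.remove(sample_path)   # clean up the generated file
```

Running both readers through one helper keeps the units and the measurement window identical, which makes the two peaks directly comparable.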