I want to share a little of the "techniques for building LLMs from the bottom of the stack" each day, and keep every article within a three-minute read, so the pressure stays low while everyone still grows a bit every day.
```python
import matplotlib.pyplot as plt
from torchvision import transforms

# Flatten the patch grid into one long list of (channels, patch_size, patch_size) tensors
patches_reshaped = patches.permute(0, 2, 3, 1, 4, 5).contiguous().view(-1, 3, patch_size, patch_size)
to_pil = transforms.ToPILImage()  # transform that converts a tensor to a PIL Image

for i in range(patches_reshaped.size(0)):
    print(f"Displaying patch {i+1}/{patches_reshaped.size(0)}")
    patch_shape = patches_reshaped[i].shape  # separate name so patch_size is not shadowed
    plt.title(f"Patch {i+1}, size: {patch_shape}")
    plt.imshow(to_pil(patches_reshaped[i]))
    plt.axis("off")
    plt.show()
```
The code breaks down as follows:
- patches has shape (batch_size, channels, num_patches_height, num_patches_width, patch_size, patch_size)
- permute(0, 2, 3, 1, 4, 5) rearranges the dimensions into (batch_size, num_patches_height, num_patches_width, channels, patch_size, patch_size)
- .contiguous() ensures the tensor's data is laid out contiguously in memory; this matters for the subsequent .view() call, because after permute the tensor is no longer contiguous and PyTorch can only reshape it with .view() when the underlying data is contiguous
- .view(-1, 3, patch_size, patch_size) reshapes the tensor into a four-dimensional tensor of shape (batch_size * num_patches_height * num_patches_width, channels, patch_size, patch_size), as sketched below
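Here is a minimal sketch of this shape pipeline on a dummy tensor; the 224 × 224 input and the torch.Tensor.unfold call are assumptions made for illustration (the article may construct patches differently), but with patch_size = 16 they reproduce the 196 patches mentioned below:

```python
import torch

patch_size = 16
image = torch.rand(1, 3, 224, 224)  # assumed dummy batch: one 224x224 RGB image

# Cut the image into non-overlapping 16x16 windows along height and width:
# (B, C, H, W) -> (B, C, num_patches_h, num_patches_w, patch_size, patch_size)
patches = image.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
print(patches.shape)           # torch.Size([1, 3, 14, 14, 16, 16])

# Same permute / contiguous / view chain as in the snippet above
patches_reshaped = patches.permute(0, 2, 3, 1, 4, 5).contiguous().view(-1, 3, patch_size, patch_size)
print(patches_reshaped.shape)  # torch.Size([196, 3, 16, 16]) -> 14 * 14 = 196 patches
```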
The result is 196 (14 × 14) patches of 16 × 16 pixels each; one of them, for example, looks like this:
