My plan is to share one bite-sized piece of "LLM technology, built up from the bottom of the stack" each day, and to keep every article within a three-minute read, so nobody feels too much pressure but everyone still grows a little every day.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load('ViT-B/32', device)
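As a side note, if you are curious which pretrained variants the clip package ships with, there is a helper for listing them (just a quick sanity check, nothing more):

print(clip.available_models())
# e.g. ['RN50', 'RN101', ..., 'ViT-B/32', 'ViT-B/16', 'ViT-L/14', ...]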
Download the images:
import os
from torchvision.datasets import CIFAR100

# Grab the CIFAR100 test split (downloads to ~/.cache on the first run).
cifar100 = CIFAR100(root=os.path.expanduser("~/.cache"), download=True, train=False)
Prepare the inputs, which include both the image and the text:
index = 0  # any valid test-set index works here; pick whichever sample you like
image, class_id = cifar100[index]
image_input = preprocess(image).unsqueeze(0).to(device)  # a batch of one, moved to the device
text_inputs = torch.cat([clip.tokenize(f"a photo of a {c}") for c in cifar100.classes]).to(device)  # one prompt per class
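As a preview of what these inputs are for, here is a minimal sketch of the zero-shot classification step, following the usage pattern in the openai/CLIP README: encode both modalities, normalize the features, and rank the classes by cosine similarity.

with torch.no_grad():
    image_features = model.encode_image(image_input)
    text_features = model.encode_text(text_inputs)

# Normalize so the dot product below becomes a cosine similarity.
image_features /= image_features.norm(dim=-1, keepdim=True)
text_features /= text_features.norm(dim=-1, keepdim=True)

# Scale, softmax into per-class probabilities, and take the top 5 guesses.
similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)
values, indices = similarity[0].topk(5)
for value, idx in zip(values, indices):
    print(f"{cifar100.classes[idx]:>16s}: {100 * value.item():.2f}%")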
Let's take a look at the image:
import matplotlib.pyplot as plt

plt.imshow(image)
plt.show()
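Note that plt.imshow displays the raw 32×32 PIL image straight out of CIFAR100; what the model actually sees is the preprocessed tensor. If you want to verify this, inspect the transform pipeline that clip.load returned and the shape of the batched tensor (the shape shown below is what ViT-B/32's preprocessor produces):

print(preprocess)         # a torchvision Compose: Resize, CenterCrop, ToTensor, Normalize
print(image_input.shape)  # torch.Size([1, 3, 224, 224])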
The result:
