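I want to share a little of "the technology stack underneath LLMs" each day, keeping every post under a three-minute read, so that nobody feels overwhelmed but everyone still grows a little every day.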
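from transformers import TimesformerConfig, TimesformerModel
# Initialize a TimeSformer configuration with default values
configuration = TimesformerConfig()
# Build a model (randomly initialized weights) from that configuration
model = TimesformerModel(configuration)
# Read the configuration back from the model
configuration = model.config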
import av
import torch
import numpy as np
from transformers import AutoImageProcessor, TimesformerForVideoClassification
from huggingface_hub import hf_hub_download
np.random.seed(0)
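We now define a function that uses PyAV to decode the video and collect the frames whose positions appear in `indices`. The frame list starts out empty, and frames are appended to it one by one as the video is decoded: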
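def read_video_pyav(container, indices):
    frames = []  # starts empty; selected frames are appended during decoding
    container.seek(0)  # rewind to the beginning of the video
    start_index = indices[0]
    end_index = indices[-1]
    for i, frame in enumerate(container.decode(video=0)):
        if i > end_index:
            break  # past the last requested frame, stop decoding
        if i >= start_index and i in indices:
            frames.append(frame)
    # Convert each frame to an RGB array and stack to (num_frames, height, width, 3)
    return np.stack([x.to_ndarray(format="rgb24") for x in frames])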
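The function above expects a sorted sequence of frame positions in `indices`, but the text so far does not show where that sequence comes from. A minimal sketch of one common way to build it follows: pick a random window of the video and take evenly spaced positions inside it. The helper name `sample_frame_indices` and its parameters `clip_len`, `frame_sample_rate`, and `seg_len` are assumptions made for illustration, not something defined in this post.

```python
import numpy as np

def sample_frame_indices(clip_len, frame_sample_rate, seg_len):
    """Hypothetical helper: choose `clip_len` evenly spaced frame positions.

    clip_len          -- number of frames to return
    frame_sample_rate -- approximate spacing between sampled frames
    seg_len           -- total number of frames in the video
    """
    # Length of the window (in frames) that the clip will span
    converted_len = int(clip_len * frame_sample_rate)
    if converted_len >= seg_len:
        raise ValueError("video is too short for the requested clip")
    # Choose where the window ends; the start follows from the window length
    end_idx = np.random.randint(converted_len, seg_len)
    start_idx = end_idx - converted_len
    # Evenly spaced positions inside the window, kept within [start_idx, end_idx - 1]
    indices = np.linspace(start_idx, end_idx, num=clip_len)
    indices = np.clip(indices, start_idx, end_idx - 1).astype(np.int64)
    return indices

# Example: 8 frames, roughly every 4th frame, from a 300-frame video
indices = sample_frame_indices(clip_len=8, frame_sample_rate=4, seg_len=300)
print(indices)
```

With a container opened through `av.open(...)`, the result could then be passed along as `read_video_pyav(container, indices)`, using the video stream's frame count as `seg_len`. Because the window is chosen with `np.random.randint`, the earlier `np.random.seed(0)` call makes the selection reproducible.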