I want to share a little bit of "LLM technology, built up from the bottom of the stack" each day, keeping every article under three minutes to read, so that the series stays low-pressure while still letting everyone grow a little every day.
from transformers import AutoTokenizer
import transformers
import torch

# Hugging Face model ID (the original had a typo: "meta-llaMA/LlaMA-2-...")
model = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model)

# Build a text-generation pipeline; float16 halves memory use,
# and device_map="auto" places the weights on available GPUs/CPU.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)
Next, define an inference function:
def LlaMA2(prompt):
    # Sample from the top-10 tokens at each step and return one sequence,
    # stopping at the EOS token or after 200 tokens.
    sequences = pipeline(
        prompt,
        do_sample=True,
        top_k=10,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        max_length=200,
    )
    return sequences
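The pipeline returns a list of dicts, one per generated sequence, each with a `generated_text` key holding the prompt plus the completion. A minimal sketch of unpacking that result (using a mocked return value, since actually running the function requires downloading the 7B weights and a GPU):

```python
# Mocked pipeline output in the shape the text-generation pipeline returns:
# a list of dicts with a "generated_text" field.
sequences = [
    {"generated_text": "Q: What is an LLM?\nA: A large language model is ..."}
]

def extract_texts(sequences):
    # Pull the generated string out of each returned sequence.
    return [seq["generated_text"] for seq in sequences]

for text in extract_texts(sequences):
    print(text)
```

With the real model loaded, you would call `LlaMA2("your prompt")` and feed its return value to the same kind of loop.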