I want to share a little of the technology behind LLMs, built up from the bottom of the stack, one post per day, with each article kept to a three-minute read so that nobody feels overwhelmed but everyone still grows a little every day.
To inspect the output probabilities (attention weights) of the Attention Heads, we write the following program:
!pip install transformers

from transformers import BertTokenizer, BertModel

input_text = "The output shows the attention values" #@param {type:"string"}

# Load the pretrained BERT model; output_attentions=True makes it return the attention weights of every layer
model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertModel.from_pretrained(model_name, output_attentions=True)

# Tokenize the sentence so we can inspect the word pieces
tokens = tokenizer.tokenize(input_text)
input_ids = tokenizer.convert_tokens_to_ids(tokens)

# Build the actual model inputs; encode_plus adds [CLS] / [SEP] and returns PyTorch tensors
inputs = tokenizer.encode_plus(input_text, return_tensors='pt')
input_ids = inputs['input_ids']
attention_mask = inputs['attention_mask']

# Forward pass; outputs.attentions is a tuple with one attention tensor per layer
outputs = model(input_ids, attention_mask=attention_mask)
attentions = outputs.attentions
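Before going through the comments, a minimal sanity check (my own addition, not part of the original listing) makes these tensors concrete: for bert-base-uncased, attentions should be a tuple of 12 tensors, one per layer, each shaped (batch_size, num_heads, seq_len, seq_len), and since every query row is a softmax output it sums to 1.

# Quick sanity check on the returned attention weights (assumes the listing above has been run)
print(len(attentions))                   # 12 layers in bert-base-uncased
print(attentions[0].shape)               # (batch_size, num_heads, seq_len, seq_len), with 12 heads per layer
print(attentions[0][0, 0].sum(dim=-1))   # each query row sums to ~1 because of the softmax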
The code is annotated as follows: