2024-11-28

AI說書 - 從0開始 - 257 | Inspecting Attention Head Output Probabilities

I want to share a little of "LLM technology built from the ground up" every day, keeping each article within a three-minute read, so it never feels overwhelming and yet everyone can grow a bit each day.


Now we want to inspect the Attention Head coefficients and present them as a word-by-word matrix. Let's start writing the code:
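The code below assumes that `tokens` and `attentions` are already available from the model run in the earlier articles of this series. As a minimal, hypothetical sketch (the model name and sentence are placeholders, not necessarily what was used before), they could be obtained with Hugging Face Transformers by requesting attention outputs:

from transformers import AutoTokenizer, AutoModel
import torch

# Hypothetical setup: any encoder model works; "bert-base-uncased" is only an example
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions = True)

text = "The cat sat on the mat"  # placeholder sentence
inputs = tokenizer(text, return_tensors = "pt")
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

with torch.no_grad():
    outputs = model(**inputs)
attentions = outputs.attentions  # tuple with one (batch, heads, seq, seq) tensor per layer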

import pandas as pd
import ipywidgets as widgets
from IPython.display import display

# Collect one (layer, head, DataFrame) tuple for every attention head
df_layers_heads = []
for layer, attention in enumerate(attentions):
    for head, head_attention in enumerate(attention[0]):
        # Keep only the token-by-token block and label both axes with the tokens
        attention_matrix = head_attention[:len(tokens), :len(tokens)].detach().numpy()
        df_attention = pd.DataFrame(attention_matrix, index = tokens, columns = tokens)
        df_layers_heads.append((layer, head, df_attention))
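As a quick sanity check, the list should contain one entry per layer-head combination, and each DataFrame should be square over the tokens (the counts below assume a 12-layer, 12-head model such as BERT-base and are only illustrative):

print(len(df_layers_heads))          # e.g. 12 layers x 12 heads = 144 for BERT-base
print(df_layers_heads[0][2].shape)   # (len(tokens), len(tokens))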


# Show every column and the full cell contents when printing the DataFrame
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.max_colwidth', None)


# Function to display the attention matrix of the selected layer and head
def display_attention(selected_layer, selected_head):
    # Find the DataFrame whose (layer, head) matches the slider values
    _, _, df_to_display = next(df for df in df_layers_heads if df[0] == selected_layer and df[1] == selected_head)
    display(df_to_display)


# Create interactive widgets for the layer and head
layer_widget = widgets.IntSlider(min = 0, max = len(attentions)-1, step = 1, description = 'Layer:')
head_widget = widgets.IntSlider(min = 0, max = len(attentions[0][0])-1, step = 1, description = 'Head:')


widgets.interact(display_attention, selected_layer = layer_widget, selected_head = head_widget)
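
Outside of an interactive notebook, or to spot-check a single head, the function can also be called directly. Since each row of the matrix is a softmax output, the values in a row sum to (approximately) 1, which is the "output probability" view the title refers to:

display_attention(0, 0)  # layer 0, head 0; indices are examples only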