I want to share a little of "the technology stack behind LLMs, built up from the bottom layer" every day, keeping each article readable in under three minutes, so that nobody feels pressured yet everyone can grow a little each day.
Now we want to inspect the Attention Head coefficients and present them as a Word x Word matrix. Let's start writing the code:
import pandas as pd
import ipywidgets as widgets
from IPython.display import display

# Build one Word x Word attention DataFrame per (layer, head) pair
df_layers_heads = []
for layer, attention in enumerate(attentions):
    for head, head_attention in enumerate(attention[0]):
        attention_matrix = head_attention[:len(tokens), :len(tokens)].detach().numpy()
        df_attention = pd.DataFrame(attention_matrix, index=tokens, columns=tokens)
        df_layers_heads.append((layer, head, df_attention))

# Show the full matrix without truncating columns or cell contents
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.max_colwidth', None)

# Function to display the attention matrix for the selected layer and head
def display_attention(selected_layer, selected_head):
    _, _, df_to_display = next(
        df for df in df_layers_heads
        if df[0] == selected_layer and df[1] == selected_head
    )
    display(df_to_display)

# Create interactive sliders for the layer and head
layer_widget = widgets.IntSlider(min=0, max=len(attentions) - 1, step=1, description='Layer:')
head_widget = widgets.IntSlider(min=0, max=len(attentions[0][0]) - 1, step=1, description='Head:')
widgets.interact(display_attention, selected_layer=layer_widget, selected_head=head_widget)
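The code above assumes that `attentions` and `tokens` already exist from an earlier step. As a reminder, here is a minimal sketch of one way they could be produced, assuming a Hugging Face BERT-style model loaded with attention outputs enabled (the model name and sentence below are placeholders, not the exact values used in this series):

import torch
from transformers import AutoTokenizer, AutoModel

model_name = "bert-base-uncased"  # placeholder model (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

sentence = "The quick brown fox jumps over the lazy dog"  # placeholder input (assumption)
inputs = tokenizer(sentence, return_tensors="pt")
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

with torch.no_grad():
    outputs = model(**inputs)

# Tuple with one entry per layer; each entry has shape
# (batch_size, num_heads, seq_len, seq_len)
attentions = outputs.attentions

With `attentions` and `tokens` defined this way, dragging the Layer and Head sliders re-runs `display_attention` and shows the corresponding Word x Word coefficient table for that attention head.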