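I want to share a little bit of "LLM technology, built up from the lowest layer" each day, and keep every post under a three-minute read, so that nobody feels overloaded but everyone still grows a little every day.
A recap of the material we have on hand so far: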
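Once the embeddings are ready, visualizing them with TensorFlow Projector requires two files: a vectors file (vecs.tsv, one embedding per line with tab-separated values) and a metadata file (meta.tsv, one word per line, in the same order as the vectors).
The code below produces these two files: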
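import csv
import os
import numpy as np

# Output directory ('/content' is the default working directory on Colab)
LOG_DIR = '/content'
os.makedirs(LOG_DIR, exist_ok=True)

# Assumes `model` is the already-trained gensim model (e.g. Word2Vec) from the earlier steps
words = list(model.wv.key_to_index.keys())
vectors = [model.wv[word] for word in words]

# vecs.tsv: one embedding per line, values separated by tabs
with open(os.path.join(LOG_DIR, "vecs.tsv"), 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerows(vectors)

# meta.tsv: one word per line, in the same order as vecs.tsv
with open(os.path.join(LOG_DIR, "meta.tsv"), 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerows([[word] for word in words])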
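
Before uploading, it can be worth checking that the two files line up: the Projector matches line N of the metadata file to line N of the vectors file. Below is a minimal sketch of such a check. Since `model` is not defined in this post, it uses a tiny hand-made set of words and vectors as a stand-in; the helper names `export_for_projector` and `check_projector_files` and the toy data are hypothetical, introduced only for illustration.

```python
import csv
import os
import tempfile

import numpy as np


def export_for_projector(words, vectors, out_dir):
    """Write vecs.tsv and meta.tsv into out_dir and return their paths."""
    os.makedirs(out_dir, exist_ok=True)
    vecs_path = os.path.join(out_dir, "vecs.tsv")
    meta_path = os.path.join(out_dir, "meta.tsv")
    with open(vecs_path, "w", newline="") as f:
        csv.writer(f, delimiter="\t").writerows(vectors)
    with open(meta_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t").writerows([[w] for w in words])
    return vecs_path, meta_path


def check_projector_files(vecs_path, meta_path):
    """Read both files back and return (rows, dims) if they are consistent."""
    # ndmin=2 keeps a 2-D shape even if the file holds a single vector
    vecs = np.loadtxt(vecs_path, delimiter="\t", ndmin=2)
    with open(meta_path, encoding="utf-8", newline="") as f:
        labels = [row[0] for row in csv.reader(f, delimiter="\t")]
    if len(labels) != vecs.shape[0]:
        raise ValueError(
            f"{len(labels)} labels but {vecs.shape[0]} vectors; the files are misaligned"
        )
    return vecs.shape


# Toy stand-in for the trained model's vocabulary and embeddings (made-up values)
toy_words = ["king", "queen", "apple"]
toy_vectors = np.array(
    [[0.1, 0.2, 0.3, 0.4],
     [0.1, 0.25, 0.3, 0.45],
     [0.9, 0.8, 0.1, 0.0]],
    dtype=np.float32,
)

out_dir = tempfile.mkdtemp()
vecs_path, meta_path = export_for_projector(toy_words, toy_vectors, out_dir)
shape = check_projector_files(vecs_path, meta_path)
print(shape)  # (number of words, embedding dimension)
```

With the real model, the same check could be run by passing the `words` and `vectors` built above instead of the toy data.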