我想要一天分享一點「LLM從底層堆疊的技術」,並且每篇文章長度控制在三分鐘以內,讓大家不會壓力太大,但是又能夠每天成長一點。
整理目前手上有的素材:
為了進行資料視覺化,我們執行 t-SNE 降維作業,目標由 1536 至 2:
from sklearn.manifold import TSNE
import matplotlib
import matplotlib.pyplot as plt
tsne = TSNE(n_components = 2, perplexity = 15, random_state = 42, init = "random", learning_rate = 200)
vis_dims2 = tsne.fit_transform(matrix)
x = [x for x, y in vis_dims2]
y = [y for x, y in vis_dims2]
for category, color in enumerate(["purple", "green", "red", "blue"]):
xs = np.array(x)[df.Cluster == category]
ys = np.array(y)[df.Cluster == category]
plt.scatter(xs, ys, color = color, alpha = 0.3)
avg_x = xs.mean()
avg_y = ys.mean()
plt.scatter(avg_x, avg_y, marker = "x", color = color, s = 100)
plt.title("Clusters identified visualized in language 2d using t-SNE")
結果為: