2024-07-10

GM 004 | Did You Know Transformers Can Be Statisticians?

Today, let's talk about a paper published in 2023 by Yu Bai [1] and coauthors,

"Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection" [2].

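The title is intriguing: it casts Transformers in the role of statisticians.

Why put it that way?

Because the paper studies the "in-context learning" phenomenon in language models,

and asks whether a language model can do the job of a statistician.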

So what is a statistician's job?

In this paper, it is the ability to perform ridge regression [3]

and LASSO regression [4].
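
To make the two estimators concrete, here is a minimal sketch in plain Python for the simplest case of a single scalar coefficient. The function names, the scaling of the objectives, and the toy data are my own assumptions for illustration and are not taken from the paper. Ridge shrinks the coefficient toward zero, while LASSO can set it exactly to zero.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; anything in [-t, t] becomes exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def ridge_1d(xs, ys, lam):
    """Minimize (1/2n) * sum((y - w*x)^2) + (lam/2) * w^2 over a scalar w."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) / n
    sxx = sum(x * x for x in xs) / n
    return sxy / (sxx + lam)


def lasso_1d(xs, ys, lam):
    """Minimize (1/2n) * sum((y - w*x)^2) + lam * |w| over a scalar w."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) / n
    sxx = sum(x * x for x in xs) / n
    return soft_threshold(sxy, lam) / sxx


# Toy data (an assumption for illustration): exactly y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_ols = ridge_1d(xs, ys, 0.0)          # no penalty: ordinary least squares
w_ridge = ridge_1d(xs, ys, 1.0)        # shrunk toward zero, but not zero
w_lasso = lasso_1d(xs, ys, 1.0)        # shrunk by soft-thresholding
w_lasso_big = lasso_1d(xs, ys, 100.0)  # large penalty: exactly zero
```

With more than one coefficient, ridge regression still has a closed-form solution, while LASSO is usually solved iteratively, for example by coordinate descent.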

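The paper is also a theory paper,

and its main theoretical tool is what is called "in-context gradient descent".

Being able to run gradient descent means that a loss function [5] can be written down,

and this loss function is derived from the properties of Transformers [6].

It appears in Appendix D of the paper and is well worth studying for graduate students with a statistics background.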
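
As a rough picture of what gradient descent on the in-context examples means, here is a minimal sketch in plain Python. It is ordinary gradient descent on a ridge loss, not the Transformer construction from the paper's appendix. The function names, step size, number of steps, and toy prompt are all assumptions made for illustration.

```python
def ridge_loss_grad(w, xs, ys, lam):
    """Gradient of L(w) = (1/2n) * sum((<w, x_i> - y_i)^2) + (lam/2) * ||w||^2."""
    n, d = len(xs), len(w)
    grad = [lam * wj for wj in w]  # gradient of the penalty term
    for x, y in zip(xs, ys):
        residual = sum(wj * xj for wj, xj in zip(w, x)) - y
        for j in range(d):
            grad[j] += residual * x[j] / n
    return grad


def in_context_gd_predict(xs, ys, x_query, lam=0.0, step=0.1, num_steps=200):
    """Fit w on the examples in the prompt by gradient descent, then predict the query."""
    w = [0.0] * len(x_query)  # start from zero, with no pre-fitted weights
    for _ in range(num_steps):
        g = ridge_loss_grad(w, xs, ys, lam)
        w = [wj - step * gj for wj, gj in zip(w, g)]
    prediction = sum(wj * xj for wj, xj in zip(w, x_query))
    return prediction, w


# Toy "prompt" (an assumption): labelled examples from y = 1*x1 - 2*x2, then a query.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
ys = [1.0, -2.0, -1.0, 0.0]
x_query = [3.0, 1.0]

pred, w = in_context_gd_predict(xs, ys, x_query, lam=0.0, step=0.1, num_steps=2000)
```

The point of the sketch is that the weights are fitted only from the examples supplied alongside the query. That is the sense in which the learning happens "in context".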

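In the world of the Transformer,

the input is no longer data given directly in the traditional vector form.

Instead, it starts from text, which is first converted into tokens [7],

then mapped to vector representations,

and finally fed into today's deep learning architectures.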
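
The text-to-token-to-vector pipeline can be sketched in a few lines of plain Python. This is a toy illustration under strong assumptions: real language models use subword tokenizers and learned embedding tables, whereas here the tokenizer just splits on whitespace and the embedding table is filled with random numbers. All names below are hypothetical.

```python
import random


def tokenize(text):
    """Toy tokenizer: lowercase the text and split on whitespace."""
    return text.lower().split()


def build_vocab(tokens):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def make_embedding_table(vocab_size, dim, seed=0):
    """Random stand-in for a learned embedding matrix: one vector per token id."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


text = "transformers can act as statisticians"
tokens = tokenize(text)                      # text -> tokens
vocab = build_vocab(tokens)
token_ids = [vocab[t] for t in tokens]       # tokens -> integer ids
table = make_embedding_table(len(vocab), dim=4)
vectors = [table[i] for i in token_ids]      # ids -> vectors fed to the model
```

After this step the model only ever sees the list of vectors, which is why the rest of the architecture can be analysed with the usual tools for vector-valued data.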

References

[1] https://yubai.org/

[2] https://arxiv.org/abs/2306.04637

[3] https://en.wikipedia.org/wiki/Ridge_regression

[4] https://en.wikipedia.org/wiki/Lasso_(statistics)

[5] https://en.wikipedia.org/wiki/Loss_function

[6] https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)

[7] https://www.threads.net/@chihua.wang.3/post/C8LKZ6tyopB
