I want to share a little of "LLM technology built up from the ground" every day, keeping each article within a three-minute read so that nobody feels pressured, yet everyone can still grow a bit each day.
In AI說書 - 從0開始 - 19, we described the inference pipeline as t = f(n); now we expand on it and point out its key characteristics:
Expanding t = f(n) to make the time dimension explicit gives t_i = f(t_1, t_2, ..., t_{i-1}): each new token t_i is predicted from all of the tokens generated before it, so inference is inherently sequential and autoregressive, as sketched below.
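Here is a minimal sketch of that autoregressive loop, assuming the Hugging Face transformers library and GPT-2 purely as an example model (neither is specified in the original article). Each iteration computes t_i from the tokens t_1, ..., t_{i-1} already in the context and appends it before predicting the next one.

```python
# Minimal sketch of autoregressive decoding: t_i = f(t_1, ..., t_{i-1})
# Assumptions: Hugging Face transformers + GPT-2 as an illustrative model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# The prompt tokens play the role of t_1, ..., t_{i-1}
input_ids = tokenizer("The Transformer architecture", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(10):                              # generate 10 new tokens, one at a time
        logits = model(input_ids).logits             # shape: (1, seq_len, vocab_size)
        next_id = torch.argmax(logits[:, -1, :])     # greedy choice of t_i given t_1..t_{i-1}
        # Append t_i; it becomes part of the context used to predict t_{i+1}
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```

This also makes the cost behaviour of t = f(n) concrete: generating n tokens requires n forward passes, each conditioned on a longer context than the last.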
Next, we quote the book Transformers for Natural Language Processing and Computer Vision, Denis Rothman, 2024, for its view of GPT models:
Some may say that a GPT series model such as ChatGPT goes through unsupervised training. That statement is only true to a certain extent. Token by token, a GPT-like model finds its way to accuracy through self-supervised learning, predicting each subsequent token based on the preceding ones in the sequence. It succeeds in doing so through the influence of all the other tokens’ representations in a sequence.
We can also fine-tune a GPT model with an input (prompt) and output (completion) with labels! We can provide thousands of inputs (prompts) with one token as an output (completion). For example, we can create thousands of questions as inputs with only true and false as outputs. This is implicit supervised learning. Also, the model will not explicitly memorize the correct predictions. It will simply learn the patterns of the tokens.
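To make the "thousands of prompts with a one-token completion" idea concrete, here is a minimal sketch of how such implicit supervised fine-tuning data might be laid out as prompt/completion pairs in JSONL. The file name, questions, and exact field names are illustrative assumptions, not taken from the book.

```python
# Sketch of a prompt/completion dataset for implicit supervised fine-tuning:
# many question prompts, each paired with a single-token completion (true/false).
# File name, questions, and field names are hypothetical examples.
import json

examples = [
    {"prompt": "Question: Is the sky blue on a clear day?\nAnswer:", "completion": " true"},
    {"prompt": "Question: Do fish breathe with lungs?\nAnswer:",     "completion": " false"},
    # ... in practice, thousands of such pairs
]

with open("true_false_finetune.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

The key point from the quote is that the model never memorizes these answers explicitly; it only adjusts its next-token predictions so that, given such a prompt, the pattern " true" or " false" becomes the most likely continuation.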