
AI說書 - 從0開始 - 20

I want to share a little of "the technology of building LLMs from the ground up" every day, keeping each article to about three minutes of reading, so that there is no pressure and everyone can still grow a little each day.


In AI說書 - 從0開始 - 19 we described the inference pipeline as t = f(n); here we expand on it and highlight its characteristics:


Expanding t = f(n) to make time explicit gives t_i = f(t_1, t_2, ..., t_{i-1}): each new token is a function of all the tokens that precede it (see the sketch after the list below). The characteristics are therefore:

  • Dynamic: the Transformer produces the output t_i from the incremental input sequence { t_1, t_2, ..., t_{i-1} }
  • The model adapts to entirely new inputs and still produces an output
  • Implicit: the model encodes and stores the relationships between tokens in its weights and biases; it simply keeps producing tokens from its dynamic inputs, drawing on patterns learned from millions of text, image, and audio examples
  • Flexible: a GPT model adapts to a wide variety of inputs and can produce an output for any of them
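
To make t_i = f(t_1, t_2, ..., t_{i-1}) concrete, here is a minimal Python sketch of the autoregressive inference loop. The `model` callable, the greedy decoding choice, and the `eos_id` value are assumptions for illustration; they stand in for any GPT-like decoder that maps a token sequence to a next-token distribution.

```python
from typing import Callable, List, Sequence

def generate(
    model: Callable[[Sequence[int]], List[float]],  # hypothetical: returns next-token probabilities
    prompt: List[int],                               # initial tokens t_1 ... t_k
    max_new_tokens: int = 20,
    eos_id: int = 0,
) -> List[int]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # Dynamic: the whole incremental sequence {t_1, ..., t_{i-1}} is fed back in.
        probs = model(tokens)
        # Greedy choice for simplicity; sampling strategies would also work.
        next_token = max(range(len(probs)), key=lambda t: probs[t])
        tokens.append(next_token)  # the output t_i becomes part of the next input
        if next_token == eos_id:
            break
    return tokens
```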


Next, we quote the book Transformers for Natural Language Processing and Computer Vision, Denis Rothman, 2024, for its perspective on GPT models:

  • Supervised and Unsupervised

Some may say that a GPT series model such as ChatGPT goes through unsupervised training. That statement is only true to a certain extent. Token by token, a GPT-like model finds its way to accuracy through self-supervised learning, predicting each subsequent token based on the preceding ones in the sequence. It succeeds in doing so through the influence of all the other tokens’ representations in a sequence.
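
As a concrete illustration of this self-supervised objective, the short sketch below shows that the target at each position is simply the next token of the same sequence, so no external labels are required; the token ids are made up for illustration.

```python
# Self-supervised next-token targets: each prefix predicts the token that follows it.
tokens = [5, 17, 42, 8, 99]   # a tokenized training sequence t_1 ... t_5 (illustrative ids)

for i in range(1, len(tokens)):
    prefix, target = tokens[:i], tokens[i]
    print(f"prefix {prefix} -> predict {target}")
```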

We can also fine-tune a GPT model with an input (prompt) and output (completion) with labels! We can provide thousands of inputs (prompts) with one token as an output (completion). For example, we can create thousands of questions as inputs with only true and false as outputs. This is implicit supervised learning. Also, the model will not explicitly memorize the correct predictions. It will simply learn the patterns of the tokens.
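
The prompt/completion setup described in the quote can be pictured as a simple labeled dataset. The sketch below writes a few question-style prompts with one-token true/false completions to a JSONL file; the field names and file name are illustrative conventions, not tied to any particular fine-tuning API.

```python
import json

# Illustrative prompt/completion pairs: thousands of questions, one-token answers.
examples = [
    {"prompt": "The Transformer uses attention instead of recurrence. True or false?",
     "completion": " true"},
    {"prompt": "GPT models read the input sequence from right to left. True or false?",
     "completion": " false"},
]

with open("finetune_data.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```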
