I want to share a little of "the technology stack underlying LLMs" each day, keeping every article under three minutes of reading time, so readers feel no pressure yet still grow a bit every day.
From AI說書 - 從0開始 - 125 through AI說書 - 從0開始 - 155 | 文法判斷介面成果展示 (Grammatical Acceptability Judgment Interface Demo), we completed the walkthrough of Chapter 5 of the book Transformers for Natural Language Processing and Computer Vision, Denis Rothman, 2024.
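That chapter centers on fine-tuning BERT-style models for CoLA-style acceptability judgments. As a minimal sketch (not the book's code), scoring a sentence with a CoLA-fine-tuned checkpoint via Hugging Face Transformers looks roughly like this; the checkpoint name `textattack/bert-base-uncased-CoLA` and the label ordering are assumptions, and any CoLA-fine-tuned sequence-classification model would be used the same way:

```python
# Minimal sketch: grammatical-acceptability scoring with a CoLA-fine-tuned BERT.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed checkpoint; substitute whichever CoLA-fine-tuned model you trained or downloaded.
model_name = "textattack/bert-base-uncased-CoLA"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

sentence = "The boys is eating the apples."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 2); assumed order: [unacceptable, acceptable]

label = logits.argmax(dim=-1).item()
print("acceptable" if label == 1 else "unacceptable")
```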
References:
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, 2017, Attention Is All You Need: https://arxiv.org/abs/1706.03762
- Alex Warstadt, Amanpreet Singh, Samuel R. Bowman, 2018, Neural Network Acceptability Judgments: https://arxiv.org/abs/1805.12471
- The Corpus of Linguistic Acceptability (CoLA): https://nyu-mll.github.io/CoLA/
- Documentation on Hugging Face models:
  - https://huggingface.co/transformers/pretrained_models.html
  - https://huggingface.co/transformers/model_doc/bert.html
  - https://huggingface.co/transformers/model_doc/roberta.html
  - https://huggingface.co/transformers/model_doc/distilbert.html
Additional reading:
- Vladislav Mosin, Igor Samenko, Alexey Tikhonov, Borislav Kozlovskii, and Ivan P. Yamshchikov, 2021, Fine-Tuning Transformers: Vocabulary Transfer: https://arxiv.org/abs/2112.14569
- Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, and Donald Metzler, 2022, Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers: https://arxiv.org/abs/2109.10686