I want to share a little of "LLM technology built from the bottom of the stack" every day, keeping each article within a three-minute read so that no one feels overwhelmed, yet everyone can still grow a little each day.

From AI說書 - 從0開始 - 193 | 第七章引言 (Chapter 7 Introduction) to AI說書 - 從0開始 - 222 | GPT 4 & RAG 測試 (GPT-4 & RAG Testing), we completed the walkthrough of Chapter 7 of the book Transformers for Natural Language Processing and Computer Vision, Denis Rothman, 2024.

References:
OpenAI and GPT-3 engines: https://beta.openai.com/docs/engines/engines
BertViz GitHub Repository by Jesse Vig: https://github.com/jessevig/bertviz
OpenAI's supercomputer: https://blogs.microsoft.com/ai/openai-azure-supercomputer/
Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, 2018, Improving Language Understanding by Generative Pre-Training: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
Alec Radford et al., 2019, Language Models are Unsupervised Multitask Learners: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
Common Crawl data: https://commoncrawl.org/overview/
GPT-4 Technical Report, OpenAI, 2023: https://arxiv.org/pdf/2303.08774.pdf

Additional reading:
Alex Wang et al., 2019, GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding: https://arxiv.org/pdf/1804.07461.pdf
Alex Wang et al., 2019, SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems: https://w4ngatang.github.io/static/papers/superglue.pdf
Tom B. Brown et al., 2020, Language Models are Few-Shot Learners: https://arxiv.org/abs/2005.14165
Chi Wang et al., 2023, Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference: https://arxiv.org/abs/2303.04673
Vaswani et al., 2017, Attention Is All You Need: https://arxiv.org/abs/1706.03762