I'd like to share a little of "LLM technology, built up from the bottom of the stack" every day, keeping each post within a three-minute read so it never feels like too much pressure, while still letting everyone grow a bit each day.
References are listed below:
- OpenAI and GPT-3 engines: https://beta.openai.com/docs/engines/engines
- BertViz GitHub Repository by Jesse Vig: https://github.com/jessevig/bertviz
- OpenAI’s supercomputer: https://blogs.microsoft.com/ai/openai-azure-supercomputer/
- Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, 2018, Improving Language Understanding by Generative Pre-Training: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
- Alec Radford et al., 2019, Language Models are Unsupervised Multitask Learners: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
- Common Crawl data: https://commoncrawl.org/overview/
- GPT-4 Technical Report, OpenAI 2023: https://arxiv.org/pdf/2303.08774.pdf
Additional reading is listed below:
- Alex Wang et al., 2019, GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding: https://arxiv.org/pdf/1804.07461.pdf
- Alex Wang et al., 2019, SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems: https://w4ngatang.github.io/static/papers/superglue.pdf
- Tom B. Brown et al., 2020, Language Models are Few-Shot Learners: https://arxiv.org/abs/2005.14165
- Chi Wang et al., 2023, Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference: https://arxiv.org/abs/2303.04673
- Vaswani et al., 2017, Attention Is All You Need: https://arxiv.org/abs/1706.03762