I want to share a little of "the technology of the LLM stack, built up from the bottom" each day, keeping every article to a three-minute read so that nobody feels too much pressure, yet everyone can still grow a bit every day.
Reference items are attached below (a minimal loading sketch follows the list):
- The Hugging Face reference notebook: https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb
- The Hugging Face reference blog: https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb
- More on BERT: https://huggingface.co/transformers/model_doc/bert.html
- More on DistilBERT: https://arxiv.org/pdf/1910.01108.pdf
- More on RoBERTa: https://huggingface.co/transformers/model_doc/roberta.html
- Even more on DistilBERT: https://huggingface.co/transformers/model_doc/distilbert.html
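
To make the model docs above a bit more concrete, here is a minimal sketch of loading the three model families with the `transformers` library. It is not taken from the referenced notebook; the checkpoint names (`bert-base-uncased`, `distilbert-base-uncased`, `roberta-base`) are assumed standard Hugging Face Hub checkpoints.

```python
# Minimal sketch (assumption: transformers + torch installed, standard Hub checkpoints).
from transformers import AutoTokenizer, AutoModelForMaskedLM

for checkpoint in ["bert-base-uncased", "distilbert-base-uncased", "roberta-base"]:
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)      # matching tokenizer
    model = AutoModelForMaskedLM.from_pretrained(checkpoint)   # model with masked-LM head
    inputs = tokenizer("Transformers are built from attention layers.", return_tensors="pt")
    outputs = model(**inputs)
    # logits shape: (batch_size, sequence_length, vocab_size)
    print(checkpoint, outputs.logits.shape)
```

The point of using the `Auto*` classes is that the same two lines cover BERT, DistilBERT, and RoBERTa; the per-model classes described in the docs above are only needed when you want architecture-specific behavior.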
Additional reading items are attached below:
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, 2018, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding: https://arxiv.org/abs/1810.04805
- Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov, 2019, RoBERTa: A Robustly Optimized BERT Pretraining Approach: https://arxiv.org/abs/1907.11692