I want to share a little of "the technology of LLMs, built up from the bottom of the stack" every day, keeping each article within a three-minute read, so that nobody feels too much pressure yet everyone can still grow a little each day.
- AI說書 - 從0開始 - 180 | RoBERTa 預訓練前言: preface to RoBERTa pretraining
- AI說書 - 從0開始 - 181 | 預訓練模型資料下載與相關依賴準備: downloading the pretraining data and preparing the dependencies
- AI說書 - 從0開始 - 182 | 資料清洗: data cleaning
- AI說書 - 從0開始 - 183 | 初始化模型與 Tokenizer: initializing the model and the Tokenizer
- AI說書 - 從0開始 - 184 | 訓練 & 驗證資料集切割: splitting the training and validation datasets
Before training the model, we still need to initialize the Trainer:
from transformers import Trainer
from datetime import datetime
from typing import Dict, Any

class CustomTrainer(Trainer):
    def log(self, logs: Dict[str, Any]) -> None:
        super().log(logs)
        # Trainer.log() does not add a "step" key to the logs dict it receives,
        # so read the current step from the trainer state instead
        step = self.state.global_step
        if step > 0 and step % self.args.eval_steps == 0:
            print(f"Current time at step {step}: {datetime.now()}")
import logging
from transformers import TrainingArguments

logging.basicConfig(level = logging.INFO)

training_args = TrainingArguments(output_dir = "/content/model/model/",
                                  overwrite_output_dir = True,
                                  num_train_epochs = 2,
                                  per_device_train_batch_size = 128,
                                  per_device_eval_batch_size = 128,
                                  warmup_steps = 500,       # warmup for the learning rate scheduler
                                  weight_decay = 0.01,      # small penalty on large weights added to the loss to reduce overfitting
                                  save_steps = 10000,
                                  save_total_limit = 2,     # maximum number of checkpoint files to keep on disk
                                  logging_dir = '/content/model/logs/',
                                  logging_steps = 10,
                                  logging_first_step = True,
                                  evaluation_strategy = "steps",
                                  eval_steps = 500,
                                  fp16 = True)              # mixed-precision training (requires a GPU)
trainer = CustomTrainer(model = model,
                        args = training_args,
                        data_collator = data_collator,
                        train_dataset = tokenized_datasets["train"],
                        eval_dataset = tokenized_datasets["test"])
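With the Trainer initialized, pretraining can then be launched with trainer.train(). The lines below are a minimal sketch of that next step, assuming the model, data_collator, and tokenized_datasets objects prepared in posts 181 to 184 are already in memory; the save path is illustrative only:

# Launch pretraining: the loss is logged every logging_steps and the
# validation split is evaluated every eval_steps, as configured above
train_result = trainer.train()
print(train_result.metrics)     # e.g. train_runtime, train_loss

# Keep the final weights so they can be reloaded later with from_pretrained()
trainer.save_model("/content/model/model/final/")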
A leaner version of these TrainingArguments, which also uses the stock Trainer directly instead of the customized one shown here, can be found in AI說書 - 從0開始 - 176 | 初始化 Trainer; a rough sketch of that kind of simplified setup follows below.
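For illustration only (the argument values below are assumptions, not necessarily the exact ones used in post 176), such a simplified setup might keep just the essentials:

from transformers import Trainer, TrainingArguments

# Minimal sketch: only the core arguments, stock Trainer, no custom logging
training_args = TrainingArguments(output_dir = "/content/model/model/",
                                  num_train_epochs = 2,
                                  per_device_train_batch_size = 128,
                                  evaluation_strategy = "steps",
                                  eval_steps = 500)

trainer = Trainer(model = model,
                  args = training_args,
                  data_collator = data_collator,
                  train_dataset = tokenized_datasets["train"],
                  eval_dataset = tokenized_datasets["test"])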