I want to share a little of "LLM technology built up from the ground floor" each day, keeping every article under three minutes of reading time, so there is no pressure on readers, yet everyone still grows a bit every day.
Organizing the material currently on hand:
Before training the model, the Trainer still needs to be initialized:
from transformers import Trainer
from datetime import datetime
from typing import Dict, Any

class CustomTrainer(Trainer):
    def log(self, logs: Dict[str, Any]) -> None:
        super().log(logs)
        # The logs dict passed to log() is not guaranteed to contain a "step"
        # key, so read the current step from the trainer state instead
        step = self.state.global_step
        if step > 0 and step % self.args.eval_steps == 0:
            print(f"Current time at step {step}: {datetime.now()}")
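The override above is meant to print a timestamp once every eval_steps steps. The cadence logic can be checked stand-alone, without transformers; this sketch hardcodes 500 to mirror the eval_steps value configured below:

```python
from datetime import datetime

def maybe_print_time(step: int, eval_steps: int = 500) -> bool:
    # Fire on every eval_steps-th step, mirroring the check in CustomTrainer.log
    if step > 0 and step % eval_steps == 0:
        print(f"Current time at step {step}: {datetime.now()}")
        return True
    return False

fired = [s for s in range(2001) if maybe_print_time(s)]  # fires at 500, 1000, 1500, 2000
```

Simulating 2000 steps confirms the timestamp appears exactly four times, at every 500th step.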
import logging
from transformers import Trainer, TrainingArguments

logging.basicConfig(level=logging.INFO)

training_args = TrainingArguments(
    output_dir="/content/model/model/",
    overwrite_output_dir=True,
    num_train_epochs=2,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    warmup_steps=500,       # for the learning rate scheduler
    weight_decay=0.01,      # small penalty on large weights added to the loss to curb overfitting
    save_steps=10000,
    save_total_limit=2,     # maximum number of checkpoint files to keep on disk
    logging_dir="/content/model/logs/",
    logging_steps=10,
    logging_first_step=True,
    evaluation_strategy="steps",
    eval_steps=500,
    fp16=True,
)
trainer = CustomTrainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
)
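Together, save_steps=10000 and save_total_limit=2 mean a checkpoint is written every 10000 steps, but only the two newest survive on disk. A toy illustration of that rotation behavior (not HF's actual implementation):

```python
from collections import deque

checkpoints = deque(maxlen=2)  # save_total_limit=2: old entries drop off automatically
for step in range(10000, 60001, 10000):  # a save every save_steps=10000 steps
    checkpoints.append(f"checkpoint-{step}")

print(list(checkpoints))  # only the two most recent checkpoints remain
```

After six simulated saves, only checkpoint-50000 and checkpoint-60000 are left, which is exactly the disk-space cap save_total_limit provides.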
For a leaner version of the TrainingArguments, see AI說書 - 從0開始 - 176 | 初始化 Trainer
For using the stock Trainer directly, rather than the customized one shown here, see AI說書 - 從0開始 - 176 | 初始化 Trainer