2024-08-24 | Reading time: about 23 minutes

AI Storytelling - Starting from 0 - 147 | Writing the BERT Fine-Tuning Training Program

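I want to share a little bit of "LLM technology, built up from the bottom layer" every day, and keep each article to under three minutes of reading, so that nobody feels too much pressure but everyone can still grow a little each day.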


Here is a recap of the material we have on hand so far:


Now let's write the training program:

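import torch
from tqdm import trange
from transformers import get_linear_schedule_with_warmup

# model, optimizer, device, train_dataloader, validation_dataloader and
# flat_accuracy are assumed to be defined already (earlier in this series).

epochs = 4
# The scheduler steps once per batch, so the total is batches per epoch times epochs
total_steps = len(train_dataloader) * epochs
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps = 0, num_training_steps = total_steps)

# Per-batch training losses, collected across all epochs
train_loss_set = []

for _ in trange(epochs, desc = "Epoch"):
    # ----- Training -----
    # Set our model to training mode (as opposed to evaluation mode)
    model.train()

    tr_loss = 0
    nb_tr_examples = 0
    nb_tr_steps = 0

    for step, batch in enumerate(train_dataloader):
        # Move every tensor in the batch to the target device
        batch = tuple(t.to(device) for t in batch)
        b_input_ids, b_input_mask, b_labels = batch
        # Clear gradients left over from the previous step
        optimizer.zero_grad()

        # Passing labels makes the model return the loss along with the logits
        outputs = model(b_input_ids, token_type_ids = None, attention_mask = b_input_mask, labels = b_labels)
        loss = outputs['loss']
        train_loss_set.append(loss.item())
        # Backpropagate, update the weights, then advance the learning-rate schedule
        loss.backward()
        optimizer.step()
        scheduler.step()

        tr_loss += loss.item()
        nb_tr_examples += b_input_ids.size(0)
        nb_tr_steps += 1
    print("Train loss: {}".format(tr_loss/nb_tr_steps))

    # ----- Validation -----
    # Switch to evaluation mode (disables dropout)
    model.eval()
    eval_loss, eval_accuracy = 0, 0
    nb_eval_steps, nb_eval_examples = 0, 0

    for batch in validation_dataloader:
        batch = tuple(t.to(device) for t in batch)
        b_input_ids, b_input_mask, b_labels = batch
        # No gradients are needed for evaluation
        with torch.no_grad():
            outputs = model(b_input_ids, token_type_ids = None, attention_mask = b_input_mask)
        # Move logits and labels to the CPU as NumPy arrays for the accuracy helper
        logits = outputs['logits'].detach().cpu().numpy()
        label_ids = b_labels.to('cpu').numpy()
        tmp_eval_accuracy = flat_accuracy(logits, label_ids)
        eval_accuracy += tmp_eval_accuracy
        nb_eval_steps += 1
    # Average of the per-batch accuracies
    print("Validation Accuracy: {}".format(eval_accuracy/nb_eval_steps))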
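To make the role of `scheduler.step()` more concrete, here is a minimal sketch of the multiplier that a linear schedule with warmup applies to the base learning rate. It is a plain-Python illustration based on my understanding of how `get_linear_schedule_with_warmup` behaves, not the library's own code. The function name `linear_schedule_multiplier`, the 100 batches per epoch, and the base learning rate of 2e-5 are all assumptions made up for this example.

```python
def linear_schedule_multiplier(step, num_warmup_steps, num_training_steps):
    """Factor applied to the base learning rate at a given optimizer step.

    Hypothetical helper for illustration: rises linearly from 0 to 1 during
    warmup, then decays linearly from 1 to 0 over the remaining steps.
    """
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


# Assumed numbers for illustration only (not taken from the article's dataset)
batches_per_epoch = 100
epochs = 4
base_lr = 2e-5
total_steps = batches_per_epoch * epochs

# Same setting as the training program: no warmup steps
for step in range(0, total_steps + 1, batches_per_epoch):
    lr = base_lr * linear_schedule_multiplier(step, 0, total_steps)
    print(f"step {step:3d}: lr = {lr:.2e}")
```

With `num_warmup_steps = 0`, as in the training program, the multiplier starts at 1.0 on the first step and shrinks linearly to 0.0 at the last step. That is why `total_steps` has to match the real number of batches processed, that is, `len(train_dataloader) * epochs`.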