我想要一天分享一點「LLM從底層堆疊的技術」,並且每篇文章長度控制在三分鐘以內,讓大家不會壓力太大,但是又能夠每天成長一點。
回顧一下目前手上有的素材:
以下開始撰寫訓練程式:
epochs = 4
total_steps = len(train_dataloader) * epochs
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps = 0, num_training_steps = total_steps)
train_loss_set = []
for _ in trange(epochs, desc = "Epoch"):
# Set our model to training mode (as opposed to evaluation mode)
model.train()
tr_loss = 0
nb_tr_examples = 0
nb_tr_steps = 0
for step, batch in enumerate(train_dataloader):
batch = tuple(t.to(device) for t in batch)
b_input_ids, b_input_mask, b_labels = batch
optimizer.zero_grad()
outputs = model(b_input_ids, token_type_ids = None, attention_mask = b_input_mask, labels = b_labels)
loss = outputs['loss']
train_loss_set.append(loss.item())
loss.backward()
optimizer.step()
scheduler.step()
tr_loss += loss.item()
nb_tr_examples += b_input_ids.size(0)
nb_tr_steps += 1
print("Train loss: {}".format(tr_loss/nb_tr_steps))
model.eval()
eval_loss, eval_accuracy = 0, 0
nb_eval_steps, nb_eval_examples = 0, 0
for batch in validation_dataloader:
batch = tuple(t.to(device) for t in batch)
b_input_ids, b_input_mask, b_labels = batch
with torch.no_grad():
logits = model(b_input_ids, token_type_ids = None, attention_mask = b_input_mask)
logits = logits['logits'].detach().cpu().numpy()
label_ids = b_labels.to('cpu').numpy()
tmp_eval_accuracy = flat_accuracy(logits, label_ids)
eval_accuracy += tmp_eval_accuracy
nb_eval_steps += 1
print("Validation Accuracy: {}".format(eval_accuracy/nb_eval_steps))