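I want to share a little of "the technology behind LLMs, built up from the bottom layer" every day, and keep each post under three minutes of reading, so that nobody feels too much pressure but everyone can still grow a little each day.
A quick recap of the material we have so far:
Now load the Data Loader prepared in AI說書 - 從0開始 - 149 and run the predictions: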
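import numpy as np
import torch

# model, tokenizer, device and prediction_dataloader are assumed to be
# defined already in the earlier posts of this series.
model.eval()  # switch to inference mode (disables dropout)
raw_predictions, predicted_classes, true_labels = [], [], []

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability
    e = np.exp(logits - np.max(logits))
    return e / np.sum(e)

for batch in prediction_dataloader:
    # Move every tensor of the batch to the target device
    batch = tuple(t.to(device) for t in batch)
    b_input_ids, b_input_mask, b_labels = batch
    with torch.no_grad():  # no gradients are needed for prediction
        outputs = model(b_input_ids, token_type_ids = None, attention_mask = b_input_mask)
    logits = outputs['logits'].detach().cpu().numpy()
    label_ids = b_labels.to('cpu').numpy()
    b_input_ids = b_input_ids.to('cpu').numpy()
    # Decode the token ids back into readable sentences
    batch_sentences = [tokenizer.decode(input_ids, skip_special_tokens = True) for input_ids in b_input_ids]
    # Convert to a NumPy array so that np.argmax works on plain NumPy data
    probabilities = torch.nn.functional.softmax(torch.tensor(logits), dim = -1).numpy()
    batch_predictions = np.argmax(probabilities, axis = 1)
    for i, sentence in enumerate(batch_sentences):
        print(f"Sentence: {sentence}")
        print(f"Logits: {logits[i]}")
        print(f"Softmax probabilities: {softmax(logits[i])}")
        print(f"Prediction: {batch_predictions[i]}")
        print(f"True label: {label_ids[i]}")
    # Keep the per-batch results for later evaluation
    raw_predictions.append(logits)
    predicted_classes.append(batch_predictions)
    true_labels.append(label_ids)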
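Note that the loop above appends one array per batch, so raw_predictions, predicted_classes and true_labels end up as lists of arrays rather than flat arrays. A minimal sketch of how they could be flattened afterwards is shown here; the small arrays are made-up stand-ins for real batches, and the names all_logits, all_predictions and all_labels are hypothetical, not part of the original code.

```python
import numpy as np

# Hypothetical stand-ins for what the loop collects: one array per batch.
raw_predictions = [np.array([[-1.0, 2.0], [0.5, 0.1]]), np.array([[3.0, -3.0]])]
predicted_classes = [np.array([1, 0]), np.array([0])]
true_labels = [np.array([1, 1]), np.array([0])]

# Join the per-batch arrays along the sample axis.
all_logits = np.concatenate(raw_predictions, axis=0)
all_predictions = np.concatenate(predicted_classes, axis=0)
all_labels = np.concatenate(true_labels, axis=0)

print(all_logits.shape)
print(all_predictions)
print(all_labels)
```

With flat arrays like these, predictions and true labels line up one to one, which is usually what an evaluation step expects.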
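Finally, let's look at the results. The output first shows what the sentence itself looks like.
Next comes the prediction, expressed as logits.
The logits are then converted into probabilities.
Comparing the two probabilities gives the predicted class: the BERT model says the sentence is grammatically correct.
The true label confirms that the sentence is indeed grammatically correct.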
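To make the logits-to-probabilities-to-prediction steps concrete, here is a small worked sketch that uses only the standard library so each step is visible. The logits are made-up values, not output from the model, and the index meaning (0 = unacceptable, 1 = acceptable) is an assumption based on the usual CoLA labeling.

```python
import math

def softmax(logits):
    # Shift by the max so exp() cannot overflow; the result is unchanged.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one sentence.
# Assumed convention: index 0 = unacceptable, index 1 = acceptable.
logits = [-1.0, 2.0]

# Step 1: turn the logits into probabilities that sum to 1.
probs = softmax(logits)

# Step 2: the prediction is the index of the larger probability.
prediction = max(range(len(probs)), key=probs.__getitem__)

print(probs)       # roughly [0.047, 0.953]
print(prediction)  # 1
```

Because softmax preserves the ordering of its inputs, taking the argmax of the logits directly would give the same class; the probabilities are mainly useful for reading how confident the model is.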