Restarting the Poker Bot Journey -11: Sometimes AI Can Be Pretty Hard to Deal With

Updated 2025/02/10 | Published 2025/02/10 | Reading time: about 32 minutes

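I expected data cleaning and feature encoding to be a simple task. In practice, even with a language model's help, unexpected challenges turned up at every step.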

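Encoding the playing cards went smoothly at first: I converted each card's rank and suit into numeric values that a machine learning model can train on. But when I asked the language model to design the training model, it left out several features I consider important, such as each player's stack size and the full card information (it used only the number of cards rather than their actual rank and suit). Worst of all, it completely ignored previous actions, a key feature, even though I repeatedly asked for it to be included. In the end I had to break every step down into very small pieces and request them one at a time to get it done.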
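As a rough illustration of the card encoding described above, the sketch below maps a card string to a suit index, a rank index, and a single index in the range 0 to 51. The card format ('S4', 'HA', 'C10': suit letter first, then rank) and the suit and rank orderings follow the training code later in this post; the function name `encode_card` is hypothetical and not part of that code.

```python
# Minimal sketch: turn a card string into numeric features.
# Assumption: cards look like 'S4', 'HA', 'C10' (suit letter, then rank).
RANKS = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
SUITS = ['S', 'H', 'D', 'C']


def encode_card(card):
    """Return (suit_index, rank_index, flat_index), or None for a hidden or unknown card."""
    if not card or card.upper().startswith('X'):
        return None  # hidden card, e.g. 'XX'
    suit, rank = card[0], card[1:]
    if suit not in SUITS or rank not in RANKS:
        return None
    suit_index = SUITS.index(suit)
    rank_index = RANKS.index(rank)
    # 13 ranks per suit, so each card gets a unique index in 0..51.
    return suit_index, rank_index, suit_index * 13 + rank_index


print(encode_card('C10'))  # (3, 9, 48)
print(encode_card('XX'))   # None
```

The flat index is the position that would be set to 1 in a 52-dimensional one-hot vector, which keeps both rank and suit instead of only counting cards.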

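This reminded me of an important lesson from my earlier work with language models: rather than digging into the details of every piece of code it provides, first make sure the program runs, even if it is not yet in its ideal state. This is very different from my past development habits. I used to try to fully understand the logic of every function and every action, but when collaborating with a language model that becomes an obstacle. Its coding logic and style often differ from what we are used to, so spending too much time on deep understanding may be wasted effort, especially when the direction turns out to be a dead end.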

Initial test results:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
seq_types_input (InputLayer)    [(None, 20)]         0
__________________________________________________________________________________________________
seq_amounts_input (InputLayer)  [(None, 20, 1)]      0
__________________________________________________________________________________________________
static_input (InputLayer)       [(None, 384)]        0
__________________________________________________________________________________________________
action_type_embedding (Embeddin (None, 20, 16)       416         seq_types_input[0][0]
__________________________________________________________________________________________________

...

Epoch 20/20
399180/399180 [==============================] - 136s 342us/sample - loss: 0.6324 - acc: 0.6971 - val_loss: 0.7220 - val_acc: 0.6883
99795/99795 [==============================] - 8s 76us/sample - loss: 0.7220 - acc: 0.6883
Validation Loss: 0.7220 | Validation Accuracy: 0.6883

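Predicting a single action is not my main goal (I care more about whether the model can accurately predict the probability distribution over an opponent's possible actions in a given situation), but this result at least shows that the direction is feasible. I am still not very familiar with machine learning and may not understand the details of many statistical metrics yet, but it is a good start.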
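Since the goal is a probability distribution over the opponent's possible actions rather than a single predicted action, it may help to look at how such a distribution is scored. The sketch below is an illustration only, not part of the original code: it turns raw scores into probabilities with softmax and computes the cross-entropy (log loss) of the action that was actually taken. The three action labels and the scores are made-up values.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def log_loss(probs, true_index, eps=1e-12):
    """Cross-entropy of one prediction: -log of the probability given to the true action."""
    return -math.log(max(probs[true_index], eps))


# Hypothetical example: scores for three made-up action labels.
actions = ['fold', 'call', 'raise']
probs = softmax([0.2, 1.5, 0.3])
for name, p in zip(actions, probs):
    print(f"{name}: {p:.3f}")
print("loss if the opponent actually called:", round(log_loss(probs, 1), 4))
```

With a softmax output and the `sparse_categorical_crossentropy` loss, the reported loss is this quantity averaged over the samples. Read that way, a validation loss of 0.7220 corresponds to a geometric-mean probability of roughly exp(-0.7220) ≈ 0.486 assigned to the action the opponent actually took, which says more about the quality of the distribution than accuracy alone.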

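This experience shows once again that when developing with a language model's help, the most important thing is to get a working version first and then improve it step by step. This approach not only makes development more efficient, it also avoids investing too much time in what may be a dead end.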

My model training code:

#!/usr/bin/env python3
import json

import numpy as np
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (
    Input, Dense, LSTM, Embedding, Concatenate, Dropout
)
from tensorflow.keras.optimizers import Adam



# =============================================================================
# 1. Helper Functions for Feature Encoding
# =============================================================================

def one_hot_round(round_no):
    """One-hot encode round number (1=preflop, 2=flop, 3=turn, 4=river)."""
    vec = np.zeros(4)
    if 1 <= round_no <= 4:
        vec[round_no - 1] = 1
    return vec


def one_hot_position(pos, max_players=10):
    """One-hot encode a player's position (an integer in [0, max_players-1])."""
    vec = np.zeros(max_players)
    if pos < max_players:
        vec[pos] = 1
    return vec


def card_to_onehot(card):
    """
    Convert a card string (e.g., 'S4', 'HA', 'C10') to a 52-dim one-hot vector.
    If the card is hidden (e.g., starts with 'X') it returns an all-zeros vector.
    """
    ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
    suits = ['S', 'H', 'D', 'C']
    onehot = np.zeros(52)
    if card is None or card.upper().startswith("X"):
        return onehot
    suit = card[0]
    rank = card[1:]
    if suit in suits and rank in ranks:
        suit_index = suits.index(suit)
        rank_index = ranks.index(rank)
        index = suit_index * 13 + rank_index
        onehot[index] = 1
    return onehot


def encode_board_cards(board_cards, max_cards=5):
    """
    Encode the board cards as the concatenation of one-hot vectors (52 dims each).
    Pads with zeros if there are fewer than max_cards.
    """
    encoded = []
    for card in board_cards:
        encoded.append(card_to_onehot(card))
    while len(encoded) < max_cards:
        encoded.append(np.zeros(52))
    return np.concatenate(encoded[:max_cards])


def encode_hole_cards(hole_cards):
    """
    Encode the player's (or actor's) hole cards (expected to be a list of 2 cards)
    as a concatenation of two 52-dim one-hot vectors.
    """
    encoded = []
    for card in hole_cards:
        encoded.append(card_to_onehot(card))
    while len(encoded) < 2:
        encoded.append(np.zeros(52))
    return np.concatenate(encoded[:2])



# =============================================================================
# 2. Process Each Snapshot into Model Inputs and Target
# =============================================================================

def process_snapshot(snapshot):
    """
    From a snapshot dictionary, create:
      - A vector of static features
      - A sequence of previous actions (each with an action type and amount)
      - The target (opponent's current action type)

    **Static features include:**
      - One-hot encoded round (4 dims)
      - Pot size (1 dim; scaled)
      - Blinds: small, big, ante (3 dims; scaled)
      - Actor stack (1 dim; scaled)
      - Actor position (one-hot, 10 dims)
      - Number of players remaining (1 dim; scaled)
      - Board cards (5 fixed cards × 52 dims = 260 dims)
      - Actor hole cards (2 cards × 52 dims = 104 dims)

    **Sequential features:**
      For each previous action (up to a fixed max length) we use:
      - action_type (integer; offset by +1 so that 0 is reserved for padding)
      - action_sum (float; scaled)
    """
    # --- Static features ---
    # 1. Round (from current action)
    round_no = int(snapshot["action"]["round"])
    round_vec = one_hot_round(round_no)

    # 2. Pot size (scale by 100)
    pot_size = np.array([float(snapshot["pot_size"]) / 100.0])

    # 3. Blinds and ante (scaled)
    blinds = snapshot.get("blinds", {})
    small_blind = float(blinds.get("small_blind", 0)) / 100.0
    big_blind = float(blinds.get("big_blind", 0)) / 100.0
    ante = float(blinds.get("ante", 0)) / 100.0
    blinds_vec = np.array([small_blind, big_blind, ante])

    # 4. Actor stack size (scale by 1000)
    actor_stack = np.array([float(snapshot.get("actor_stack_size", 0)) / 1000.0])

    # 5. Actor position (one-hot with dimension 10)
    actor_pos = int(snapshot.get("actor_position", 0))
    pos_vec = one_hot_position(actor_pos, max_players=10)

    # 6. Number of players remaining (scale by 10)
    players_remaining = np.array([float(snapshot.get("players_remaining", 0)) / 10.0])

    # 7. Board cards (5 fixed cards)
    board_vec = encode_board_cards(snapshot.get("board_cards", []), max_cards=5)

    # 8. Actor hole cards (2 cards)
    hole_cards_vec = encode_hole_cards(snapshot.get("actor_hole_cards", []))

    # Concatenate all static features:
    # Total dims: 4 + 1 + 3 + 1 + 10 + 1 + 260 + 104 = 384
    static_features = np.concatenate([
        round_vec, pot_size, blinds_vec, actor_stack, pos_vec,
        players_remaining, board_vec, hole_cards_vec
    ])

    # --- Sequential features ---
    # For each previous action, we take:
    #   - action_type (offset by +1 so that 0 is our pad value)
    #   - action_sum (scaled by 100)
    seq_actions = snapshot.get("previous_actions", [])
    seq_types = []
    seq_amounts = []
    for action in seq_actions:
        act_type = int(action.get("action_type", 0)) + 1  # reserve 0 for padding
        act_sum = float(action.get("action_sum", 0)) / 100.0
        seq_types.append(act_type)
        seq_amounts.append(act_sum)

    MAX_SEQ_LENGTH = 20  # maximum number of previous actions to consider

    # Truncate if too long
    seq_types = seq_types[:MAX_SEQ_LENGTH]
    seq_amounts = seq_amounts[:MAX_SEQ_LENGTH]

    # Pad sequences (pad type=0, which for action_type will be masked in the Embedding layer)
    while len(seq_types) < MAX_SEQ_LENGTH:
        seq_types.append(0)
        seq_amounts.append(0.0)

    seq_types = np.array(seq_types, dtype=np.int32)
    seq_amounts = np.array(seq_amounts, dtype=np.float32).reshape((MAX_SEQ_LENGTH, 1))

    # --- Target: Opponent's current action type (as integer) ---
    target = int(snapshot["action"].get("action_type", 0))

    return static_features, seq_types, seq_amounts, target



# =============================================================================
# 3. Load and Preprocess Data
# =============================================================================

def load_and_preprocess_data(json_filename):
    """
    Load snapshot logs from a JSON file and create training arrays.
    The JSON file is expected to be a list of snapshots.
    """
    with open(json_filename, 'r') as f:
        data = json.load(f)

    static_features_list = []
    seq_types_list = []
    seq_amounts_list = []
    targets = []

    for snapshot in data:
        static_feat, seq_types, seq_amounts, target = process_snapshot(snapshot)
        static_features_list.append(static_feat)
        seq_types_list.append(seq_types)
        seq_amounts_list.append(seq_amounts)
        targets.append(target)

    X_static = np.stack(static_features_list)    # shape: (N, 384)
    X_seq_types = np.stack(seq_types_list)       # shape: (N, MAX_SEQ_LENGTH)
    X_seq_amounts = np.stack(seq_amounts_list)   # shape: (N, MAX_SEQ_LENGTH, 1)
    y = np.array(targets, dtype=np.int32)        # shape: (N,)

    return X_static, X_seq_types, X_seq_amounts, y


# Change the filename below to your JSON file produced by your XML parser.
JSON_FILENAME = 'logs.json'
X_static, X_seq_types, X_seq_amounts, y = load_and_preprocess_data(JSON_FILENAME)

# (Optional) Check the shapes of your training arrays:
print("X_static shape:", X_static.shape)
print("X_seq_types shape:", X_seq_types.shape)
print("X_seq_amounts shape:", X_seq_amounts.shape)
print("y shape:", y.shape)

# Split into training and validation sets
(X_static_train, X_static_val,
 X_seq_types_train, X_seq_types_val,
 X_seq_amounts_train, X_seq_amounts_val,
 y_train, y_val) = train_test_split(
    X_static, X_seq_types, X_seq_amounts, y, test_size=0.2, random_state=42
)



# =============================================================================
# 4. Build the Keras Model
# =============================================================================

# Parameters for the sequential branch
MAX_SEQ_LENGTH = 20
# Adjust NUM_ACTION_TYPES based on your data (here we assume 25; update if needed)
NUM_ACTION_TYPES = 25
EMBEDDING_DIM = 16
NUM_ACTION_CLASSES = 30  # number of distinct action types to predict (update as needed)

# -- Static Input Branch --
static_input = Input(shape=(384,), name='static_input')
x_static = Dense(128, activation='relu')(static_input)
x_static = Dense(64, activation='relu')(x_static)

# -- Sequential Input Branch --
# Input for action types (integers; shape = (MAX_SEQ_LENGTH,))
seq_types_input = Input(shape=(MAX_SEQ_LENGTH,), dtype='int32', name='seq_types_input')
# Input for action amounts (floats; shape = (MAX_SEQ_LENGTH, 1))
seq_amounts_input = Input(shape=(MAX_SEQ_LENGTH, 1), dtype='float32', name='seq_amounts_input')

# Process action types with an Embedding layer.
# (We use mask_zero=True so that padded 0 values are ignored.)
x_seq_types = Embedding(
    input_dim=NUM_ACTION_TYPES + 1,  # +1 to reserve index 0 for padding
    output_dim=EMBEDDING_DIM,
    mask_zero=True,
    name='action_type_embedding'
)(seq_types_input)

# Process the amounts with a simple dense layer (applied to each time step).
x_seq_amounts = Dense(8, activation='relu', name='amount_dense')(seq_amounts_input)

# Concatenate along the feature dimension: now each time step has (EMBEDDING_DIM + 8) features.
x_seq = Concatenate(name='seq_concat')([x_seq_types, x_seq_amounts])

# Process the concatenated sequence with an LSTM.
x_seq = LSTM(64, name='lstm_seq')(x_seq)

# -- Merge Both Branches --
x = Concatenate(name='merge')([x_static, x_seq])
x = Dense(64, activation='relu')(x)
x = Dropout(0.5)(x)
output = Dense(NUM_ACTION_CLASSES, activation='softmax', name='output')(x)

model = Model(
    inputs=[static_input, seq_types_input, seq_amounts_input],
    outputs=output
)

model.compile(
    optimizer=Adam(learning_rate=1e-3),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()



# =============================================================================
# 5. Train the Model
# =============================================================================

history = model.fit(
    x={
        'static_input': X_static_train,
        'seq_types_input': X_seq_types_train,
        'seq_amounts_input': X_seq_amounts_train
    },
    y=y_train,
    validation_data=(
        {
            'static_input': X_static_val,
            'seq_types_input': X_seq_types_val,
            'seq_amounts_input': X_seq_amounts_val
        },
        y_val
    ),
    epochs=20,
    batch_size=32
)

# =============================================================================
# 6. Evaluate / Save the Model
# =============================================================================

loss, acc = model.evaluate(
    x={
        'static_input': X_static_val,
        'seq_types_input': X_seq_types_val,
        'seq_amounts_input': X_seq_amounts_val
    },
    y=y_val
)
print(f"Validation Loss: {loss:.4f} | Validation Accuracy: {acc:.4f}")

# Optionally, save your model:
model.save("opponent_model.h5")
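One detail of the listing above that may be worth revisiting: `seq_types[:MAX_SEQ_LENGTH]` keeps the first 20 entries of `previous_actions` and drops the rest. Assuming that list is in chronological order, hands with more than 20 actions lose their most recent ones. If recent actions turn out to matter more for predicting the next move, one possible alternative is to keep the last 20 instead. The helper `pad_recent` below is hypothetical and not part of the original code; it is only a sketch of that idea in plain Python.

```python
def pad_recent(values, max_len, pad_value=0):
    """Keep the last `max_len` items, then right-pad with `pad_value` up to `max_len`."""
    # values[-0:] would return the whole list, so guard the max_len == 0 case.
    recent = list(values[-max_len:]) if max_len > 0 else []
    return recent + [pad_value] * (max_len - len(recent))


print(pad_recent([1, 2, 3], 5))           # [1, 2, 3, 0, 0]
print(pad_recent([1, 2, 3, 4, 5, 6], 4))  # [3, 4, 5, 6]
```

Padding on the right with 0 matches what the listing already does, so the `mask_zero=True` setting on the Embedding layer would apply in the same way. Whether this actually improves the model is something only a retraining run can show.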