Restarting the Poker Bot Journey - 11: Sometimes AI Is Pretty Hard to Work With


I had assumed that data cleaning and feature encoding would be a simple task. In practice, even with a language model as a collaborator, unexpected challenges lurked at every turn.

Encoding the playing cards went smoothly enough at first: converting rank and suit into numeric values that a machine learning model can train on. But when I asked the language model to design the training model for me, I found it had omitted features I considered quite important, such as each player's stack size and the full board information (it simply used the number of cards rather than their actual rank and suit). Most critically, it completely ignored previous actions, a key feature, even though I repeatedly asked for it to be included. In the end I had to break the work down into very small steps and request them one at a time before it was done.
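For the record, the part that did go smoothly was this kind of mapping. Here is a minimal sketch of the rank-and-suit-to-number step, using the same suit-major indexing as the full script further down (card strings like 'HA' or 'C10', suit letter first):

# Minimal sketch of the card encoding, assuming card strings with the
# suit letter first and the rank after it (e.g., 'HA', 'C10').
RANKS = ['A','2','3','4','5','6','7','8','9','10','J','Q','K']
SUITS = ['S','H','D','C']

def card_to_index(card):
    """Map a card string to a single integer in [0, 51], suit-major."""
    suit, rank = card[0], card[1:]
    return SUITS.index(suit) * 13 + RANKS.index(rank)

print(card_to_index('HA'))   # 13: hearts block starts at 13, ace has rank index 0
print(card_to_index('C10'))  # 48: clubs block starts at 39, '10' has rank index 9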

This reminded me of an important lesson from my earlier work with language models: rather than digging into every detail of the code it produces, it is better to first make sure the program runs, even if it is not yet in an ideal state. This is very different from my old development habits. I used to try to fully understand the logic of every function and every action, but in collaboration with a language model that becomes an obstacle; after all, its coding logic and style often differ from what we are used to, and spending too much time on deep understanding can be a waste of effort, especially when that direction ultimately turns out to be a dead end.

Initial test results:

__________________________________________________________________________________________________
Layer (type)                     Output Shape       Param #   Connected to
==================================================================================================
seq_types_input (InputLayer)     [(None, 20)]       0
__________________________________________________________________________________________________
seq_amounts_input (InputLayer)   [(None, 20, 1)]    0
__________________________________________________________________________________________________
static_input (InputLayer)        [(None, 384)]      0
__________________________________________________________________________________________________
action_type_embedding (Embeddin  (None, 20, 16)     416       seq_types_input[0][0]
__________________________________________________________________________________________________

...

Epoch 20/20
399180/399180 [==============================] - 136s 342us/sample - loss: 0.6324 - acc: 0.6971 - val_loss: 0.7220 - val_acc: 0.6883
99795/99795 [==============================] - 8s 76us/sample - loss: 0.7220 - acc: 0.6883
Validation Loss: 0.7220 | Validation Accuracy: 0.6883

Although predicting a single action is not my main goal (I care more about whether the model can accurately predict the probability distribution over an opponent's possible actions in a given situation), this result at least shows the approach is viable. Even though I am still not well-versed in machine learning and may not yet understand the finer details of many statistical metrics, it is a good start.
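Since the softmax output is itself a probability distribution over action types, reading that distribution out is straightforward. A minimal sketch, assuming the trained model and the validation arrays defined in the script below:

# Sketch: inspect the predicted action distribution for one snapshot.
# Assumes model, X_static_val, X_seq_types_val and X_seq_amounts_val
# exist as defined in the training script below.
probs = model.predict({
    'static_input': X_static_val[:1],
    'seq_types_input': X_seq_types_val[:1],
    'seq_amounts_input': X_seq_amounts_val[:1],
})[0]  # shape: (NUM_ACTION_CLASSES,); entries sum to 1

top3 = probs.argsort()[::-1][:3]
for action_type in top3:
    print(f"action {action_type}: p = {probs[action_type]:.3f}")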

This experience confirms once again that when developing with a language model's help, the most important thing is to first get a working version, then improve it step by step. This approach not only makes development more efficient, it also avoids sinking too much time into directions that may turn out to be dead ends.

My model-training code:

#!/usr/bin/env python3

import json
import numpy as np
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (
    Input, Dense, LSTM, Embedding, Concatenate, Dropout
)
from tensorflow.keras.optimizers import Adam



# =============================================================================
# 1. Helper Functions for Feature Encoding
# =============================================================================

def one_hot_round(round_no):
    """One-hot encode round number (1=preflop, 2=flop, 3=turn, 4=river)."""
    vec = np.zeros(4)
    if 1 <= round_no <= 4:
        vec[round_no - 1] = 1
    return vec

def one_hot_position(pos, max_players=10):
    """One-hot encode a player's position (an integer in [0, max_players-1])."""
    vec = np.zeros(max_players)
    if 0 <= pos < max_players:
        vec[pos] = 1
    return vec

def card_to_onehot(card):
    """
    Convert a card string (e.g., 'S4', 'HA', 'C10') to a 52-dim one-hot vector.
    If the card is missing or hidden (e.g., starts with 'X'), return all zeros.
    """
    ranks = ['A','2','3','4','5','6','7','8','9','10','J','Q','K']
    suits = ['S','H','D','C']
    onehot = np.zeros(52)
    if not card or card.upper().startswith("X"):
        return onehot
    suit = card[0]
    rank = card[1:]
    if suit in suits and rank in ranks:
        suit_index = suits.index(suit)
        rank_index = ranks.index(rank)
        index = suit_index * 13 + rank_index
        onehot[index] = 1
    return onehot

def encode_board_cards(board_cards, max_cards=5):
    """
    Encode the board cards as the concatenation of one-hot vectors (52 dims each).
    Pads with zeros if there are fewer than max_cards.
    """
    encoded = [card_to_onehot(card) for card in board_cards]
    while len(encoded) < max_cards:
        encoded.append(np.zeros(52))
    return np.concatenate(encoded[:max_cards])

def encode_hole_cards(hole_cards):
    """
    Encode the player's (or actor's) hole cards (expected to be a list of 2 cards)
    as a concatenation of two 52-dim one-hot vectors.
    """
    encoded = [card_to_onehot(card) for card in hole_cards]
    while len(encoded) < 2:
        encoded.append(np.zeros(52))
    return np.concatenate(encoded[:2])



# =============================================================================
# 2. Process Each Snapshot into Model Inputs and Target
# =============================================================================

def process_snapshot(snapshot):
    """
    From a snapshot dictionary, create:
      - a vector of static features,
      - a sequence of previous actions (each with an action type and amount),
      - the target (the opponent's current action type).

    Static features (384 dims total):
      - one-hot encoded round (4 dims)
      - pot size (1 dim; scaled)
      - blinds: small, big, ante (3 dims; scaled)
      - actor stack (1 dim; scaled)
      - actor position (one-hot, 10 dims)
      - number of players remaining (1 dim; scaled)
      - board cards (5 fixed cards x 52 dims = 260 dims)
      - actor hole cards (2 cards x 52 dims = 104 dims)

    Sequential features, for each previous action (up to a fixed max length):
      - action_type (integer; offset by +1 so that 0 is reserved for padding)
      - action_sum (float; scaled)
    """
    # --- Static features ---
    # 1. Round (from current action)
    round_no = int(snapshot["action"]["round"])
    round_vec = one_hot_round(round_no)
    # 2. Pot size (scale by 100)
    pot_size = np.array([float(snapshot["pot_size"]) / 100.0])
    # 3. Blinds and ante (scaled)
    blinds = snapshot.get("blinds", {})
    small_blind = float(blinds.get("small_blind", 0)) / 100.0
    big_blind = float(blinds.get("big_blind", 0)) / 100.0
    ante = float(blinds.get("ante", 0)) / 100.0
    blinds_vec = np.array([small_blind, big_blind, ante])
    # 4. Actor stack size (scale by 1000)
    actor_stack = np.array([float(snapshot.get("actor_stack_size", 0)) / 1000.0])
    # 5. Actor position (one-hot with dimension 10)
    actor_pos = int(snapshot.get("actor_position", 0))
    pos_vec = one_hot_position(actor_pos, max_players=10)
    # 6. Number of players remaining (scale by 10)
    players_remaining = np.array([float(snapshot.get("players_remaining", 0)) / 10.0])
    # 7. Board cards (5 fixed cards)
    board_vec = encode_board_cards(snapshot.get("board_cards", []), max_cards=5)
    # 8. Actor hole cards (2 cards)
    hole_cards_vec = encode_hole_cards(snapshot.get("actor_hole_cards", []))
    # Concatenate all static features.
    # Total dims: 4 + 1 + 3 + 1 + 10 + 1 + 260 + 104 = 384
    static_features = np.concatenate([
        round_vec, pot_size, blinds_vec, actor_stack, pos_vec,
        players_remaining, board_vec, hole_cards_vec
    ])

    # --- Sequential features ---
    # For each previous action we take:
    #   - action_type (offset by +1 so that 0 is our pad value)
    #   - action_sum (scaled by 100)
    seq_actions = snapshot.get("previous_actions", [])
    seq_types = []
    seq_amounts = []
    for action in seq_actions:
        act_type = int(action.get("action_type", 0)) + 1  # reserve 0 for padding
        act_sum = float(action.get("action_sum", 0)) / 100.0
        seq_types.append(act_type)
        seq_amounts.append(act_sum)

    MAX_SEQ_LENGTH = 20  # max previous actions to keep (must match the model section below)
    # Truncate if too long
    seq_types = seq_types[:MAX_SEQ_LENGTH]
    seq_amounts = seq_amounts[:MAX_SEQ_LENGTH]
    # Pad sequences (pad value 0 is masked by the Embedding layer)
    while len(seq_types) < MAX_SEQ_LENGTH:
        seq_types.append(0)
        seq_amounts.append(0.0)
    seq_types = np.array(seq_types, dtype=np.int32)
    seq_amounts = np.array(seq_amounts, dtype=np.float32).reshape((MAX_SEQ_LENGTH, 1))

    # --- Target: the opponent's current action type (as integer) ---
    target = int(snapshot["action"].get("action_type", 0))

    return static_features, seq_types, seq_amounts, target



# =============================================================================
# 3. Load and Preprocess Data
# =============================================================================

def load_and_preprocess_data(json_filename):
    """
    Load snapshot logs from a JSON file and create training arrays.
    The JSON file is expected to be a list of snapshots.
    """
    with open(json_filename, 'r') as f:
        data = json.load(f)
    static_features_list = []
    seq_types_list = []
    seq_amounts_list = []
    targets = []
    for snapshot in data:
        static_feat, seq_types, seq_amounts, target = process_snapshot(snapshot)
        static_features_list.append(static_feat)
        seq_types_list.append(seq_types)
        seq_amounts_list.append(seq_amounts)
        targets.append(target)
    X_static = np.stack(static_features_list)    # shape: (N, 384)
    X_seq_types = np.stack(seq_types_list)       # shape: (N, MAX_SEQ_LENGTH)
    X_seq_amounts = np.stack(seq_amounts_list)   # shape: (N, MAX_SEQ_LENGTH, 1)
    y = np.array(targets, dtype=np.int32)        # shape: (N,)
    return X_static, X_seq_types, X_seq_amounts, y

# Change the filename below to the JSON file produced by your XML parser.
JSON_FILENAME = 'logs.json'
X_static, X_seq_types, X_seq_amounts, y = load_and_preprocess_data(JSON_FILENAME)

# (Optional) Check the shapes of the training arrays:
print("X_static shape:", X_static.shape)
print("X_seq_types shape:", X_seq_types.shape)
print("X_seq_amounts shape:", X_seq_amounts.shape)
print("y shape:", y.shape)

# Split into training and validation sets
(X_static_train, X_static_val,
 X_seq_types_train, X_seq_types_val,
 X_seq_amounts_train, X_seq_amounts_val,
 y_train, y_val) = train_test_split(
    X_static, X_seq_types, X_seq_amounts, y, test_size=0.2, random_state=42
)



# =============================================================================
# 4. Build the Keras Model
# =============================================================================

# Parameters for the sequential branch
MAX_SEQ_LENGTH = 20
# Adjust NUM_ACTION_TYPES based on your data (here we assume 25; update if needed)
NUM_ACTION_TYPES = 25
EMBEDDING_DIM = 16
NUM_ACTION_CLASSES = 30  # number of distinct action types to predict (update as needed)

# -- Static Input Branch --
static_input = Input(shape=(384,), name='static_input')
x_static = Dense(128, activation='relu')(static_input)
x_static = Dense(64, activation='relu')(x_static)

# -- Sequential Input Branch --
# Input for action types (integers; shape = (MAX_SEQ_LENGTH,))
seq_types_input = Input(shape=(MAX_SEQ_LENGTH,), dtype='int32', name='seq_types_input')
# Input for action amounts (floats; shape = (MAX_SEQ_LENGTH, 1))
seq_amounts_input = Input(shape=(MAX_SEQ_LENGTH, 1), dtype='float32', name='seq_amounts_input')

# Process action types with an Embedding layer.
# (We use mask_zero=True so that padded 0 values are ignored.)
x_seq_types = Embedding(
    input_dim=NUM_ACTION_TYPES + 1,  # +1 to reserve index 0 for padding
    output_dim=EMBEDDING_DIM,
    mask_zero=True,
    name='action_type_embedding'
)(seq_types_input)

# Process the amounts with a simple dense layer (applied to each time step).
x_seq_amounts = Dense(8, activation='relu', name='amount_dense')(seq_amounts_input)

# Concatenate along the feature dimension: each time step now has (EMBEDDING_DIM + 8) features.
x_seq = Concatenate(name='seq_concat')([x_seq_types, x_seq_amounts])

# Process the concatenated sequence with an LSTM.
x_seq = LSTM(64, name='lstm_seq')(x_seq)

# -- Merge Both Branches --
x = Concatenate(name='merge')([x_static, x_seq])
x = Dense(64, activation='relu')(x)
x = Dropout(0.5)(x)
output = Dense(NUM_ACTION_CLASSES, activation='softmax', name='output')(x)

model = Model(
    inputs=[static_input, seq_types_input, seq_amounts_input],
    outputs=output
)

model.compile(
    optimizer=Adam(learning_rate=1e-3),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()



# =============================================================================
# 5. Train the Model
# =============================================================================

history = model.fit(
    x={
        'static_input': X_static_train,
        'seq_types_input': X_seq_types_train,
        'seq_amounts_input': X_seq_amounts_train
    },
    y=y_train,
    validation_data=(
        {
            'static_input': X_static_val,
            'seq_types_input': X_seq_types_val,
            'seq_amounts_input': X_seq_amounts_val
        },
        y_val
    ),
    epochs=20,
    batch_size=32
)



# =============================================================================
# 6. Evaluate / Save the Model
# =============================================================================

loss, acc = model.evaluate(
    x={
        'static_input': X_static_val,
        'seq_types_input': X_seq_types_val,
        'seq_amounts_input': X_seq_amounts_val
    },
    y=y_val
)
print(f"Validation Loss: {loss:.4f} | Validation Accuracy: {acc:.4f}")

# Optionally, save the model:
model.save("opponent_model.h5")
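To actually use the saved model later (for example, inside the bot itself), it can be reloaded and fed a single snapshot through the same process_snapshot pipeline. A sketch, where snapshot is a placeholder for one snapshot dict in the parser's format:

# Sketch: reload the saved model and score one snapshot.
# `snapshot` is a placeholder; everything else comes from this script.
from tensorflow.keras.models import load_model

reloaded = load_model("opponent_model.h5")
static_feat, seq_types, seq_amounts, _ = process_snapshot(snapshot)
probs = reloaded.predict({
    'static_input': static_feat[np.newaxis, :],
    'seq_types_input': seq_types[np.newaxis, :],
    'seq_amounts_input': seq_amounts[np.newaxis, :, :],
})[0]
print("predicted action distribution:", probs)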
