Restarting the Poker Bot Journey - 12: When Simplifying Opens Up More Possibilities

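Yesterday, while improving how I extract poker data, I had some interesting realizations. I originally wanted to account for every detail, and ended up trapped with data that was far too complex. This was especially true for raise actions: I used to think every distinct bet size should be treated as its own data point for training, but in hindsight that over-precision limited what the model could learn.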

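This reminded me of my experience analyzing player pools with Hand2Note. Mathematically and in theory, a 30% bet and a 50% bet differ quite a lot, but in actual play, players' ranges are not that different. So I decided to regroup the bet sizes into a few main categories: small, mid, big, and so on. This not only makes the data easier to handle; more importantly, it gives each category enough samples for meaningful analysis.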
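
To make the sample-size argument concrete, here is a minimal sketch of bucketing. The thresholds follow the postflop cut-offs used in the extraction script later in this post, but the label names (including "overbet"), the `bucket_bet` helper, and the bet data are made up for illustration.

```python
from collections import Counter

# Illustrative thresholds, expressed as fractions of the pot before the bet.
BUCKETS = [(0.35, "small"), (0.70, "mid"), (1.10, "big")]


def bucket_bet(bet, pot):
    """Return a coarse size label for a bet relative to the pot before the bet."""
    if pot <= 0:
        return "unknown"
    ratio = bet / pot
    for upper, label in BUCKETS:
        if ratio < upper:
            return label
    return "overbet"


# Made-up (bet, pot) pairs, not real hand history data.
bets = [(3.0, 10.0), (3.3, 10.0), (5.0, 10.0), (5.5, 10.0), (9.0, 10.0), (15.0, 10.0)]

# Treating every exact size as its own class: one sample per class.
exact_counts = Counter(round(bet / pot, 2) for bet, pot in bets)

# Treating sizes as buckets: samples pool together.
bucket_counts = Counter(bucket_bet(bet, pot) for bet, pot in bets)

print(len(exact_counts), "exact sizes vs", len(bucket_counts), "buckets")
print(dict(bucket_counts))
```

With exact sizes, each class here has a single sample; with buckets, the 30% and 33% bets land in the same class, which is the effect described above.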

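While cleaning up the data I also found some problems I had overlooked. For example, the pot size was calculated incorrectly: I had been using the amount at the start of the hand and ignoring money already put into the pot. Then there were the player IDs: using player names directly made the data inconsistent, and switching to simple numeric IDs improved consistency considerably.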
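
The two fixes can be sketched in a few lines. This is a simplified illustration, not the extraction script itself: `pots_before_actions` is a hypothetical helper, the hand is made up, and IDs are assigned in order of first appearance here, whereas the script below assigns them by seat (small blind, big blind, button).

```python
def pots_before_actions(actions):
    """Return (player_id, amount, pot_before) for each action, in order.

    `actions` is a list of (player_name, amount) tuples covering the whole
    hand, blinds included. Names are replaced by small integer IDs.
    """
    ids = {}
    pot = 0.0
    rows = []
    for name, amount in actions:
        player_id = ids.setdefault(name, len(ids))
        rows.append((player_id, amount, pot))  # pot as the actor sees it
        pot += amount                          # only now add the new chips
    return rows


# Made-up hand: two blinds, a raise, then a call from the small blind.
hand = [("Alice", 0.5), ("Bob", 1.0), ("Carol", 2.5), ("Alice", 2.0)]
for row in pots_before_actions(hand):
    print(row)
```

The key point is the ordering: the pot is recorded before the current action's chips are added, so each row reflects what the player actually faced.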

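I originally planned to use Random Forest classification for this project, since it seemed to make the whole pipeline more direct and simple. However, while inspecting the training data, I realized that previous actions in poker may matter far more than I expected. That got me thinking about whether I need a model that handles sequence data better.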

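After discussing it with a language model, I decided to try splitting the features into two branches: one for the global game state, and another dedicated to the action sequence. The new approach uses an LSTM network for the sequence data. I don't really understand how it works yet, but at least the model can start training, which is always a good starting point.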
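
The model code is not part of this post, so the following only sketches the data side of the two-branch idea: turning one snapshot into a fixed-size vector for the global-state branch and a padded token sequence for the sequence (LSTM) branch. The snapshot layout, `ACTION_VOCAB`, `MAX_SEQ_LEN`, and `split_features` are all assumptions made for illustration.

```python
# Hypothetical action vocabulary; index 0 is reserved for padding/unknown.
ACTION_VOCAB = {
    "fold": 1, "call": 2, "check": 3,
    "small raise": 4, "mid raise": 5, "big raise": 6,
}
MAX_SEQ_LEN = 8  # assumed cap on how many previous actions are kept


def split_features(snapshot):
    """Split one snapshot dict into (global_features, action_sequence)."""
    bb = snapshot["big_blind"] or 1.0
    global_features = [
        snapshot["pot_size"] / bb,          # pot in big blinds
        snapshot["actor_stack_size"] / bb,  # actor stack in big blinds
        float(snapshot["round_no"]),
        float(snapshot["players_remaining"]),
    ]
    tokens = [ACTION_VOCAB.get(a, 0) for a in snapshot["previous_actions"]]
    tokens = tokens[-MAX_SEQ_LEN:]                       # keep the most recent actions
    tokens = [0] * (MAX_SEQ_LEN - len(tokens)) + tokens  # left-pad with zeros
    return global_features, tokens


# Made-up snapshot in a simplified shape.
example = {
    "big_blind": 1.0,
    "pot_size": 4.0,
    "actor_stack_size": 25.0,
    "round_no": 1,
    "players_remaining": 3,
    "previous_actions": ["mid raise", "fold"],
}
state_vec, action_seq = split_features(example)
print(state_vec)
print(action_seq)
```

In a two-branch network, a vector like `state_vec` would typically go through dense layers while a sequence like `action_seq` goes through an embedding and the LSTM, with the two outputs concatenated before the final classifier. A tree model such as Random Forest has no natural place for the ordered sequence, which is the limitation described above.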

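Looking back at yesterday's progress, sometimes "simplifying" actually opens up more possibilities. It is like when I used OpenHoldem: I always wanted to handle every possible situation, and the program became hard to maintain as a result. Now I have learned to simplify the implementation where it makes sense while keeping the core functionality, and that may be the real progress.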

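Next I need to study how LSTM networks work. I'm still quite lost, but at least the direction is right. As I have written before, sometimes you don't need to fully understand every detail; getting something running first and then improving it gradually may be the better strategy.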

The updated hand data extraction, for iPoker XML hand histories:

#!/usr/bin/env python3
import json
import os
import re
import xml.etree.ElementTree as ET



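# ------------------------------
# Helper functions
# ------------------------------


def safe_float(text):
    """Parse a money string (currency symbols and commas stripped); return 0.0 on failure."""
    try:
        return float(re.sub(r"[^\d\.]", "", text))
    except (TypeError, ValueError):
        # TypeError: text is None (missing attribute); ValueError: nothing numeric left.
        return 0.0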



def street_from_round(round_no):
    return {1: "preflop", 2: "flop", 3: "turn", 4: "river"}.get(round_no, "unknown")



def get_hole_cards(game, player):
    """Return the player's hole cards, or ["unknown", "unknown"] if hidden or missing."""
    for r in game.findall('round'):
        for elem in r.findall('cards'):
            if elem.attrib.get("type") == "Pocket" and elem.attrib.get("player") == player:
                if elem.text:
                    cards = elem.text.split()
                    # Cards starting with "X" are treated as hidden.
                    if any(c.upper().startswith("X") for c in cards):
                        return ["unknown", "unknown"]
                    else:
                        return cards
    return ["unknown", "unknown"]



def simplify_action(action_details, round_no, blinds, pot_before_action):
    """
    Simplify the action into a single string category.

    Allowed original types:
      - 0 → fold
      - 3 → call
      - 4 → check
      - 5, 7, 23 → raise, which are further categorized as follows:
        * Preflop (round 1), using the big blind as reference:
            < 2.3×BB            → "small raise preflop"
            2.3×BB to 3.0×BB    → "mid raise preflop"
            3.0×BB to 4.0×BB    → "big raise preflop"
            ≥ 4.0×BB            → "all in preflop"
        * Postflop (round > 1), using the pot (before the action) as reference:
            < 35% of pot        → "small raise postflop"
            35% to 70% of pot   → "mid raise postflop"
            70% to 110% of pot  → "big raise postflop"
            ≥ 110% of pot       → "all in postflop"

    Returns a copy of action_details with "simple_action_type" added,
    or None if the action type is not one of the allowed types.
    """
    allowed_types = {0: "fold", 3: "call", 4: "check", 5: "raise", 7: "raise", 23: "raise"}
    orig_type = action_details['action_type']
    if orig_type not in allowed_types:
        return None
    base_action = allowed_types[orig_type]
    new_action = action_details.copy()
    if base_action != "raise":
        new_action["simple_action_type"] = base_action
    elif round_no == 1:
        # Preflop: size the raise relative to the big blind.
        bb = blinds.get("big_blind", 1)
        ratio = action_details['action_sum'] / bb if bb != 0 else 0
        if ratio < 2.3:
            new_action["simple_action_type"] = "small raise preflop"
        elif ratio < 3.0:
            new_action["simple_action_type"] = "mid raise preflop"
        elif ratio < 4.0:
            new_action["simple_action_type"] = "big raise preflop"
        else:
            new_action["simple_action_type"] = "all in preflop"
    else:
        # Postflop: size the raise relative to the pot before the action.
        ratio = action_details['action_sum'] / pot_before_action if pot_before_action > 0 else 0
        if ratio < 0.35:
            new_action["simple_action_type"] = "small raise postflop"
        elif ratio < 0.70:
            new_action["simple_action_type"] = "mid raise postflop"
        elif ratio < 1.10:
            new_action["simple_action_type"] = "big raise postflop"
        else:
            new_action["simple_action_type"] = "all in postflop"
    return new_action



def parse_decision_logs(root, hero):
    """Build one snapshot per non-hero action (rounds ≥ 1) for every game under root."""
    logs = []
    for game in root.findall('game'):
        gamecode = game.attrib.get("gamecode", "")
        general = game.find('general')
        # Get players element; it is needed for the button and initial stacks.
        players_elem = general.find('players') if general is not None else None
        if players_elem is None:
            continue  # Skip malformed games that lack a player list.
        blinds = {
            'small_blind': safe_float(general.findtext('smallblind', default="0")),
            'big_blind': safe_float(general.findtext('bigblind', default="0")),
            'ante': safe_float(general.findtext('ante', default="0"))
        }
        # Identify small blind and big blind using round 0 actions.
        small_blind_player = None
        big_blind_player = None
        for r in game.findall('round'):
            if int(r.attrib.get('no', 0)) == 0:
                for child in r:
                    if child.tag == "action":
                        if child.attrib.get('type') == "1":
                            small_blind_player = child.attrib.get('player')
                        elif child.attrib.get('type') == "2":
                            big_blind_player = child.attrib.get('player')
                break
        # Identify the button player from the dealer flag.
        button_player = None
        for p in players_elem.findall('player'):
            if p.attrib.get('dealer', '0') == '1':
                button_player = p.attrib.get('name')
                break
        # Create mapping: small blind → 0, big blind → 1, button → 2.
        player_positions = {}
        if small_blind_player is not None:
            player_positions[small_blind_player] = 0
        if big_blind_player is not None:
            player_positions[big_blind_player] = 1
        if button_player is not None:
            player_positions[button_player] = 2
        # Get initial stacks for the mapped players.
        player_stacks = {}
        for p in players_elem.findall('player'):
            name = p.attrib['name']
            if name in player_positions:
                player_stacks[name] = safe_float(p.attrib.get('chips', "0"))
        active_players = {name: True for name in player_positions}
        pot_size = 0.0
        board_cards = []
        cumulative_actions = []  # simplified actions (each includes "action_round")
        snapshot_action_counter = 0
        # Track cumulative contributions from each player.
        player_contributions = {name: 0.0 for name in player_positions}
        for r in game.findall('round'):
            round_no = int(r.attrib.get('no', 0))
            for child in r:
                if child.tag == "cards":
                    if child.attrib.get("type") != "Pocket":
                        if child.text:
                            board_cards.extend(child.text.split())
                elif child.tag == "action":
                    action_details = {
                        'player': child.attrib.get('player'),
                        'action_type': int(child.attrib.get('type')),
                        'action_sum': safe_float(child.attrib.get('sum')),
                        'action_round': round_no
                    }
                    player = action_details['player']
                    contribution = action_details['action_sum']
                    # For round 0 (blinds/antes): update contributions and pot; no snapshot.
                    if round_no < 1:
                        if player in player_contributions:
                            player_contributions[player] += contribution
                        pot_size += contribution
                        continue
                    # For rounds ≥ 1, capture current pot (before adding current action).
                    current_pot = pot_size
                    # Actor's current stack (as seen when taking the action).
                    actor_current_stack = player_stacks.get(player, 0.0) - player_contributions.get(player, 0.0)
                    # Simplify the action.
                    simple_action = simplify_action(action_details, round_no, blinds, current_pot)
                    if simple_action is not None:
                        snapshot_action_counter += 1
                        simple_action['action_no'] = snapshot_action_counter
                        # Replace the "player" field with its numeric value.
                        numeric_player = player_positions.get(player)
                        simple_action["player"] = numeric_player
                        # Build current stacks keyed by numeric positions.
                        current_player_stacks = {
                            player_positions[name]: player_stacks.get(name, 0.0) - player_contributions.get(name, 0.0)
                            for name in player_positions
                        }
                        # Create snapshot only for non-hero actions.
                        if player != hero:
                            snapshot = {
                                "gamecode": gamecode,
                                "round_no": round_no,
                                "current_street": street_from_round(round_no),
                                "blinds": blinds,
                                "player_positions": player_positions,    # mapping from name to number
                                "player_stacks": current_player_stacks,  # mapping from number to current stack
                                "pot_size": current_pot,
                                "board_cards": board_cards.copy(),
                                "previous_actions": [],
                                "action": simple_action.copy(),
                                "players_remaining": sum(1 for v in active_players.values() if v),
                                "is_button": (player_positions.get(player) == 2),
                                "actor_hole_cards": get_hole_cards(game, player),
                                "actor_stack_size": actor_current_stack,
                                "actor_position": player_positions.get(player)
                            }
                            # For previous actions, use the numeric "player" already set.
                            for act in cumulative_actions:
                                act2 = act.copy()
                                # Since act["player"] is now numeric, we simply assign it.
                                act2["player_position"] = act["player"]
                                snapshot["previous_actions"].append(act2)
                            logs.append(snapshot)
                    # Update contributions and pot AFTER snapshot creation.
                    if player in player_contributions:
                        player_contributions[player] += contribution
                    pot_size += contribution
                    if simple_action is not None:
                        cumulative_actions.append(simple_action)
                    # Mark a player as inactive if they folded.
                    if action_details.get('action_type') == 0:
                        active_players[player] = False
    return logs



# ------------------------------
# Process All XML Files in ipoker_hh Folder
# ------------------------------


def process_all_hand_history(root_folder, hero=None):
    """Walk root_folder, parse every .xml hand history, and return all snapshots."""
    all_logs = []
    for dirpath, dirnames, filenames in os.walk(root_folder):
        for filename in filenames:
            if filename.endswith(".xml"):
                file_path = os.path.join(dirpath, filename)
                try:
                    tree = ET.parse(file_path)
                    root_xml = tree.getroot()
                    # If no hero was given, take it from the session's <nickname>.
                    if hero is None:
                        session_general = root_xml.find('general')
                        if session_general is not None and session_general.find('nickname') is not None:
                            hero = session_general.find('nickname').text.strip()
                    logs = parse_decision_logs(root_xml, hero)
                    all_logs.extend(logs)
                    print(f"Processed file: {file_path} -> {len(logs)} snapshots.")
                except ET.ParseError as e:
                    print(f"Error parsing XML file: {file_path}: {e}")
    return all_logs



# ------------------------------
# Main
# ------------------------------


if __name__ == '__main__':
    root_folder = "ipoker_hh_test"  # Adjust folder path as needed.
    hero_name = None  # Or manually set your hero's name.
    print("Processing all hand history XML files in folder:", root_folder)
    all_logs = process_all_hand_history(root_folder, hero=hero_name)
    print("Total snapshots extracted:", len(all_logs))
    with open("logs.json", "w") as outfile:
        json.dump(all_logs, outfile, indent=4)
    print("Data extraction complete. Saved to logs.json")
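
For readers without an iPoker file at hand, here is a tiny made-up fragment showing the layout the parser above appears to expect. The element and attribute names are inferred from what the script reads, not taken from an official iPoker schema, and the player names and amounts are invented. The sketch only reproduces the position mapping (small blind → 0, big blind → 1, button → 2).

```python
import xml.etree.ElementTree as ET

# Made-up hand fragment; structure inferred from the parser, not from a spec.
SAMPLE = """
<session>
  <general><nickname>Hero</nickname></general>
  <game gamecode="1">
    <general>
      <smallblind>$0.50</smallblind>
      <bigblind>$1</bigblind>
      <players>
        <player name="Hero" chips="$25" dealer="1"/>
        <player name="Alice" chips="$25" dealer="0"/>
        <player name="Bob" chips="$25" dealer="0"/>
      </players>
    </general>
    <round no="0">
      <action player="Alice" type="1" sum="$0.50"/>
      <action player="Bob" type="2" sum="$1"/>
    </round>
    <round no="1">
      <action player="Hero" type="23" sum="$2.50"/>
      <action player="Alice" type="0" sum="$0"/>
      <action player="Bob" type="3" sum="$1.50"/>
    </round>
  </game>
</session>
"""

root = ET.fromstring(SAMPLE.strip())
game = root.find("game")

# Round 0 holds the blind posts: type "1" is the small blind, "2" the big blind.
blind_posts = {
    a.get("type"): a.get("player")
    for r in game.findall("round") if r.get("no") == "0"
    for a in r.findall("action")
}
positions = {blind_posts["1"]: 0, blind_posts["2"]: 1}

# The button is the player whose dealer flag is "1".
for p in game.find("general/players").findall("player"):
    if p.get("dealer") == "1":
        positions[p.get("name")] = 2

print(positions)
```

Passed to `parse_decision_logs(root, "Hero")`, a fragment like this should yield snapshots for Alice's fold and Bob's call only, since hero actions are recorded as history but never become snapshots themselves.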

