這兩天花了不少時間重新檢視數據庫的程式碼,與過去不同的是,我嘗試耐心地一行一行地深入理解。那種線索逐漸串連、邏輯關係逐漸浮現的感覺真的很奇妙,就像拼圖一樣,越看越能理解整個系統是如何運作的。
SQL語言就如我之前所說,確實是相當直觀的。而且語言模型已經幫我處理了最困難的部分,比如如何連接數據庫、如何進行表格關聯、以及如何對數據進行篩選。當我仔細閱讀這些代碼,我認為它們已經做得相當不錯。雖然我在建構這類數據庫上並沒有太多經驗,也不是什麼專家,但代碼結構清晰地展示了數據如何存儲以及如何被檢索,這讓整個系統變得易於理解。
接著我開始將我的數據庫結果與Hand2Note的分析進行比較。過程中我意識到,在提出查詢時需要格外精確。例如,當我要求分析14-18BB籌碼量的heads-up情況時,我需要明確指定這是指「有效大盲」,也就是兩位玩家中較小的籌碼量。以前如果我不那麼明確地提出要求,系統可能會混入一些不符合條件的情況——比如小盲位置的玩家擁有14-18BB的籌碼,但大盲位置的玩家可能只有8BB或10BB。在這種情況下,有效籌碼量實際上是由大盲玩家的籌碼量決定的。
同時,我還需要明確排除hero的行動,並且指定我只需要first to act的數據。經過這些精確的篩選後,結果與Hand2Note的分析幾乎一致,這讓我對自己構建的系統更有信心了。
Heads-Up SB First Action Analysis (18-24bb Stack) - EXCLUDING HERO =================================================================
Total hands analyzed: 68313 Action frequencies:
Limp : 32205 hands ( 47.1%)
RFI : 20183 hands ( 29.5%)
Fold : 11926 hands ( 17.5%)
OpenShove : 3999 hands ( 5.8%)
Heads-Up SB First Action Analysis (14-18bb Stack) - EXCLUDING HERO =================================================================
Total hands analyzed: 81702 Action frequencies:
Limp : 40651 hands ( 49.8%)
RFI : 19610 hands ( 24.0%)
Fold : 14623 hands ( 17.9%)
OpenShove : 6818 hands ( 8.3%)
我也重新組織了action type的分類。以前我將所有的raise動作都歸為同一類,用5或23這樣的數字代碼表示。但現在我發現5實際上代表的是bet動作,而不是raise。這樣的區分讓後續對數據庫的篩選變得更加容易。例如,我可以直接篩選出「下注」(bet)這類動作,而不需要通過檢查之前的動作序列來確保不存在之前的下注。
這種區分對於分析一些特定的撲克策略非常有用,比如continuation bet(持續下注)、delay continuation bet(延遲持續下注)。這些策略在撲克中的統計數據都非常重要,而它們都可以被歸類為「下注」動作。通過將bet和raise明確分開,之後的分析工作將變得更加簡單高效。
這個數據庫將是撲克機器人專案中的重要組成部分,因為它能提供對手模型所需的行動頻率和手牌分佈數據。這樣我就能建立一個模擬平均玩家行為的模型。雖然我還不完全確定具體如何實現,但如果我能獲取完整的決策樹狀態,我相信能找到一些算法來確定針對特定對手的最佳行動。
回想幾年前我使用Hand2Note分析玩家池時,確實發現了一些驚人的模式。許多玩家在某些地方表現相當糟糕,這使得我們可以通過更有針對性的策略來獲取優勢。例如,在Head Up的BU上,平均玩家的跟注(limp)範圍實在太容易被針對了。儘管現在已經有很多"完美翻牌前範圍"可供參考,但問題在於這些都是混合策略,執行起來相當困難。因此,普通對手仍然會用多數弱牌進行跟注,只有少部分強牌會混合使用。面對這樣的對手,我可以非常積極地給予壓力,從中獲利。
而這個新的機器人項目,我期待它能比我過去使用的CardRunner EV等工具更有效地計算和利用這些弱點,特別是在處理翻牌後的複雜情況時。我真的很期待看到這個數據驅動的機器人如何應對這些玩家類型,因為我想像它能找到更好的方法來利用這些策略弱點。
更新後的database程式碼:
import os
import sqlite3
import pandas as pd
import numpy as np
from tqdm import tqdm
import time
import re
import json
import traceback
from concurrent.futures import ThreadPoolExecutor
from lxml import etree as ET
# For reproducibility
np.random.seed(42)
def optimize_sqlite_connection(conn):
"""Apply performance optimizations to SQLite connection"""
conn.execute('PRAGMA journal_mode = WAL')
conn.execute('PRAGMA synchronous = NORMAL')
conn.execute('PRAGMA cache_size = -64000') # 64MB cache
conn.execute('PRAGMA temp_store = MEMORY')
conn.execute('PRAGMA page_size = 4096')
def create_database_schema(db_path, with_indexes=True):
"""Create the database schema for poker hand history analysis"""
conn = sqlite3.connect(db_path)
optimize_sqlite_connection(conn)
c = conn.cursor()
# Create tables without indexes first for faster bulk loading
c.execute('''
CREATE TABLE IF NOT EXISTS games (
game_id TEXT PRIMARY KEY,
session_id TEXT,
start_date TEXT,
small_blind REAL,
big_blind REAL,
ante REAL,
table_name TEXT,
player_count INTEGER,
is_tournament INTEGER
)
''')
c.execute('''
CREATE TABLE IF NOT EXISTS players (
player_id INTEGER PRIMARY KEY AUTOINCREMENT,
player_name TEXT UNIQUE
)
''')
c.execute('''
CREATE TABLE IF NOT EXISTS game_players (
game_id TEXT,
player_id INTEGER,
position TEXT,
position_numeric INTEGER,
initial_stack REAL,
is_hero INTEGER,
is_dealer INTEGER,
final_stack REAL,
total_won REAL,
total_bet REAL,
PRIMARY KEY (game_id, player_id),
FOREIGN KEY (game_id) REFERENCES games(game_id),
FOREIGN KEY (player_id) REFERENCES players(player_id)
)
''')
c.execute('''
CREATE TABLE IF NOT EXISTS cards (
card_id INTEGER PRIMARY KEY AUTOINCREMENT,
game_id TEXT,
card_type TEXT, -- Pocket, Flop, Turn, River
player_id INTEGER NULL,
card_values TEXT,
FOREIGN KEY (game_id) REFERENCES games(game_id),
FOREIGN KEY (player_id) REFERENCES players(player_id)
)
''')
c.execute('''
CREATE TABLE IF NOT EXISTS actions (
action_id INTEGER PRIMARY KEY AUTOINCREMENT,
game_id TEXT,
player_id INTEGER,
action_round INTEGER,
action_type INTEGER,
simple_action_type TEXT,
action_sum REAL,
action_order INTEGER,
pot_before_action REAL,
players_remaining INTEGER,
FOREIGN KEY (game_id) REFERENCES games(game_id),
FOREIGN KEY (player_id) REFERENCES players(player_id)
)
''')
# Only create indexes if requested (defer for bulk loading)
if with_indexes:
c.execute('CREATE INDEX IF NOT EXISTS idx_game_id ON actions(game_id)')
c.execute('CREATE INDEX IF NOT EXISTS idx_player_id ON actions(player_id)')
c.execute('CREATE INDEX IF NOT EXISTS idx_action_type ON actions(simple_action_type)')
c.execute('CREATE INDEX IF NOT EXISTS idx_game_players ON game_players(game_id, player_id)')
conn.commit()
conn.close()
print(f"Database schema created at {db_path}")
# Helper functions
def safe_float(text):
"""Convert text to float, handling currency symbols and errors"""
if text is None:
return 0.0
try:
return float(re.sub(r"[^\d\.]", "", str(text)))
except Exception:
return 0.0
def get_action_type(type_code):
"""Convert action type code to readable format"""
action_types = {
0: "fold",
1: "small_blind",
2: "big_blind",
3: "call",
4: "check",
5: "bet",
7: "all_in",
15: "ante",
23: "raise"
}
return action_types.get(int(type_code) if type_code else 0, f"unknown_{type_code}")
def simplify_action(action_details, round_no, blinds, pot_before_action, round_contributions, current_round_max, player_stacks=None):
"""
Simplify the action into a single string category based on context.
"""
allowed_types = {0: "fold", 3: "call", 4: "check", 5: "bet", 7: "all_in", 15: "ante", 23: "raise"}
orig_type = action_details['action_type']
if orig_type not in allowed_types:
return None
base_action = allowed_types[orig_type]
new_action = action_details.copy()
if base_action == "all_in":
# Determine if this is an all-in call or all-in raise
player_id = action_details['player_id']
current_player_contrib = round_contributions.get(player_id, 0)
# If the action sum is less than or equal to what would be needed to call,
# it's an all-in call. Otherwise, it's an all-in raise.
call_amount = max(0, current_round_max - current_player_contrib)
if action_details['action_sum'] <= call_amount * 1.05: # Allow small margin for rounding errors
new_action["simple_action_type"] = "all_in_call"
else:
# It's an all-in raise
if round_no == 1: # Preflop
new_action["simple_action_type"] = "all_in_preflop"
else:
new_action["simple_action_type"] = "all_in_postflop"
elif base_action == "bet":
# Bets only occur postflop (rounds 2+)
if round_no > 1:
effective_pot = pot_before_action
ratio = action_details['action_sum'] / effective_pot if effective_pot > 0 else 0
if ratio <= 0.33:
new_action["simple_action_type"] = "small_bet_postflop"
elif ratio <= 0.66:
new_action["simple_action_type"] = "mid_bet_postflop"
else:
new_action["simple_action_type"] = "big_bet_postflop"
else:
# This shouldn't happen in normal play (a bet in pre-flop)
new_action["simple_action_type"] = "unusual_bet_preflop"
elif base_action == "raise":
if round_no == 1: # Preflop
bb = blinds.get("big_blind", 1)
ratio = action_details['action_sum'] / bb if bb != 0 else 0
# Get player's stack size (if available)
player_id = action_details['player_id']
player_stack = player_stacks.get(player_id, float('inf')) if player_stacks else float('inf')
# Check if this is an all-in
is_all_in = action_details['action_sum'] >= 0.98 * player_stack
if is_all_in:
new_action["simple_action_type"] = "all_in_preflop"
elif ratio <= 2.2:
new_action["simple_action_type"] = "small_raise_preflop"
elif ratio <= 2.7:
new_action["simple_action_type"] = "mid_raise_preflop"
elif ratio <= 3.2:
new_action["simple_action_type"] = "big_raise_preflop"
else:
new_action["simple_action_type"] = "consider_all_in_preflop"
else: # Postflop
current_player_contrib = round_contributions.get(action_details['player_id'], 0)
call_amount = max(0, current_round_max - current_player_contrib)
effective_raise = action_details['action_sum'] - call_amount
effective_pot = pot_before_action + call_amount
ratio = effective_raise / effective_pot if effective_pot > 0 else 0
if ratio <= 0.33:
new_action["simple_action_type"] = "small_raise_postflop"
elif ratio <= 0.66:
new_action["simple_action_type"] = "mid_raise_postflop"
else:
new_action["simple_action_type"] = "big_raise_postflop"
else:
new_action["simple_action_type"] = base_action
return new_action
def process_xml_file(xml_file_path):
"""Process a single XML file and return extracted data"""
try:
# Data structures to collect results
games_data = []
game_players_data = []
cards_data = []
actions_data = []
player_names = set()
try:
# Use safer XML parsing approach
parser = ET.XMLParser(recover=True)
tree = ET.parse(xml_file_path, parser)
root = tree.getroot()
except Exception as e:
return {
'success': False,
'error': f"XML parsing error: {str(e)}",
'filename': xml_file_path
}
# Get session info
session_id = root.get('sessioncode', '')
session_general = root.find('general')
hero_nickname = session_general.findtext('nickname', '') if session_general is not None else ''
# Process each game
for game in root.findall('game'):
game_id = game.get('gamecode', '')
if not game_id:
continue
# Parse general game info
general = game.find('general')
if general is None:
continue
# Get blinds and ante
small_blind = safe_float(general.findtext('smallblind', "0"))
big_blind = safe_float(general.findtext('bigblind', "0"))
ante = safe_float(general.findtext('ante', "0"))
# Get start date
start_date = general.findtext('startdate', '')
# Get table info
table_name = ''
is_tournament = 0
for parent_general in root.findall('general'):
table_name = parent_general.findtext('tablename', '')
is_tournament = 1 if parent_general.findtext('tournamentcode', '') else 0
# Get player count
players_elem = general.find('players')
if players_elem is None:
continue
player_count = len(players_elem.findall('player'))
# Store game info
games_data.append({
'game_id': game_id,
'session_id': session_id,
'start_date': start_date,
'small_blind': small_blind,
'big_blind': big_blind,
'ante': ante,
'table_name': table_name,
'player_count': player_count,
'is_tournament': is_tournament
})
# Process player positions
player_positions = {}
position_numeric = {}
# Find button player
button_player = None
for p in players_elem.findall('player'):
if p.get('dealer', '0') == '1':
button_player = p.get('name')
break
# Find SB and BB from round 0 actions
sb_player = None
bb_player = None
for r in game.findall('round'):
if r.get('no') == '0':
for act in r.findall('action'):
action_type = act.get('type')
player = act.get('player')
if action_type == '1':
sb_player = player
elif action_type == '2':
bb_player = player
# Assign positions
heads_up = (player_count == 2)
# For 2-player games
if heads_up and button_player and sb_player:
player_positions[button_player] = 'BTN/SB'
position_numeric[button_player] = 0
# In heads-up, the other player must be BB
for p in players_elem.findall('player'):
player_name = p.get('name')
if player_name != button_player:
player_positions[player_name] = 'BB'
position_numeric[player_name] = 1
break
# For 3+ player games
else:
if sb_player:
player_positions[sb_player] = 'SB'
position_numeric[sb_player] = 0
if bb_player:
player_positions[bb_player] = 'BB'
position_numeric[bb_player] = 1
# In 3-player games, the third position is button if not SB or BB
if player_count == 3 and button_player:
if button_player not in (sb_player, bb_player):
player_positions[button_player] = 'BTN'
position_numeric[button_player] = 2
# Process each player
for p in players_elem.findall('player'):
player_name = p.get('name')
if not player_name:
continue
# Add to set of player names
player_names.add(player_name)
initial_stack = safe_float(p.get('chips', '0'))
is_hero = 1 if player_name == hero_nickname else 0
is_dealer = 1 if p.get('dealer', '0') == '1' else 0
total_won = safe_float(p.get('win', '0'))
total_bet = safe_float(p.get('bet', '0'))
final_stack = initial_stack + total_won - total_bet
position = player_positions.get(player_name, 'unknown')
pos_num = position_numeric.get(player_name, -1)
# Store player info for this game
game_players_data.append({
'game_id': game_id,
'player_name': player_name,
'position': position,
'position_numeric': pos_num,
'initial_stack': initial_stack,
'is_hero': is_hero,
'is_dealer': is_dealer,
'final_stack': final_stack,
'total_won': total_won,
'total_bet': total_bet
})
# Process cards
for r in game.findall('round'):
for card in r.findall('cards'):
card_type = card.get('type')
player_name = card.get('player', None)
card_values = card.text.strip() if card.text else ''
# Store card info
cards_data.append({
'game_id': game_id,
'card_type': card_type,
'player_name': player_name,
'card_values': card_values
})
# Process actions with contextualized information
active_players = {p.get('name'): True for p in players_elem.findall('player')}
pot_size = 0.0
action_order = 0
# Track cumulative contributions
player_contributions = {p.get('name'): 0.0 for p in players_elem.findall('player')}
player_stacks = {p.get('name'): safe_float(p.get('chips', '0')) for p in players_elem.findall('player')}
for r in game.findall('round'):
round_no = int(r.get('no', '0'))
# Reset round contributions and maximum for new betting rounds
if round_no >= 1:
round_contributions = {player_name: 0.0 for player_name in player_contributions}
current_round_max = 0.0
for action in r.findall('action'):
player_name = action.get('player')
if not player_name:
continue
action_type = int(action.get('type', '0'))
action_sum = safe_float(action.get('sum', '0'))
action_order += 1
# Update player's stack
player_stacks[player_name] = max(0, player_stacks.get(player_name, 0) - action_sum)
# Prepare action details
action_details = {
'player_id': player_name, # We'll resolve player_id later
'action_type': action_type,
'action_sum': action_sum,
'action_round': round_no
}
current_pot = pot_size
players_remaining = sum(1 for v in active_players.values() if v)
simple_action_type = get_action_type(action_type)
# Skip round 0 for simplification analysis
# For rounds ≥ 1, analyze actions in context
if round_no >= 1:
# Simplify the action using the round-level data
simple_action = simplify_action(
action_details,
round_no,
{"big_blind": big_blind, "small_blind": small_blind, "ante": ante},
current_pot,
round_contributions,
current_round_max,
player_stacks # Pass player stacks here
)
if simple_action is not None:
# Store the action with context
actions_data.append({
'game_id': game_id,
'player_name': player_name,
'action_round': round_no,
'action_type': action_type,
'simple_action_type': simple_action.get("simple_action_type", simple_action_type),
'action_sum': action_sum,
'action_order': action_order,
'pot_before_action': current_pot,
'players_remaining': players_remaining
})
# Update round-level contributions for this action
round_contributions[player_name] = round_contributions.get(player_name, 0) + action_sum
current_round_max = max(current_round_max, round_contributions[player_name])
else:
# For round 0 (blinds/antes), just record the action
actions_data.append({
'game_id': game_id,
'player_name': player_name,
'action_round': round_no,
'action_type': action_type,
'simple_action_type': simple_action_type,
'action_sum': action_sum,
'action_order': action_order,
'pot_before_action': pot_size,
'players_remaining': player_count
})
# Update cumulative contributions and pot
player_contributions[player_name] = player_contributions.get(player_name, 0) + action_sum
pot_size += action_sum
# Mark a player as inactive if they folded
if action_type == 0: # fold
active_players[player_name] = False
return {
'games': games_data,
'game_players': game_players_data,
'cards': cards_data,
'actions': actions_data,
'players': list(player_names),
'success': True
}
except Exception as e:
print(f"Error processing {xml_file_path}: {str(e)}")
traceback.print_exc()
return {
'games': [],
'game_players': [],
'cards': [],
'actions': [],
'players': [],
'success': False,
'error': str(e),
'filename': xml_file_path
}
def get_player_id_cache(conn, player_names):
"""Pre-cache player IDs to avoid repeated lookups"""
cursor = conn.cursor()
player_id_cache = {}
# Get existing players
cursor.execute("SELECT player_id, player_name FROM players")
for player_id, player_name in cursor.fetchall():
player_id_cache[player_name] = player_id
# Insert any new players all at once
new_players = [(name,) for name in player_names if name not in player_id_cache]
if new_players:
cursor.executemany("INSERT OR IGNORE INTO players (player_name) VALUES (?)", new_players)
conn.commit()
# Get the newly inserted IDs
for name, in new_players:
if name not in player_id_cache:
cursor.execute("SELECT player_id FROM players WHERE player_name = ?", (name,))
result = cursor.fetchone()
if result:
player_id_cache[name] = result[0]
return player_id_cache
def process_directory(directory_path, db_path, limit=None):
"""Process all XML files in a directory and store them in the database"""
start_time = time.time()
# Get all XML files
xml_files = []
for root, _, files in os.walk(directory_path):
for file in files:
if file.endswith(".xml"):
xml_files.append(os.path.join(root, file))
# Limit files if requested
if limit and limit > 0:
if limit < len(xml_files):
xml_files = xml_files[:limit]
print(f"Found {len(xml_files)} XML files to process")
# Create database schema without indexes for faster insertion
if not os.path.exists(db_path):
create_database_schema(db_path, with_indexes=False)
# Set up database connection in main thread
conn = sqlite3.connect(db_path)
optimize_sqlite_connection(conn)
# Check for existing games to avoid duplicates
cursor = conn.cursor()
cursor.execute("SELECT game_id FROM games")
existing_games = {row[0] for row in cursor.fetchall()}
# Determine optimal chunk size and number of workers
chunk_size = 50 # A smaller batch size for better stability
num_files = len(xml_files)
num_chunks = max(1, (num_files + chunk_size - 1) // chunk_size)
# Use thread-based parallelism (more reliable on Windows)
max_workers = min(os.cpu_count() * 2, 16) # More threads since they're lighter weight
print(f"Processing with {max_workers} threads in {num_chunks} chunks")
# Set up progress tracking
processed_files = 0
successful_files = 0
total_games = 0
total_actions = 0
# Process files in batches
for i in range(0, num_files, chunk_size):
chunk = xml_files[i:i+chunk_size]
chunk_start = time.time()
print(f"Processing chunk {i//chunk_size + 1}/{num_chunks} ({len(chunk)} files)")
# Process files with thread pool
results = []
try:
with ThreadPoolExecutor(max_workers=max_workers) as executor:
futures = {executor.submit(process_xml_file, file_path): file_path for file_path in chunk}
# Use tqdm for progress tracking
with tqdm(total=len(chunk), desc="Processing XML files") as pbar:
for future in futures:
try:
result = future.result()
results.append(result)
if result['success']:
successful_files += 1
pbar.update(1)
except Exception as e:
print(f"Error processing {futures[future]}: {str(e)}")
pbar.update(1)
except Exception as e:
print(f"Error with thread pool: {str(e)}")
# If thread pool fails, try sequential processing
print("Falling back to sequential processing...")
results = []
for file_path in tqdm(chunk, desc="Processing files sequentially"):
result = process_xml_file(file_path)
results.append(result)
if result['success']:
successful_files += 1
processed_files += len(chunk)
# Collect data from successful results
all_players = set()
all_games = []
all_game_players = []
all_cards = []
all_actions = []
for result in results:
if result.get('success', False):
all_players.update(result.get('players', []))
all_games.extend(result.get('games', []))
all_game_players.extend(result.get('game_players', []))
all_cards.extend(result.get('cards', []))
all_actions.extend(result.get('actions', []))
# Get player IDs
player_id_cache = get_player_id_cache(conn, all_players)
# Insert games
games_to_insert = []
for game in all_games:
if game['game_id'] not in existing_games:
games_to_insert.append((
game['game_id'], game['session_id'], game['start_date'],
game['small_blind'], game['big_blind'], game['ante'],
game['table_name'], game['player_count'], game['is_tournament']
))
existing_games.add(game['game_id'])
if games_to_insert:
print(f"Inserting {len(games_to_insert)} games...")
cursor.executemany('''
INSERT OR IGNORE INTO games
(game_id, session_id, start_date, small_blind, big_blind, ante, table_name, player_count, is_tournament)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
''', games_to_insert)
conn.commit()
total_games += len(games_to_insert)
# Insert game_players
game_players_to_insert = []
for gp in all_game_players:
player_id = player_id_cache.get(gp['player_name'])
if player_id:
game_players_to_insert.append((
gp['game_id'], player_id, gp['position'], gp['position_numeric'],
gp['initial_stack'], gp['is_hero'], gp['is_dealer'],
gp['final_stack'], gp['total_won'], gp['total_bet']
))
if game_players_to_insert:
print(f"Inserting {len(game_players_to_insert)} game players...")
cursor.executemany('''
INSERT OR IGNORE INTO game_players
(game_id, player_id, position, position_numeric, initial_stack,
is_hero, is_dealer, final_stack, total_won, total_bet)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
''', game_players_to_insert)
conn.commit()
# Insert cards
cards_to_insert = []
for card in all_cards:
player_id = player_id_cache.get(card['player_name']) if card['player_name'] else None
cards_to_insert.append((
card['game_id'], card['card_type'], player_id, card['card_values']
))
if cards_to_insert:
print(f"Inserting {len(cards_to_insert)} cards...")
cursor.executemany('''
INSERT INTO cards
(game_id, card_type, player_id, card_values)
VALUES (?, ?, ?, ?)
''', cards_to_insert)
conn.commit()
# Insert actions in smaller batches to avoid excessive memory usage
action_batch_size = 5000
total_batch_actions = len(all_actions)
if total_batch_actions > 0:
print(f"Inserting {total_batch_actions} actions in smaller batches...")
with tqdm(total=total_batch_actions, desc="Inserting actions") as pbar:
for j in range(0, total_batch_actions, action_batch_size):
actions_batch = all_actions[j:j+action_batch_size]
actions_to_insert = []
for action in actions_batch:
player_id = player_id_cache.get(action['player_name'])
if player_id:
actions_to_insert.append((
action['game_id'], player_id, action['action_round'], action['action_type'],
action['simple_action_type'], action['action_sum'], action['action_order'],
action['pot_before_action'], action['players_remaining']
))
if actions_to_insert:
cursor.executemany('''
INSERT INTO actions
(game_id, player_id, action_round, action_type, simple_action_type,
action_sum, action_order, pot_before_action, players_remaining)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
''', actions_to_insert)
conn.commit()
total_actions += len(actions_to_insert)
pbar.update(len(actions_batch))
chunk_time = time.time() - chunk_start
print(f"Chunk complete in {chunk_time:.2f}s. Progress: {processed_files}/{num_files} files, {successful_files} successful, {total_games} games, {total_actions} actions")
# Add indexes after all data is inserted
print("Adding indexes...")
cursor.execute('CREATE INDEX IF NOT EXISTS idx_game_id ON actions(game_id)')
cursor.execute('CREATE INDEX IF NOT EXISTS idx_player_id ON actions(player_id)')
cursor.execute('CREATE INDEX IF NOT EXISTS idx_action_type ON actions(simple_action_type)')
cursor.execute('CREATE INDEX IF NOT EXISTS idx_game_players ON game_players(game_id, player_id)')
conn.commit()
# Optimize database
print("Optimizing database...")
cursor.execute('PRAGMA optimize')
conn.commit()
conn.close()
end_time = time.time()
print(f"Total processing time: {end_time - start_time:.2f} seconds")
print(f"Processed {processed_files} files, {successful_files} successful, inserted {total_games} games and {total_actions} actions")
def main():
"""Main function to demonstrate the workflow"""
# Configuration
db_path = "poker_analysis_optimized.db"
xml_folder = "ipoker_hh" # Folder containing XML files
# Process files with optimized code
process_directory(xml_folder, db_path, limit=None) # Set a limit or None for all files
print("Processing complete!")
if __name__ == "__main__":
main()