Restarting the Road to a Poker Bot, Part 2: Torn Between Old and New Tech

Updated 2025/02/03 · Published 2025/01/22 · About a 24-minute read

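These past two days I've been wrestling with a Roboflow dataset and YOLO. I had hoped to just train a model on an off-the-shelf dataset, but it turned out that whenever the interface differs even slightly, detection accuracy drops sharply. That has kept me going back and forth between old and new techniques: should I use template matching, which I already know, or keep digging into these newer machine-learning tools?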

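I know template matching inside out, because that's how I did it back in OpenHoldem. Re-capturing the templates every time the platform redesigns its interface is tedious, but at least I know how to do it. YOLO and computer vision, by contrast, are entirely new territory for me, from labeling through training. I have to build up even the basic concepts from scratch, and that feeling of starting from zero is genuinely stressful.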

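After some thought, I settled on a two-track strategy: first build a basic version with the familiar template-matching approach so the project keeps moving, while gradually exploring the machine-learning side. It means splitting limited time between two directions, but it is probably the most pragmatic option.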

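When I started on card recognition, the ranks were done almost immediately. Compared with OpenHoldem's restriction-heavy template-matching mode, doing this in Python is far easier: after a bit of preprocessing on both the image and the template, matching is no longer affected by the background color. Then I hit a problem that looked simple but turned out to be surprisingly tricky. I had planned to identify each card's suit by its color, which seemed like the most intuitive solution possible. After all, how hard can telling colors apart be? Actually implementing it changed my mind completely.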

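I spent hours going around in circles on color recognition. Interestingly, the suggestions from the AI kept getting more complicated. Did a color-recognition problem that should have been simple really need that many advanced techniques? The process made me start to question my whole direction.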

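In the end I let go of that fixation and went back to the most basic approach: recognizing shapes. Even though the suit shapes on these cards are not very distinctive, this "plain" solution worked surprisingly well. The experience reminded me once again not to let a preconceived solution box me in. Insisting on one particular path can mean pouring far too many resources into a problem that actually has a simpler answer.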

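This process of finding a balance between the ideal and the practical also, to some extent, reflects how I've grown along my technical learning path. Not every problem needs a cutting-edge solution; finding an approach that fits the situation at hand can matter more than chasing perfection.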

import os
import time
from dataclasses import dataclass
from typing import Optional, Tuple

import cv2
import numpy as np
from ppadb.client import Client as AdbClient


@dataclass
class Card:
    rank: str
    suit: str
    confidence: float


class PokerCardDetector:
    def __init__(self):
        # Initialize templates
        self.rank_templates = {}
        self.suit_templates = {}
        self.template_path = 'card_templates'
        self.load_templates()

        # Card regions in screenshot pixel coordinates
        self.hero_card_regions = [
            {'x1': 464, 'y1': 1289, 'x2': 541, 'y2': 1400},  # First hero card
            {'x1': 540, 'y1': 1291, 'x2': 616, 'y2': 1398},  # Second hero card
        ]
        self.community_card_regions = [
            {'x1': 299, 'y1': 870, 'x2': 390, 'y2': 1022},  # Flop 1
            {'x1': 399, 'y1': 871, 'x2': 485, 'y2': 1019},  # Flop 2
            {'x1': 496, 'y1': 873, 'x2': 586, 'y2': 1015},  # Flop 3
            {'x1': 592, 'y1': 871, 'x2': 682, 'y2': 1023},  # Turn
            {'x1': 688, 'y1': 870, 'x2': 780, 'y2': 1019},  # River
        ]

        # Initialize ADB. The host and port were left blank in the original post;
        # with no arguments, ppadb's Client uses its default local ADB server,
        # so pass host=... and port=... explicitly if your setup differs.
        self.adb = AdbClient()
        self.device = self.connect_to_device()

    def connect_to_device(self):
        devices = self.adb.devices()
        if not devices:
            raise Exception("No devices found. Make sure your emulator is running.")
        return devices[0]

    def load_templates(self):
        """Load all template images from the template directory"""
        # Load rank templates
        rank_path = os.path.join(self.template_path, 'ranks')
        for filename in os.listdir(rank_path):
            if filename.endswith('.png'):
                rank = filename.split('.')[0]  # Get rank from filename
                template = cv2.imread(os.path.join(rank_path, filename))
                if template is not None:
                    self.rank_templates[rank] = template

        # Load suit templates
        suit_path = os.path.join(self.template_path, 'suits')
        for filename in os.listdir(suit_path):
            if filename.endswith('.png'):
                suit = filename.split('.')[0]  # Get suit from filename
                template = cv2.imread(os.path.join(suit_path, filename))
                if template is not None:
                    self.suit_templates[suit] = template

    def preprocess_image(self, image: np.ndarray) -> np.ndarray:
        """Preprocess image for template matching"""
        # Convert to grayscale
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # Apply adaptive thresholding: pixels darker than their local mean become 255
        binary = cv2.adaptiveThreshold(
            gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
            cv2.THRESH_BINARY_INV, 11, 2
        )
        # Close small gaps and holes in the strokes
        kernel = np.ones((3, 3), np.uint8)
        binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
        return binary

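    def match_template(self, image: np.ndarray, template: np.ndarray) -> Tuple[float, Tuple[int, int]]:
        """Match a template on binarized images and return (best score, best location)"""
        # Preprocess both images
        processed_image = self.preprocess_image(image)
        processed_template = self.preprocess_image(template)
        # cv2.matchTemplate raises an error if the template is larger than the image
        if (processed_template.shape[0] > processed_image.shape[0]
                or processed_template.shape[1] > processed_image.shape[1]):
            return 0.0, (0, 0)
        # Perform template matching
        result = cv2.matchTemplate(processed_image, processed_template, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)
        return max_val, max_loc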

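    def match_template_suit(self, image: np.ndarray, template: np.ndarray) -> Tuple[float, Tuple[int, int]]:
        """Match a suit template directly on the color pixels and return (best score, best location)"""
        # Preprocessing is intentionally skipped for suits:
        # processed_image = self.preprocess_image(image)
        # processed_template = self.preprocess_image(template)
        # cv2.matchTemplate raises an error if the template is larger than the image
        if template.shape[0] > image.shape[0] or template.shape[1] > image.shape[1]:
            return 0.0, (0, 0)
        # Perform template matching
        result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)
        return max_val, max_loc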

    def detect_card(self, roi: np.ndarray) -> Optional[Card]:
        """Detect rank and suit in a card region; return None if either match is too weak"""
        best_rank = None
        best_rank_conf = 0
        best_suit = None
        best_suit_conf = 0

        # Match rank against every rank template (binarized matching)
        for rank, template in self.rank_templates.items():
            conf, _ = self.match_template(roi, template)
            if conf > best_rank_conf:
                best_rank_conf = conf
                best_rank = rank

        # Match suit against every suit template (raw color matching)
        for suit, template in self.suit_templates.items():
            conf, _ = self.match_template_suit(roi, template)
            if conf > best_suit_conf:
                best_suit_conf = conf
                best_suit = suit

        # Accept the card only if both scores clear their thresholds
        if best_rank_conf > 0.6 and best_suit_conf > 0.9:
            return Card(best_rank, best_suit, min(best_rank_conf, best_suit_conf))
        return None

    def capture_screen(self) -> np.ndarray:
        """Capture screenshot from device"""
        screenshot_data = self.device.screencap()
        nparr = np.frombuffer(screenshot_data, np.uint8)
        return cv2.imdecode(nparr, cv2.IMREAD_COLOR)

    def find_coordinates(self):
        """Helper function to find card coordinates"""
        # Capture screen
        screen = self.capture_screen()
        # Save the screenshot
        cv2.imwrite("poker_screenshot.png", screen)
        # Create a window to display the image
        window_name = 'Card Coordinate Finder'
        cv2.namedWindow(window_name)

        def mouse_callback(event, x, y, flags, param):
            if event == cv2.EVENT_LBUTTONDOWN:
                print(f"Clicked coordinates: x={x}, y={y}")

        cv2.setMouseCallback(window_name, mouse_callback)

        while True:
            # Display the image with a grid
            display_img = screen.copy()
            height, width = screen.shape[:2]
            # Draw grid lines every 50 pixels
            for x in range(0, width, 50):
                cv2.line(display_img, (x, 0), (x, height), (0, 255, 0), 1)
                # Add coordinate labels
                cv2.putText(display_img, str(x), (x, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
            for y in range(0, height, 50):
                cv2.line(display_img, (0, y), (width, y), (0, 255, 0), 1)
                # Add coordinate labels
                cv2.putText(display_img, str(y), (5, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
            cv2.imshow(window_name, display_img)
            # Press 'q' to quit
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
        cv2.destroyAllWindows()

    def find_coordinates_scaling(self):
        """Helper function to find card coordinates with resizable window"""
        # Capture screen
        screen = self.capture_screen()
        # Save the original screenshot
        cv2.imwrite("poker_screenshot.png", screen)
        # Create a resizable window
        window_name = 'Card Coordinate Finder (Press "q" to quit)'
        cv2.namedWindow(window_name, cv2.WINDOW_NORMAL)
        # Set initial window size to 800x600 or another comfortable size
        cv2.resizeWindow(window_name, 800, 600)
        # Keep track of the original size for the scale factor
        original_height, original_width = screen.shape[:2]

        def mouse_callback(event, x, y, flags, param):
            if event == cv2.EVENT_LBUTTONDOWN:
                # Get current window size
                _, _, window_width, window_height = cv2.getWindowImageRect(window_name)
                # Calculate scale factors
                scale_x = original_width / window_width
                scale_y = original_height / window_height
                # Convert clicked coordinates back to original image coordinates
                original_x = int(x * scale_x)
                original_y = int(y * scale_y)
                print(f"Clicked coordinates in original image: x={original_x}, y={original_y}")

        cv2.setMouseCallback(window_name, mouse_callback)

        # Draw the grid once, on a copy of the full-resolution screenshot
        display_img = screen.copy()
        for x in range(0, original_width, 50):
            # Vertical grid line with its x label
            cv2.line(display_img, (x, 0), (x, original_height), (0, 255, 0), 1)
            cv2.putText(display_img, str(x), (x, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
        for y in range(0, original_height, 50):
            # Horizontal grid line across the full width, with its y label
            cv2.line(display_img, (0, y), (original_width, y), (0, 255, 0), 1)
            cv2.putText(display_img, str(y), (5, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)

        while True:
            # Resize to the current window size; fall back to the full image if the size is not valid yet
            _, _, window_width, window_height = cv2.getWindowImageRect(window_name)
            if window_width > 0 and window_height > 0:
                frame = cv2.resize(display_img, (window_width, window_height))
            else:
                frame = display_img
            cv2.imshow(window_name, frame)
            # Press 'q' to quit
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
        cv2.destroyAllWindows()

    def run_detection(self):
        """Main detection loop"""
        while True:
            # Capture screen
            screen = self.capture_screen()

            # Detect hero cards
            hero_cards = []
            for region in self.hero_card_regions:
                roi = screen[region['y1']:region['y2'], region['x1']:region['x2']]
                card = self.detect_card(roi)
                if card:
                    hero_cards.append(card)

            # Detect community cards
            community_cards = []
            for region in self.community_card_regions:
                roi = screen[region['y1']:region['y2'], region['x1']:region['x2']]
                card = self.detect_card(roi)
                if card:
                    community_cards.append(card)

            # Print results
            print("Hero cards:", [f"{c.rank}{c.suit}" for c in hero_cards])
            print("Community cards:", [f"{c.rank}{c.suit}" for c in community_cards])
            time.sleep(3)  # Add delay to prevent excessive CPU usage


def main():
    detector = PokerCardDetector()
    # detector.find_coordinates_scaling()
    detector.run_detection()


if __name__ == "__main__":
    main()
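The scaling helper converts a click in the resized window back to screenshot pixels by multiplying by the image-to-window ratio on each axis. Here is a small sketch of that arithmetic, plus a hypothetical `region_from_clicks` helper that turns two clicked corners into the `{'x1', 'y1', 'x2', 'y2'}` dictionaries used by `hero_card_regions`. The 1080x1920 screenshot and 540x960 window sizes are made-up example values.

```python
from typing import Dict, Tuple


def window_to_image(x: int, y: int, window_size: Tuple[int, int],
                    image_size: Tuple[int, int]) -> Tuple[int, int]:
    """Map a click in a resized window back to screenshot pixels, using the same
    image/window ratio as mouse_callback in find_coordinates_scaling."""
    win_w, win_h = window_size
    img_w, img_h = image_size
    return int(x * img_w / win_w), int(y * img_h / win_h)


def region_from_clicks(p1: Tuple[int, int], p2: Tuple[int, int]) -> Dict[str, int]:
    """Hypothetical helper: build a region dict from two opposite corners, in any order."""
    (xa, ya), (xb, yb) = p1, p2
    return {'x1': min(xa, xb), 'y1': min(ya, yb), 'x2': max(xa, xb), 'y2': max(ya, yb)}


# Hypothetical numbers: a 1080x1920 screenshot shown in a 540x960 window (scale factor 2).
top_left = window_to_image(232, 645, (540, 960), (1080, 1920))
bottom_right = window_to_image(270, 700, (540, 960), (1080, 1920))
print(region_from_clicks(top_left, bottom_right))
```

Because the ROIs are cut with NumPy slicing (`screen[y1:y2, x1:x2]`), row `y2` and column `x2` themselves are not included in the crop.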