I've spent the past two days wrestling with Roboflow datasets and YOLO. I had assumed I could simply train a model on an existing dataset, but it turns out that even a slightly different interface makes recognition accuracy drop sharply. This keeps me oscillating between old and new approaches: should I fall back on the familiar template matching technique, or keep digging into these machine learning methods?
Template matching I know well, because that is exactly how I did it back in OpenHoldem. Re-capturing templates after every platform update is tedious, but at least I know how to do it. YOLO and computer vision, by contrast, are entirely new territory for me, from labeling through training; I have to build up even the basic concepts from scratch, and that starting-from-zero feeling is genuinely anxiety-inducing.
After some thought, I settled on a dual-track strategy: first build a baseline version with the familiar template matching approach so the project keeps moving, while gradually exploring the machine learning side. It means splitting my limited time between two directions, but it is probably the most pragmatic option.
When I started on card recognition, the ranks worked almost immediately. Compared with OpenHoldem's heavily constrained template matching mode, doing this in Python is far easier: with a little preprocessing of both the screenshot and the template, matching becomes insensitive to the background color. Then I hit a problem that looked trivial but turned out to be surprisingly stubborn. I had planned to identify the suits by color, which seemed like the most obvious approach imaginable; after all, how hard can telling colors apart be? The implementation completely changed my mind.
I spent hours going in circles on color detection. Interestingly, the AI's suggestions kept getting more elaborate: did solving a supposedly simple color classification problem really require this much advanced machinery? The experience made me start questioning my direction.
In the end I let go of that fixation and went back to basic shape matching. Even though the suit glyphs on the cards are not very distinctive, this "naive" solution turned out to work surprisingly well. It was another reminder that, when solving a problem, I should not be locked into a preconceived solution. Insisting on one particular path can mean pouring resources into a problem that has a much simpler answer.
This process of balancing the ideal against the practical mirrors, in a way, my growth as a learner. Not every problem needs the most cutting-edge solution; finding one that fits the current situation may matter more than chasing perfection.
import cv2
import numpy as np
from ppadb.client import Client as AdbClient
from dataclasses import dataclass
from typing import List, Tuple, Dict
import os
import time
@dataclass
class Card:
    rank: str
    suit: str
    confidence: float
class PokerCardDetector:
    def __init__(self):
        # Initialize templates
        self.rank_templates = {}
        self.suit_templates = {}
        self.template_path = 'card_templates'
        self.load_templates()
        self.hero_card_regions = [
            {'x1': 464, 'y1': 1289, 'x2': 541, 'y2': 1400},  # First hero card
            {'x1': 540, 'y1': 1291, 'x2': 616, 'y2': 1398}   # Second hero card
        ]
        self.community_card_regions = [
            {'x1': 299, 'y1': 870, 'x2': 390, 'y2': 1022},   # Flop 1
            {'x1': 399, 'y1': 871, 'x2': 485, 'y2': 1019},   # Flop 2
            {'x1': 496, 'y1': 873, 'x2': 586, 'y2': 1015},   # Flop 3
            {'x1': 592, 'y1': 871, 'x2': 682, 'y2': 1023},   # Turn
            {'x1': 688, 'y1': 870, 'x2': 780, 'y2': 1019}    # River
        ]
        # Initialize ADB (standard local ADB server address and port)
        self.adb = AdbClient(host="127.0.0.1", port=5037)
        self.device = self.connect_to_device()

    def connect_to_device(self):
        devices = self.adb.devices()
        if not devices:
            raise Exception("No devices found. Make sure your emulator is running.")
        return devices[0]
    def load_templates(self):
        """Load all template images from the template directory"""
        # Load rank templates
        rank_path = os.path.join(self.template_path, 'ranks')
        for filename in os.listdir(rank_path):
            if filename.endswith('.png'):
                rank = filename.split('.')[0]  # Get rank from filename
                template = cv2.imread(os.path.join(rank_path, filename))
                if template is not None:
                    self.rank_templates[rank] = template
        # Load suit templates
        suit_path = os.path.join(self.template_path, 'suits')
        for filename in os.listdir(suit_path):
            if filename.endswith('.png'):
                suit = filename.split('.')[0]  # Get suit from filename
                template = cv2.imread(os.path.join(suit_path, filename))
                if template is not None:
                    self.suit_templates[suit] = template
    def preprocess_image(self, image: np.ndarray) -> np.ndarray:
        """Preprocess image for template matching"""
        # Convert to grayscale
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # Apply adaptive thresholding
        binary = cv2.adaptiveThreshold(
            gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
            cv2.THRESH_BINARY_INV, 11, 2
        )
        # Clean up noise with a morphological close
        kernel = np.ones((3, 3), np.uint8)
        binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
        return binary
    def match_template(self, image: np.ndarray, template: np.ndarray) -> Tuple[float, Tuple[int, int]]:
        """Perform template matching on binarized images and return the best match"""
        # Preprocess both images
        processed_image = self.preprocess_image(image)
        processed_template = self.preprocess_image(template)
        # Perform template matching
        result = cv2.matchTemplate(processed_image, processed_template, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)
        return max_val, max_loc

    def match_template_suit(self, image: np.ndarray, template: np.ndarray) -> Tuple[float, Tuple[int, int]]:
        """Match a suit template directly on the raw color crop.

        Unlike rank matching, the binarization step is skipped here:
        plain shape matching on the color image proved more reliable
        for suits than color-based detection.
        """
        result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)
        return max_val, max_loc
    def detect_card(self, roi: np.ndarray) -> Card:
        """Detect rank and suit in a card region"""
        best_rank = None
        best_rank_conf = 0
        best_suit = None
        best_suit_conf = 0
        # Match rank
        for rank, template in self.rank_templates.items():
            conf, _ = self.match_template(roi, template)
            if conf > best_rank_conf:
                best_rank_conf = conf
                best_rank = rank
        # Match suit
        for suit, template in self.suit_templates.items():
            conf, _ = self.match_template_suit(roi, template)
            if conf > best_suit_conf:
                best_suit_conf = conf
                best_suit = suit
        # Require a confident match on both rank and suit
        if best_rank_conf > 0.6 and best_suit_conf > 0.9:
            return Card(best_rank, best_suit, min(best_rank_conf, best_suit_conf))
        return None
    def capture_screen(self) -> np.ndarray:
        """Capture screenshot from device"""
        screenshot_data = self.device.screencap()
        nparr = np.frombuffer(screenshot_data, np.uint8)
        return cv2.imdecode(nparr, cv2.IMREAD_COLOR)
    def find_coordinates(self):
        """Helper function to find card coordinates"""
        # Capture screen
        screen = self.capture_screen()
        # Save the screenshot
        cv2.imwrite("poker_screenshot.png", screen)
        # Create a window to display the image
        window_name = 'Card Coordinate Finder'
        cv2.namedWindow(window_name)

        def mouse_callback(event, x, y, flags, param):
            if event == cv2.EVENT_LBUTTONDOWN:
                print(f"Clicked coordinates: x={x}, y={y}")

        cv2.setMouseCallback(window_name, mouse_callback)
        while True:
            # Display the image with a grid
            display_img = screen.copy()
            height, width = screen.shape[:2]
            # Draw grid lines every 50 pixels
            for x in range(0, width, 50):
                cv2.line(display_img, (x, 0), (x, height), (0, 255, 0), 1)
                # Add coordinate labels
                cv2.putText(display_img, str(x), (x, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
            for y in range(0, height, 50):
                cv2.line(display_img, (0, y), (width, y), (0, 255, 0), 1)
                # Add coordinate labels
                cv2.putText(display_img, str(y), (5, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
            cv2.imshow(window_name, display_img)
            # Press 'q' to quit
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
        cv2.destroyAllWindows()
    def find_coordinates_scaling(self):
        """Helper function to find card coordinates with a resizable window"""
        # Capture screen
        screen = self.capture_screen()
        # Save the original screenshot
        cv2.imwrite("poker_screenshot.png", screen)
        # Create a resizable window
        window_name = 'Card Coordinate Finder (Press "q" to quit)'
        cv2.namedWindow(window_name, cv2.WINDOW_NORMAL)
        # Set an initial, comfortable window size
        cv2.resizeWindow(window_name, 800, 600)
        # Keep track of the original resolution for scaling
        original_height, original_width = screen.shape[:2]

        def mouse_callback(event, x, y, flags, param):
            if event == cv2.EVENT_LBUTTONDOWN:
                # Get current window size
                window_width = cv2.getWindowImageRect(window_name)[2]
                window_height = cv2.getWindowImageRect(window_name)[3]
                # Calculate scale factors
                scale_x = original_width / window_width
                scale_y = original_height / window_height
                # Convert clicked coordinates back to original image coordinates
                original_x = int(x * scale_x)
                original_y = int(y * scale_y)
                print(f"Clicked coordinates in original image: x={original_x}, y={original_y}")

        cv2.setMouseCallback(window_name, mouse_callback)
        while True:
            # Get current window size (fall back to the original resolution)
            window_rect = cv2.getWindowImageRect(window_name)
            if window_rect is not None:
                window_width, window_height = window_rect[2], window_rect[3]
            else:
                window_width, window_height = original_width, original_height
            # Create display image with grid
            display_img = screen.copy()
            # Draw grid lines every 50 pixels
            for x in range(0, original_width, 50):
                cv2.line(display_img, (x, 0), (x, original_height), (0, 255, 0), 1)
                cv2.putText(display_img, str(x), (x, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
            for y in range(0, original_height, 50):
                cv2.line(display_img, (0, y), (original_width, y), (0, 255, 0), 1)
                cv2.putText(display_img, str(y), (5, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
            # Resize display image to fit the window
            display_img_resized = cv2.resize(display_img, (window_width, window_height))
            cv2.imshow(window_name, display_img_resized)
            # Press 'q' to quit
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
        cv2.destroyAllWindows()
    def run_detection(self):
        """Main detection loop"""
        while True:
            # Capture screen
            screen = self.capture_screen()
            # Detect hero cards
            hero_cards = []
            for region in self.hero_card_regions:
                roi = screen[region['y1']:region['y2'], region['x1']:region['x2']]
                card = self.detect_card(roi)
                if card:
                    hero_cards.append(card)
            # Detect community cards
            community_cards = []
            for region in self.community_card_regions:
                roi = screen[region['y1']:region['y2'], region['x1']:region['x2']]
                card = self.detect_card(roi)
                if card:
                    community_cards.append(card)
            # Print results
            print("Hero cards:", [f"{c.rank}{c.suit}" for c in hero_cards])
            print("Community cards:", [f"{c.rank}{c.suit}" for c in community_cards])
            time.sleep(3)  # Add delay to prevent excessive CPU usage
def main():
    detector = PokerCardDetector()
    # detector.find_coordinates_scaling()  # Uncomment to locate card regions
    detector.run_detection()

if __name__ == "__main__":
    main()