Restarting the Poker Bot Journey - 3: A Day from OCR to Code Refactoring

Updated 2025/02/03 | Published 2025/01/23 | About 8 min read

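Today I kept battling with poker table-state recognition and decided to use OCR for the text-recognition part. In the OpenHoldem days I had to capture a template for every single digit and compare against each one; OCR was a clear efficiency gain by comparison, and the basic recognition was working within half an hour.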
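For contrast, the template approach boils down to comparing each cropped digit glyph against a stored image of every digit and picking the closest one. Here is a minimal numpy sketch, assuming the glyphs are already segmented, binarized, and the same shape as the templates. The function names `match_digit` and `read_number` and the toy 3x3 templates are hypothetical, not OpenHoldem's actual implementation.

```python
from typing import Dict, List

import numpy as np


def match_digit(glyph: np.ndarray, templates: Dict[str, np.ndarray]) -> str:
    """Return the label of the template closest to glyph (smallest sum of squared differences)."""
    best_label, best_score = "", float("inf")
    for label, tmpl in templates.items():
        # Cast to float so the subtraction cannot wrap around for uint8 images
        diff = glyph.astype(np.float32) - tmpl.astype(np.float32)
        score = float(np.sum(diff ** 2))
        if score < best_score:
            best_label, best_score = label, score
    return best_label


def read_number(glyphs: List[np.ndarray], templates: Dict[str, np.ndarray]) -> str:
    """Match each segmented glyph left to right and join the labels."""
    return "".join(match_digit(g, templates) for g in glyphs)


# Toy 3x3 binary "templates" for illustration only
templates = {
    "1": np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=np.uint8),
    "7": np.array([[1, 1, 1], [0, 0, 1], [0, 0, 1]], dtype=np.uint8),
}
noisy_seven = np.array([[1, 1, 1], [0, 0, 1], [0, 1, 1]], dtype=np.uint8)
print(read_number([templates["1"], noisy_seven], templates))  # 17
```

Every new font size or table theme needs its own template set, which is exactly the maintenance burden that OCR removes.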

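As always, though, the road was full of surprises. While testing different poker clients, I found that English text in the interface hurt the accuracy of digit recognition. I had assumed it would be harmless to include that text in the recognition region, since I could simply pick out the numeric part afterwards, but it turned into an unexpected headache. When I asked AI for advice, the proposed solutions kept getting more complicated, all to handle what seemed like a simple problem. In the end I took the more pragmatic route: skip the English text entirely and capture only the region that contains the digits. This decision reminded me once again that in practice, finding a "good enough" solution is sometimes more important than chasing a perfect one.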
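The "skip the English text" fix amounts to tightening each region so that it covers only the digits. A minimal sketch, assuming regions are stored as pixel rectangles (x, y, width, height); the `Region` class, its `crop` method, and the `POT_WITH_LABEL` / `POT_VALUE` coordinates are hypothetical and not taken from the project's `regions.py`.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Region:
    """A rectangular area of the screenshot, in pixels."""
    x: int
    y: int
    w: int
    h: int

    def crop(self, image: np.ndarray) -> np.ndarray:
        # numpy images are indexed [row, column], i.e. [y, x]
        return image[self.y:self.y + self.h, self.x:self.x + self.w]


# Hypothetical coordinates: the label "Pot:" occupies x = 100..139,
# so the digit-only region starts at x = 140 instead of x = 100.
POT_WITH_LABEL = Region(x=100, y=50, w=120, h=20)
POT_VALUE = Region(x=140, y=50, w=80, h=20)

screenshot = np.zeros((720, 1280, 3), dtype=np.uint8)  # stand-in for a real capture
roi = POT_VALUE.crop(screenshot)
print(roi.shape)  # (20, 80, 3)
```

Keeping the coordinates in one config module means that supporting a new client only requires a new set of rectangles, not new parsing logic.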

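In the afternoon I started reorganizing the project's code structure. In the past I always put all my code in a single Jupyter Notebook, which was loosely structured and hard to maintain. This year I decided to plan the code architecture early, hoping to avoid repeating those painful lessons. With AI's guidance, I began sorting the code by function into appropriate folders: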

poker_detector/
├── src/
│   ├── __init__.py
│   ├── detector/
│   │   ├── __init__.py
│   │   ├── card_detector.py
│   │   ├── text_detector.py
│   │   └── template_matcher.py
│   ├── models/
│   │   ├── __init__.py
│   │   └── card.py
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── image_preprocessing.py
│   │   └── device_connector.py
│   └── config/
│       ├── __init__.py
│       └── regions.py
├── main.py
└── requirements.txt
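To illustrate what belongs in the `models` layer, here is one way `src/models/card.py` could look: a small immutable value object that the detectors return instead of raw strings. The `Card` class, its fields, and the two-character notation (`"Ah"`, `"Td"`) are assumptions for illustration; the post does not show the actual file.

```python
from dataclasses import dataclass

RANKS = "23456789TJQKA"
SUITS = "cdhs"  # clubs, diamonds, hearts, spades


@dataclass(frozen=True)
class Card:
    rank: str
    suit: str

    def __post_init__(self) -> None:
        # Reject anything a detector should never produce
        if len(self.rank) != 1 or self.rank not in RANKS:
            raise ValueError(f"invalid rank: {self.rank!r}")
        if len(self.suit) != 1 or self.suit not in SUITS:
            raise ValueError(f"invalid suit: {self.suit!r}")

    @classmethod
    def from_str(cls, text: str) -> "Card":
        """Parse two-character notation such as 'Ah' or 'Td'."""
        if len(text) != 2:
            raise ValueError(f"expected two characters, got {text!r}")
        return cls(rank=text[0].upper(), suit=text[1].lower())

    @property
    def rank_value(self) -> int:
        """2 -> 2, ..., T -> 10, J -> 11, Q -> 12, K -> 13, A -> 14."""
        return RANKS.index(self.rank) + 2

    def __str__(self) -> str:
        return f"{self.rank}{self.suit}"


card = Card.from_str("Ah")
print(card, card.rank_value)  # Ah 14
```

Making the dataclass frozen keeps cards hashable, so a board can be stored in a set or used as a dictionary key without extra code.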

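During this refactoring I ran into an old Python problem: importing classes or functions from a different folder kept failing. Funnily enough, I had hit this several times before, but every time I solved it I forgot the fix and had to figure it out all over again. This time I decided to write the solution down properly: create an __init__.py file in every folder, and set up the path correctly in the main program. This seemingly simple problem still cost me half an hour, but at least it reminded me how important it is to keep proper notes.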

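import sys
import os

# Resolve this file's absolute path, go up two directory levels, and append that
# directory (the project root) to sys.path. Two dirname() calls land on the project
# root only when this file sits one folder below it; adjust the count otherwise.
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))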
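The underlying rule is that `from src.utils.image_preprocessing import ...` only resolves if the directory containing `src` (the project root) is on `sys.path`. The sketch below reproduces that in a throwaway temp directory using only the standard library. The package name `demo_src` and the module `helper` are hypothetical stand-ins for `src` and `image_preprocessing`, renamed so they cannot clash with anything already installed.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_and_import() -> int:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)  # plays the role of poker_detector/
        pkg = root / "demo_src" / "utils"
        pkg.mkdir(parents=True)
        # __init__.py files mark demo_src/ and demo_src/utils/ as regular packages
        (root / "demo_src" / "__init__.py").write_text("")
        (pkg / "__init__.py").write_text("")
        (pkg / "helper.py").write_text("def answer() -> int:\n    return 42\n")

        # Adding the project root (not demo_src/ itself) to sys.path is what
        # makes the dotted import below resolvable
        sys.path.insert(0, str(root))
        try:
            importlib.invalidate_caches()
            helper = importlib.import_module("demo_src.utils.helper")
            return helper.answer()
        finally:
            sys.path.remove(str(root))


print(build_and_import())  # 42
```

Running `python main.py` already puts main.py's own directory (here, the project root) first on `sys.path`, and `python -m package.module` run from the root adds the current directory, so either is often simpler than editing `sys.path` by hand.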

# text_detector.py
import numpy as np
import pytesseract

from src.utils.image_preprocessing import ImagePreprocessor


class TextDetector:
    @staticmethod
    def extract_number(text: str) -> float:
        # Keep only digits and decimal points from the raw OCR string
        numbers = ''.join(c for c in text if c.isdigit() or c == '.')
        try:
            return float(numbers)
        except ValueError:
            # Empty or malformed strings (e.g. "1.2.3") fall back to 0.0
            return 0.0

    def detect_text(self, roi: np.ndarray, is_dark_background: bool = False) -> str:
        # Pick the preprocessing that matches the region's background
        if is_dark_background:
            processed = ImagePreprocessor.preprocess_for_ocr_dark_background(roi)
        else:
            processed = ImagePreprocessor.preprocess_for_ocr(roi)
        # --psm 7: treat the image as a single line of text
        # digits: Tesseract's bundled config file that whitelists 0-9, '-' and '.'
        return pytesseract.image_to_string(processed, config='--psm 7 digits')

    def detect_value(self, roi: np.ndarray, is_dark_background: bool = False) -> float:
        text = self.detect_text(roi, is_dark_background)
        return self.extract_number(text)
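One caveat with `extract_number`: it keeps every digit and dot in the string, so stray characters from neighbouring text get glued together (OCR output such as `"2.5 x3"` becomes `2.53`), and a failed read is indistinguishable from a genuine 0. A stricter alternative, sketched here with the standard `re` module, takes only the first complete number and returns `None` when nothing is found. The name `extract_first_number` and the thousands-separator handling are assumptions, not part of the original class.

```python
import re
from typing import Optional

# Either an integer part with thousands separators or a plain run of digits,
# each optionally followed by a decimal part
_NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def extract_first_number(text: str) -> Optional[float]:
    """Return the first number found in text, or None if there is none."""
    match = _NUMBER.search(text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


print(extract_first_number("2.5 x3"))         # 2.5
print(extract_first_number("Pot: 1,234.50"))  # 1234.5
print(extract_first_number("---"))            # None
```

Returning `None` lets the caller tell "unreadable" apart from "the pot is zero", for example to retry with the other preprocessing path.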

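# image_preprocessing.py
import cv2
import numpy as np


class ImagePreprocessor:
    @staticmethod
    def preprocess_for_template(image: np.ndarray) -> np.ndarray:
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # Adaptive threshold copes with uneven lighting; INV makes the glyphs white.
        # 11 is the neighbourhood size, 2 the constant subtracted from the weighted mean.
        binary = cv2.adaptiveThreshold(
            gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
            cv2.THRESH_BINARY_INV, 11, 2
        )
        # Morphological closing fills small gaps inside the glyphs
        kernel = np.ones((3, 3), np.uint8)
        binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
        return binary

    @staticmethod
    def preprocess_for_ocr(roi: np.ndarray) -> np.ndarray:
        gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
        # With THRESH_OTSU the 127 is ignored; Otsu's method picks the threshold
        _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
        # Upscale 2x: small on-screen digits are usually read more reliably when enlarged
        scaled = cv2.resize(binary, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
        denoised = cv2.fastNlMeansDenoising(scaled)
        return denoised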
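`detect_text` also calls `ImagePreprocessor.preprocess_for_ocr_dark_background`, which does not appear in the `image_preprocessing.py` listing. The Tesseract documentation recommends dark text on a light background for version 4.x, so a plausible version inverts the image before thresholding. The sketch below is hypothetical: it uses plain numpy instead of OpenCV so the Otsu step is visible (in the real class, `cv2.bitwise_not` followed by `cv2.threshold(..., cv2.THRESH_BINARY | cv2.THRESH_OTSU)` would do the same job), and the function names are assumptions.

```python
import numpy as np


def to_gray(bgr: np.ndarray) -> np.ndarray:
    """BGR -> grayscale with BT.601 weights (OpenCV rounds in fixed point, so values may differ by 1)."""
    b, g, r = bgr[..., 0], bgr[..., 1], bgr[..., 2]
    gray = 0.114 * b + 0.587 * g + 0.299 * r
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the threshold t that maximises the between-class variance (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # weight of the class with values <= t
    mu = np.cumsum(prob * np.arange(256))  # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    # Where one class is empty the denominator is 0; score those thresholds as 0
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))


def preprocess_dark_background(bgr: np.ndarray) -> np.ndarray:
    """Invert light-on-dark text to dark-on-light, then binarise with Otsu."""
    inverted = 255 - to_gray(bgr)
    t = otsu_threshold(inverted)
    # Same convention as cv2.THRESH_BINARY: pixels > t become 255, the rest 0
    return np.where(inverted > t, 255, 0).astype(np.uint8)


# Synthetic ROI: light digits (230) on a dark table background (20)
roi = np.full((10, 20, 3), 20, dtype=np.uint8)
roi[3:7, 5:15] = 230
out = preprocess_dark_background(roi)
print(out[0, 0], out[5, 10])  # 255 0
```

Because the result is dark text on white, it can be passed to `pytesseract.image_to_string` the same way as the output of `preprocess_for_ocr`.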