Building an Automated E-commerce Data Analysis Platform with N8N and crawl4ai

Published 2025/07/28 · Updated 2025/07/28 · About a 9-minute read

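📌 TL;DR: This is a hands-on, technically oriented article. I focus on how to build a modular, automated e-commerce data analysis platform with N8N and crawl4ai, and explain why I chose this combination. It covers the configuration of each N8N node, the flow logic, error handling, data cleaning, scheduled execution, and cross-platform integration.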

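✅ Why N8N + crawl4ai?

N8N:
Visual workflow design, built-in scheduling, easy debugging, and connections to databases and APIs
crawl4ai:
CSS-selector-based extraction, YAML configuration, stable and lightweight, quick to extend to more sites

Personal reasons

"Automation" fits the nature of web scraping (the most important point, in my view)
Completely open source and free
Extensible, with strong performance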

🛠️ N8N Automation Workflow Walkthrough

Workflow overview

1️⃣ Request the task: crawl post

(1) The URL depends on the deployment environment
Docker Compose (project default): http://crawl4ai:11235/crawl
N8N and crawl4ai running on the local machine: http://localhost:11235/crawl
N8N and crawl4ai running in local Docker containers: http://host.docker.internal:11235/crawl
Cloud host: https://your-domain.com/crawl
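
The mapping above can be kept in one place so that switching environments only changes a single value. The following is a minimal sketch; the mode names used as dictionary keys are made up for illustration, and only the URLs come from the list above.

```python
# Base URLs per deployment mode, copied from the list above.
# The mode names (dictionary keys) are hypothetical labels for this sketch.
CRAWL4AI_BASE_URLS = {
    "compose": "http://crawl4ai:11235",
    "local": "http://localhost:11235",
    "local_docker": "http://host.docker.internal:11235",
    "cloud": "https://your-domain.com",
}


def crawl_endpoint(mode: str) -> str:
    """Return the crawl POST URL for the given deployment mode."""
    try:
        base = CRAWL4AI_BASE_URLS[mode]
    except KeyError:
        raise ValueError(f"unknown deployment mode: {mode!r}") from None
    return f"{base}/crawl"
```

In N8N itself, the same idea could be expressed by storing the base URL in an environment variable and referencing it from the HTTP Request node.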

(2) Credential
Bearer CRAWL4AI_API_TOKEN
CRAWL4AI_API_TOKEN is set in docker-compose; default: 0000
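
To make the credential concrete, here is a rough Python equivalent of what the HTTP Request node sends: a JSON POST with the token in a Bearer header. The function name and the example payload are assumptions for illustration; the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_crawl_request(endpoint: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the authenticated POST request for a crawl task."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # same value as the N8N credential
            "Content-Type": "application/json",
        },
    )


# "0000" is the default token mentioned above; replace it outside of local testing.
req = build_crawl_request(
    "http://localhost:11235/crawl",
    "0000",
    {"urls": ["https://example.com"]},  # placeholder payload
)
# urllib.request.urlopen(req) would actually send the request.
```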

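(3) Crawl URL
Open DevTools on the page you want to crawl and re-derive the CSS selectors for that page
**Demo page: https://www.amazon.com/-/zh_TW/gp/bestsellers/electronics/ref=pd_zg_ts_electronics** 
**A sitemap can supply a large number of pages to crawl**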
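
As an illustration of the sitemap idea, the sketch below pulls every `<loc>` entry out of a standard sitemap document using only the Python standard library. The sample XML and its example.com URLs are invented for the demonstration; a real sitemap would be downloaded first.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list:
    """Return every non-empty <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    ]


# Made-up sample; real sitemaps follow the same structure.
sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/p/1</loc></url>"
    "<url><loc>https://example.com/p/2</loc></url>"
    "</urlset>"
)
urls = urls_from_sitemap(sample)
```

The resulting list can be passed as the `urls` array of the crawl request.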

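(4) Task URL, matching the environment used for the crawl post (taskID is the ID of the submitted task)
Docker Compose (project default): http://crawl4ai:11235/task/taskID
N8N and crawl4ai running on the local machine: http://localhost:11235/task/taskID
N8N and crawl4ai running in local Docker containers: http://host.docker.internal:11235/task/taskID
Cloud host: https://your-domain.com/task/taskID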
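
Because the crawl runs asynchronously, the task URL usually has to be polled until the job finishes. The sketch below shows that loop in Python. It assumes the task response is a JSON object with a `status` field that becomes `"completed"` or `"failed"`; that shape is an assumption, so check it against what your crawl4ai version actually returns. `fetch_status` is a hypothetical callable that performs the GET and returns the parsed JSON.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_status: Callable[[], dict],
    interval_s: float = 2.0,
    max_attempts: int = 30,
) -> dict:
    """Poll a task until it completes, fails, or the attempt budget runs out."""
    for attempt in range(1, max_attempts + 1):
        task = fetch_status()
        status = task.get("status")  # assumed field name
        if status == "completed":
            return task
        if status == "failed":
            raise RuntimeError(f"crawl task failed: {task}")
        if attempt < max_attempts:
            time.sleep(interval_s)  # wait before checking again
    raise TimeoutError(f"task not finished after {max_attempts} checks")
```

In an N8N workflow the same pattern is typically built from a Wait node followed by an IF node that loops back to the HTTP Request node while the task is still pending.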

2️⃣ Data cleaning
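
A cleaning step for this workflow might look like the sketch below, written in Python for readability (in N8N the equivalent logic would live in a Code node). It assumes each extracted block contains the parallel lists `Name`, `AsinList`, `Rank`, `Rate`, and `Price` defined in the request schema, with each item shaped like `{"Name": "..."}` or `{"asin": "..."}`; that output shape is an assumption. Apart from `product_code`, which is the column queried in the database step, the output field names are hypothetical.

```python
import re


def first_number(text):
    """Return the first number in text as a float, or None if there is none."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    if not match:
        return None
    return float(match.group(0).replace(",", ""))


def texts(block: dict, field: str, key: str = "Name") -> list:
    """Collect the stripped string values of one list field."""
    return [(item.get(key) or "").strip() for item in block.get(field, [])]


def clean_products(block: dict) -> list:
    """Zip the parallel lists into one record per product."""
    rows = zip(
        texts(block, "Name"),
        texts(block, "AsinList", key="asin"),
        texts(block, "Rank"),
        texts(block, "Rate"),
        texts(block, "Price"),
    )
    products = []
    for name, asin, rank, rate, price in rows:
        rank_number = first_number(rank)
        products.append({
            "product_code": asin,
            "name": name,
            "rank": int(rank_number) if rank_number is not None else None,
            "rating": first_number(rate),
            "price": first_number(price),
        })
    return products
```

Note that zipping only lines up correctly when every product has all five values. If some items lack a price or rating, the lists can drift out of alignment, so comparing the list lengths before zipping is a sensible guard.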

3️⃣ Save to the database

(1) HTTP Request node: query the products table to check whether the ID already exists
URL: https://<your-project-id>.supabase.co/rest/v1/products?product_code=eq.productID

(2) HTTP Request credentials:
apikey : your-anon-key
Authorization : Bearer your-anon-key
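
The duplicate check above can be sketched in Python as follows. The URL pattern and the two headers come from the settings above; the function names and the added `select=product_code` parameter (a standard PostgREST option that limits the returned columns) are additions made for this illustration. The REST endpoint returns a JSON array of matching rows, so an empty array means the product is new.

```python
from urllib.parse import quote


def duplicate_check_request(project_id: str, anon_key: str, product_code: str):
    """Return the (url, headers) pair for the products lookup."""
    url = (
        f"https://{project_id}.supabase.co/rest/v1/products"
        f"?product_code=eq.{quote(product_code, safe='')}"
        "&select=product_code"
    )
    headers = {
        "apikey": anon_key,
        "Authorization": f"Bearer {anon_key}",
    }
    return url, headers


def is_duplicate(rows: list) -> bool:
    """An empty result array means no row has this product_code yet."""
    return len(rows) > 0
```

In the workflow, the result of this check feeds an IF node that decides whether to insert a new row or update the existing one.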

(3) supabase credentials
Host : NEXT_PUBLIC_SUPABASE_URL(https://your_account.supabase.co)
Service Role Secret : Bearer service_role API keys

0️⃣ crawl4ai request format (CSS strategy)

{
  "urls": ["https://www.amazon.com/-/zh_TW/gp/bestsellers/electronics/ref=pd_zg_ts_electronics"],
  "crawler_params": {
    "headless": true,
    "wait_before_extract": 3000
  },
  "extraction_config": {
    "type": "json_css",
    "params": {
      "schema": {
        "name": "character",
        "baseSelector": "div.p13n-desktop-grid",
        "fields": [
          {
            "name": "Name",
            "selector": "._cDEzb_p13n-sc-css-line-clamp-3_g3dy1",
            "type": "list",
            "fields": [{"name": "Name", "type": "text"}]
          },
          {
            "name": "AsinList",
            "selector": "._cDEzb_iveVideoWrapper_JJ34T",
            "type": "list",
            "fields": [{"name": "asin", "type": "attribute", "attribute": "data-asin"}]
          },
          {
            "name": "Rank",
            "selector": "span.zg-bdg-text",
            "type": "list",
            "fields": [{"name": "Name", "type": "text"}]
          },
          {
            "name": "Rate",
            "selector": ".a-icon-row",
            "type": "list",
            "fields": [{"name": "Name", "type": "text"}]
          },
          {
            "name": "Price",
            "selector": "span.p13n-sc-price, span._cDEzb_p13n-sc-price_3mJ9Z",
            "type": "list",
            "fields": [{"name": "Name", "type": "text"}]
          }
        ]
      },
      "verbose": true
    }
  },
  "cache_mode": "bypass",
  "semaphore_count": 1,
  "delay_between_requests": 3000
}
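
Since four of the five fields in the request share the same shape, the payload can also be generated programmatically, which helps when the same schema is applied to many URLs. This is a minimal sketch: the helper names `list_field` and `build_payload` are hypothetical, the selectors and keys are copied from the request above, and the throttling keys at the end of the request are left out for brevity.

```python
def list_field(name: str, selector: str, inner: dict = None) -> dict:
    """One 'list' field; by default each item is captured as text under 'Name'."""
    return {
        "name": name,
        "selector": selector,
        "type": "list",
        "fields": [inner or {"name": "Name", "type": "text"}],
    }


def build_payload(urls: list) -> dict:
    """Assemble the crawl request body for the given URLs."""
    schema = {
        "name": "character",
        "baseSelector": "div.p13n-desktop-grid",
        "fields": [
            list_field("Name", "._cDEzb_p13n-sc-css-line-clamp-3_g3dy1"),
            list_field(
                "AsinList",
                "._cDEzb_iveVideoWrapper_JJ34T",
                {"name": "asin", "type": "attribute", "attribute": "data-asin"},
            ),
            list_field("Rank", "span.zg-bdg-text"),
            list_field("Rate", ".a-icon-row"),
            list_field("Price", "span.p13n-sc-price, span._cDEzb_p13n-sc-price_3mJ9Z"),
        ],
    }
    return {
        "urls": list(urls),
        "crawler_params": {"headless": True, "wait_before_extract": 3000},
        "extraction_config": {
            "type": "json_css",
            "params": {"schema": schema, "verbose": True},
        },
        "cache_mode": "bypass",
    }
```

Amazon's generated class names (the `_cDEzb_...` selectors) tend to change over time, so keeping them in one function makes them easier to update.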