A Guide to Integrating the AI Web Search APIs of OpenAI, Claude, and Perplexity

何家慈 Chia Tzu Ho

Published in CT 數位工具與程式學習筆記 (CT's Digital Tools and Programming Study Notes)

Updated 2025/09/18 · Published 2025/09/14 · 18 min read

Last updated: 2025.09.14

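As large language models (LLMs) grow more capable, more developers and content professionals are looking at how to add web search through LLM APIs, to compensate for what a model does not know about recent events.

Perplexity used to be regarded as the dominant player in AI web search. Recently, however, OpenAI and Claude have both added comparable API support.

This article summarizes the AI search API integrations currently offered by OpenAI, Claude, and Perplexity, covering each provider's specifications, integration approach, key parameters, usage limits, and code examples.

All examples below are in Python.

Note: this article is about API support and usage across the three LLM providers, not about their chat interfaces.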

OpenAI Web Search

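OpenAI offers three styles of search:

Non-reasoning search (with a non-reasoning model such as gpt-4o or gpt-4.1, rather than gpt-5): fast, suited to quick answers.
Agentic search: the model manages the search process itself, like an agent. It can fold searching into its chain of thought, inspect the results of each step, and decide whether to keep searching.
Deep research: intended for tasks that need in-depth, long-running work across many web sources. The model searches repeatedly in an agentic fashion and consolidates a large number of sources (possibly hundreds).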

Editor's note: for deep research, see also OpenAI's own Deep Research offering.
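As a rough illustration of how the first two styles differ at the API level, the sketch below builds the keyword arguments for client.responses.create(). The helper name build_search_request and the mode labels are hypothetical, and the model choices simply mirror the examples used later in this article. Deep research is left out because it relies on dedicated models that this article does not cover.

```python
from typing import Any

# Hypothetical mapping from search style to model, mirroring this article's examples.
MODE_MODELS = {
    "non_reasoning": "gpt-4.1",  # fast, single-pass answers
    "agentic": "gpt-5",          # reasoning model that can chain searches
}


def build_search_request(mode: str, question: str) -> dict[str, Any]:
    """Return keyword arguments suitable for client.responses.create()."""
    if mode not in MODE_MODELS:
        raise ValueError(f"unknown search mode: {mode!r}")
    request: dict[str, Any] = {
        "model": MODE_MODELS[mode],
        "tools": [{"type": "web_search"}],
        "input": question,
    }
    if mode == "agentic":
        # Reasoning models accept an effort level, as in the gpt-5 example below.
        request["reasoning"] = {"effort": "low"}
    return request
```

The result can be passed along as client.responses.create(**build_search_request("agentic", "...")).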

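Using the Responses API

Editor's note: OpenAI strongly recommends the Responses API (the Chat Completions style is older, and my guess is that it may eventually be phased out). OpenAI's documentation also compares how web search works under the new and old APIs.

The following is summarized from the official documentation.

Web search is treated as a tool, one of OpenAI's built-in tools (which include web search, file search, Code Interpreter, and others), so it can be enabled with tools=[{"type": "web_search"}].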

from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    tools=[{"type": "web_search"}],
    input="What was a positive news story from today?"
)

print(response.output_text)

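OpenAI's earlier web search release is specified with web_search_preview. According to the official documentation, the web_search_preview version is older, does not support domain filtering, and may not support certain parameters.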

Supported models: gpt-4o-mini, gpt-4o, gpt-4.1-mini, gpt-4.1, o4-mini, o3, and gpt-5 (with reasoning effort low, medium, or high).
(Not supported: gpt-5 with minimal reasoning, and gpt-4.1-nano.)
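The support list above can be encoded as a small guard that runs before a request is sent. This is only a sketch: the function name supports_web_search is hypothetical, and the set reflects the list as of this article's writing, so it may go stale.

```python
from typing import Optional

# Models listed as supporting web search in this article.
SUPPORTED_MODELS = {
    "gpt-4o-mini", "gpt-4o", "gpt-4.1-mini", "gpt-4.1", "o4-mini", "o3", "gpt-5",
}


def supports_web_search(model: str, reasoning_effort: Optional[str] = None) -> bool:
    """Return True if the model/effort pair can use web search per the list above."""
    if model not in SUPPORTED_MODELS:
        return False
    # gpt-5 with minimal reasoning is called out as unsupported.
    if model == "gpt-5" and reasoning_effort == "minimal":
        return False
    return True
```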

Responses that use web search include citations to their sources. The output has the following format:

[
    {
        "type": "web_search_call",
        "id": "ws_67c9fa0502748190b7dd390736892e100be649c1a5ff9609",
        "status": "completed"
    },
    {
        "id": "msg_67c9fa077e288190af08fdffda2e34f20be649c1a5ff9609",
        "type": "message",
        "status": "completed",
        "role": "assistant",
        "content": [
            {
                "type": "output_text",
                "text": "On March 6, 2025, several news...",
                "annotations": [
                    {
                        "type": "url_citation",
                        "start_index": 2606,
                        "end_index": 2758,
                        "url": "https://...",
                        "title": "Title..."
                    }
                ]
            }
        ]
    }
]
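To show how the structure above might be consumed, here is a minimal sketch that walks an output list in that shape and collects every url_citation together with the span of text it annotates. It works on plain dictionaries; with the SDK you would presumably convert the response first (for example via response.model_dump()["output"], which is an assumption to verify). The sample data and the function name extract_citations are made up for illustration.

```python
def extract_citations(output: list[dict]) -> list[dict]:
    """Collect url_citation annotations from a Responses-style output list."""
    citations = []
    for item in output:
        if item.get("type") != "message":
            continue  # skip web_search_call entries
        for part in item.get("content", []):
            if part.get("type") != "output_text":
                continue
            text = part.get("text", "")
            for ann in part.get("annotations", []):
                if ann.get("type") == "url_citation":
                    citations.append({
                        "url": ann["url"],
                        "title": ann.get("title", ""),
                        # start_index/end_index are offsets into the message text
                        "cited_text": text[ann["start_index"]:ann["end_index"]],
                    })
    return citations


# Hypothetical sample in the same shape as the format shown above.
sample_output = [
    {"type": "web_search_call", "id": "ws_1", "status": "completed"},
    {
        "type": "message",
        "role": "assistant",
        "content": [{
            "type": "output_text",
            "text": "Paris hosted the summit. More details followed.",
            "annotations": [{
                "type": "url_citation",
                "start_index": 0,
                "end_index": 24,
                "url": "https://example.com/summit",
                "title": "Summit report",
            }],
        }],
    },
]

for citation in extract_citations(sample_output):
    print(citation["title"], citation["url"], repr(citation["cited_text"]))
```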

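Parameter: domain filtering. Restricts the scope of the web search. If you are building a specialized tool, this parameter keeps content-farm articles from polluting what the model reads.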

response = client.responses.create(
  model="gpt-5",
  reasoning={"effort": "low"},
  tools=[
    {
      "type": "web_search",
      "filters": {
        "allowed_domains": [
          "pubmed.ncbi.nlm.nih.gov",
          "clinicaltrials.gov",
          "www.who.int",
          "www.cdc.gov",
          "www.fda.gov"
        ]
      }
    }
  ],
  tool_choice="auto",
  include=["web_search_call.action.sources"],
  input="Please perform a web search on how semaglutide is used in the treatment of diabetes."
)
print(response.output_text)

Parameter: location. Localizes the search to an approximate user location (I rarely use this myself).

response = client.responses.create(
    model="o4-mini",
    tools=[{
        "type": "web_search",
        "user_location": {
            "type": "approximate",
            "country": "GB",
            "city": "London",
            "region": "London",
        }
    }],
    input="What are the best restaurants around Granary Square?",
)
print(response.output_text)

Parameter: search_context_size. Controls how much context (how much data) is brought back from the web search, set to low, medium, or high.

response = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "web_search_preview",
        "search_context_size": "low",
    }],
    input="What movie won best picture in 2025?",
)
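The three parameters above can be combined in one tool definition. The sketch below is one way to assemble it while enforcing the restriction mentioned earlier, namely that web_search_preview does not support domain filtering. The helper name build_web_search_tool is hypothetical, and the dictionary layout simply follows the examples in this section.

```python
from typing import Any, Optional

CONTEXT_SIZES = ("low", "medium", "high")


def build_web_search_tool(
    tool_type: str = "web_search",
    allowed_domains: Optional[list[str]] = None,
    user_location: Optional[dict[str, str]] = None,
    search_context_size: Optional[str] = None,
) -> dict[str, Any]:
    """Build one entry for the `tools` list of client.responses.create()."""
    if tool_type not in ("web_search", "web_search_preview"):
        raise ValueError(f"unknown tool type: {tool_type!r}")
    tool: dict[str, Any] = {"type": tool_type}
    if allowed_domains:
        # Per the documentation quoted above, the preview tool lacks domain filtering.
        if tool_type == "web_search_preview":
            raise ValueError("web_search_preview does not support domain filtering")
        tool["filters"] = {"allowed_domains": list(allowed_domains)}
    if user_location:
        # The location example above uses type "approximate".
        tool["user_location"] = {"type": "approximate", **user_location}
    if search_context_size is not None:
        if search_context_size not in CONTEXT_SIZES:
            raise ValueError(f"search_context_size must be one of {CONTEXT_SIZES}")
        tool["search_context_size"] = search_context_size
    return tool


tool = build_web_search_tool(
    allowed_domains=["www.who.int", "www.cdc.gov"],
    user_location={"country": "GB", "city": "London"},
    search_context_size="low",
)
print(tool)
```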

Using the OpenAI Agents SDK

The SDK now supports WebSearchTool(), which wraps the OpenAI built-in tool. The official documentation for it is rather sparse.

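import asyncio

from agents import Agent, Runner, WebSearchTool

# An agent whose only tool is OpenAI's built-in web search.
agent = Agent(
    name="Assistant",
    tools=[
        WebSearchTool()
    ],
)

async def main():
    result = await Runner.run(agent, "Which coffee shop should I go to, taking into account my preferences and the weather today in SF?")
    print(result.final_output)

# Runner.run is a coroutine, so it has to be driven by an event loop.
if __name__ == "__main__":
    asyncio.run(main())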

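Parameter: domain filtering. Restricts the scope of the web search. If you are building a specialized tool, this parameter keeps content-farm articles from polluting what the model reads.

The official documentation is unclear on this point, so the following comes from my own testing.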

from agents import Agent, WebSearchTool
from openai.types.responses.web_search_tool import Filters

web_search_agent = Agent(
  name="Web Search Agent",
  instructions='You are a Taiwan professional news analyst. Your task is to do the web search.',
  tools=[
    WebSearchTool(filters=Filters(
      allowed_domains=["zh.wikipedia.org"]
    ))
  ],
  model="gpt-4.1",
)

For the detailed parameter mapping, see the official SDK reference.

Claude Web Search

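Claude's API only gained full web search support during the week I wrote this article. (It has been officially announced, but the documentation page still labels it as beta.) The example below uses the web fetch tool (web_fetch_20250910), which retrieves the content of a given URL and is enabled through a beta header.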

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-1-20250805",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Please analyze the content at https://example.com/article"
        }
    ],
    tools=[{
        "type": "web_fetch_20250910",
        "name": "web_fetch",
        "max_uses": 5
    }],
    extra_headers={
        "anthropic-beta": "web-fetch-2025-09-10"
    }
)
print(response)

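The following parameters can be adjusted:

- max_uses: limits how many pages can be fetched per request (no limit by default).
- allowed_domains / blocked_domains: an allowlist or blocklist of domains that may be fetched, which helps prevent sensitive information from leaking or restricts access to specific sites. Subpaths are allowed (e.g. example.com/blog). The documentation notes that you should write example.com rather than https://example.com. The two lists cannot be combined; use one or the other.
- max_content_tokens: caps the number of tokens from each fetch that are added to the conversation. Content beyond the cap is truncated, which keeps token costs in check.
- citations: controls whether passage-level citations are enabled in the response, so that each cited source can be traced to a specific location.

The documentation also helpfully recommends using max_content_tokens to cap how many tokens each fetch may contribute, so that you can control worst-case consumption according to your use case and budget.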
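The rules above (no URL scheme in domains, allowlist and blocklist being mutually exclusive) are easy to get wrong by hand, so here is a minimal sketch of a builder that enforces them. The helper names are hypothetical. The {"enabled": True} shape for citations is my assumption about the request format and should be checked against the official documentation.

```python
from typing import Any, Optional


def clean_domain(domain: str) -> str:
    """Strip a leading scheme, e.g. 'https://example.com/blog' -> 'example.com/blog'."""
    for scheme in ("https://", "http://"):
        if domain.lower().startswith(scheme):
            domain = domain[len(scheme):]
    return domain.rstrip("/")


def build_web_fetch_tool(
    max_uses: int = 5,
    allowed_domains: Optional[list[str]] = None,
    blocked_domains: Optional[list[str]] = None,
    max_content_tokens: Optional[int] = None,
    citations: bool = False,
) -> dict[str, Any]:
    """Build the web fetch entry for the `tools` list of client.messages.create()."""
    if allowed_domains and blocked_domains:
        raise ValueError("use either allowed_domains or blocked_domains, not both")
    tool: dict[str, Any] = {
        "type": "web_fetch_20250910",
        "name": "web_fetch",
        "max_uses": max_uses,
    }
    if allowed_domains:
        tool["allowed_domains"] = [clean_domain(d) for d in allowed_domains]
    if blocked_domains:
        tool["blocked_domains"] = [clean_domain(d) for d in blocked_domains]
    if max_content_tokens is not None:
        tool["max_content_tokens"] = max_content_tokens
    if citations:
        # Assumed shape; verify against the official documentation.
        tool["citations"] = {"enabled": True}
    return tool


fetch_tool = build_web_fetch_tool(
    allowed_domains=["https://example.com/blog"],
    max_content_tokens=25_000,
)
print(fetch_tool)
```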

Examples of token usage in common scenarios:

Average web page (10 KB): about 2,500 tokens
Large documentation page (100 KB): about 25,000 tokens
Large academic PDF (500 KB): about 125,000 tokens
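All three figures above work out to roughly 250 tokens per KB (about 4 bytes per token). The sketch below turns that ratio into a quick estimator and uses it to choose a max_content_tokens value under a budget. The ratio is only a rule of thumb derived from these examples, and the function names are hypothetical.

```python
TOKENS_PER_KB = 250  # derived from the examples above: 10 KB is about 2,500 tokens


def estimate_tokens(size_kb: float) -> int:
    """Rough token estimate for fetched content of the given size in KB."""
    return round(size_kb * TOKENS_PER_KB)


def suggest_max_content_tokens(size_kb: float, budget_tokens: int) -> int:
    """Cap a fetch at the smaller of its estimated size and the token budget."""
    return min(estimate_tokens(size_kb), budget_tokens)


for label, kb in [("web page", 10), ("docs page", 100), ("academic PDF", 500)]:
    print(f"{label}: {kb} KB is about {estimate_tokens(kb):,} tokens")
```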

Perplexity Web Search

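Perplexity started out as a web search product. I find its results very good, and it exposes many flexible parameters. For Traditional Chinese users, however, I think the output quality still needs work (it leans toward mainland Chinese phrasing and occasionally emits Simplified characters, which I try to fix through prompting).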

import requests

response = requests.post(
    'https://api.perplexity.ai/chat/completions',
    headers={
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
    },
    json={
        'model': 'sonar-pro',
        'messages': [
            {
                'role': 'system',
                'content': "You are a Taiwan professional news analyst based on Taiwan. Your task is to do the web search. Use 繁體中文、台灣用語 to reply. DO NOT USE English or 簡體中文。絕對不能使用大陸用語"
            },
            {
                'role': 'user',
                'content': "What are the major AI developments and announcements from today across the tech industry?"
            }
        ],
        'search_domain_filter': ["zh.wikipedia.org"],
        'search_recency_filter': 'month'
    }
)

print(response.json())
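Printing the raw JSON is fine for exploration, but in practice you usually want just the answer text and its sources. The sketch below assumes the response follows the OpenAI-style layout (a choices list with message.content) and carries a top-level search_results list with title and url fields. Both are assumptions to verify against the API reference, and the sample data is made up.

```python
def summarize_response(data: dict) -> dict:
    """Pull the answer text and source list out of a chat-completions-style payload."""
    choices = data.get("choices") or []
    answer = choices[0]["message"]["content"] if choices else ""
    sources = [
        {"title": result.get("title", ""), "url": result.get("url", "")}
        for result in (data.get("search_results") or [])
    ]
    return {"answer": answer, "sources": sources}


# Hypothetical payload in the assumed shape.
sample = {
    "choices": [{"message": {"role": "assistant", "content": "今日 AI 產業重點如下"}}],
    "search_results": [{"title": "Example", "url": "https://example.com/ai"}],
}
summary = summarize_response(sample)
print(summary["answer"])
for source in summary["sources"]:
    print("-", source["title"], source["url"])
```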

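The following parameters can be adjusted:

search_mode: sets the search mode (web, the default, or academic).
reasoning_effort: sets how much reasoning effort goes into each query (low/medium/high). Only effective with the sonar-deep-research model.
search_domain_filter: restricts which domains are searched. List domains to allow them, or prefix a domain with - to exclude it.

// Allowlist: Only search these domains
"search_domain_filter": ["wikipedia.org", "github.com", "stackoverflow.com"]

// Denylist: Exclude these domains
"search_domain_filter": ["-reddit.com", "-pinterest.com", "-quora.com"]

search_recency_filter: restricts results by recency (e.g. week, day).
return_images: whether to return images (False by default, or True).
return_related_questions: whether to return follow-up questions (False by default, or True).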
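Since the allowlist and denylist differ only by the leading - on each domain, a small helper can build either form from a plain list. This is a sketch with a hypothetical function name. Restricting each call to a single mode is a simplification of this example, not a statement about what the API accepts.

```python
def build_domain_filter(domains: list[str], mode: str = "allow") -> list[str]:
    """Return a search_domain_filter value in allowlist or denylist form."""
    if mode not in ("allow", "deny"):
        raise ValueError("mode must be 'allow' or 'deny'")
    # Normalize input so a stray '-' prefix or blank entry does not leak through.
    cleaned = [d.strip().lstrip("-") for d in domains if d.strip()]
    if mode == "deny":
        return ["-" + d for d in cleaned]
    return cleaned


payload_fragment = {
    "search_domain_filter": build_domain_filter(["reddit.com", "quora.com"], mode="deny"),
    "search_recency_filter": "week",
}
print(payload_fragment)
```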

The remaining parameters resemble those of OpenAI's older Chat Completions API, including temperature, streaming, and structured output. See the API documentation for details.
