[Speech Synthesis - Kokoro TTS] Better Than You'd Expect! Building a Speech API with Traditional Chinese Support Using Python + uv

阿Han

Published in 阿Han's Software Tech Stack 💡

Published 2025/12/31 | Updated 2025/12/31 | 13 min read

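🚀 Introduction: An Accidental Discovery

I always assumed the barrier to text-to-speech (TTS) was high: either you relied on Google Cloud or Azure Cognitive Services, or you ran large GPU-hungry models. Deployment was complicated and costly, and it was hard to truly own the stack yourself.

Then one day I came across Kokoro TTS on GitHub: an open-source model that is small, fast, natural-sounding, and even supports Traditional Chinese!

I tried it expecting it to be a pain to get running, but it turned out that:

👉 Installation finished in under 10 seconds

👉 The model downloads automatically on the first run

👉 About 20 lines of Python are enough to generate an audio file

👉 Chinese is supported

So I wrapped it in an API and turned it into a small project: type text into a web page → get speech back → play it right away.

This article walks you through building it step by step!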

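🧐 Why Kokoro TTS? My Hands-On Impressions

🛠️ Blueprint: What Are We Building?

Our goal is a minimal speech service with a separate frontend and backend. To make the data flow clearer, I drew a simple flowchart: the web page sends the text to the backend, the backend generates speech with Kokoro and returns it as WAV audio, and the browser plays it.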

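💻 Hands-On: A Modern Development Experience with Python + uv

This time we skip the traditional pip and venv and use uv, a tool that has recently taken the Python community by storm. Trust me: once you've experienced how fast it installs, there's no going back.

✨ Step 1: Lightning-Fast Virtual Environment Setup (with uv)

Open your terminal and enter the following commands; the environment will be ready in moments: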

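# 1. Initialize a new project ✨
uv init my-kokoro-api
cd my-kokoro-api

# 2. Create the virtual environment (.venv) 📦
uv venv

# 3. Activate the virtual environment (Windows); the prompt should now start with (.venv)
.venv\Scripts\activate
# 3. Activate the virtual environment (Mac/Linux)
# source .venv/bin/activate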
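
If you're not sure whether activation actually worked, you can also ask Python itself. The sketch below relies on the standard-library fact that `sys.prefix` differs from `sys.base_prefix` inside a virtual environment; the helper name `in_virtualenv` is my own and is not part of uv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

Commands launched through `uv run` use the project's `.venv` even when the shell has not been activated, so activation mainly matters when you call `python` directly.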

🔹 Step 2: Define the Project Environment in pyproject.toml

[project]
name = "my-kokoro-api"
version = "0.1.0"
description = "Kokoro TTS API with FastAPI and PyTorch"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
    "fastapi>=0.115.0",
    "uvicorn[standard]>=0.32.0",
    "torch>=2.5.0",
    "numpy>=2.0.0",
    "kokoro>=0.2.0",
    "misaki[zh]>=0.1.0",
]

Install the dependencies:

uv sync
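
To confirm that the sync really installed everything, a small check script can help. This is only a sketch: the function name `installed_versions` is hypothetical, and the package list simply mirrors the `dependencies` entries above. It uses the standard library's `importlib.metadata`, so it needs nothing beyond Python itself.

```python
from importlib import metadata
from typing import Dict, List, Optional


def installed_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names taken from the dependencies list in pyproject.toml.
    wanted = ["fastapi", "uvicorn", "torch", "numpy", "kokoro", "misaki"]
    for name, version in installed_versions(wanted).items():
        print(f"{name:10s} {version or 'NOT INSTALLED'}")
```

Saved under any file name in the project folder, it can be run with `uv run python <that file>` so that it sees the project's environment.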

🔹 Step 3: Build the Backend Core (main.py)

Edit the main.py file in the project and paste in the code below. I've added detailed comments so you can see how simple it is:

import io
import wave
import numpy as np
from fastapi import FastAPI
from fastapi.responses import StreamingResponse, HTMLResponse
from pydantic import BaseModel
from kokoro import KPipeline

app = FastAPI()

# Module-level variable that holds the model
pipeline = None

def get_pipeline():
    """Lazily load the model on first use."""
    global pipeline
    if pipeline is None:
        pipeline = KPipeline(lang_code='z')  # 'z' = Mandarin Chinese
    return pipeline

class TTSRequest(BaseModel):
    text: str

@app.post("/api/tts")
async def text_to_speech(request: TTSRequest):
    # 1. Call the Kokoro model to generate audio
    tts_pipeline = get_pipeline()
    results = list(tts_pipeline(request.text, voice='zf_xiaoxiao'))

    # Longer text may come back as several segments. The third element of each
    # result is the audio tensor, so join them all instead of keeping only the first.
    samples = np.concatenate([audio.cpu().numpy() for _, _, audio in results])
    sample_rate = 24000  # Kokoro's sample rate

    # 2. Convert to 16-bit PCM (clip first so out-of-range values cannot wrap around)
    audio_data = (np.clip(samples, -1.0, 1.0) * 32767).astype(np.int16)

    # 3. Build the WAV file in memory
    wav_buffer = io.BytesIO()
    with wave.open(wav_buffer, 'wb') as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(audio_data.tobytes())

    wav_buffer.seek(0)

    # 4. Return the WAV audio stream
    return StreamingResponse(wav_buffer, media_type="audio/wav")

@app.get("/", response_class=HTMLResponse)
async def homepage():
    with open("index.html", encoding="utf-8") as f:
        return HTMLResponse(f.read())

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=9000)
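
Steps 2 and 3 of the endpoint (float samples → 16-bit PCM → WAV) are easy to try on their own, without loading any model. Here is a minimal sketch using only the standard library. The function name `floats_to_wav_bytes` and the 440 Hz test tone are my own illustration, not part of Kokoro; the 24000 Hz default simply mirrors the sample rate used in main.py.

```python
import io
import math
import struct
import wave


def floats_to_wav_bytes(samples, sample_rate=24000):
    """Encode float samples in [-1.0, 1.0] as a mono 16-bit PCM WAV held in memory."""
    # Clip first so values outside [-1, 1] cannot overflow the int16 range.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    frames = struct.pack(f"<{len(ints)}h", *ints)  # little-endian signed 16-bit

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
    return buffer.getvalue()


if __name__ == "__main__":
    # Half a second of a 440 Hz sine wave as stand-in audio.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 24000) for n in range(12000)]
    data = floats_to_wav_bytes(tone)
    with open("tone.wav", "wb") as f:
        f.write(data)
    print(f"wrote tone.wav ({len(data)} bytes)")
```

Opening the resulting `tone.wav` in any audio player is a quick way to confirm that the header settings (mono, 16-bit, 24 kHz) match what the API returns.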

Start the server:

uv run uvicorn main:app --reload --port 9000

Then open 👉 http://localhost:9000 in your browser (the page it serves is the index.html created in Step 4).
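
One thing to expect: because `get_pipeline()` in main.py loads the model lazily, the server itself starts quickly, but the very first request has to build the pipeline and is slower than the ones after it. The same load-once idea can also be written with `functools.lru_cache`. The sketch below uses a hypothetical `FakePipeline` class as a stand-in for the real model, so it runs without Kokoro installed.

```python
from functools import lru_cache


class FakePipeline:
    """Hypothetical stand-in for an expensive model object such as KPipeline."""

    instances = 0  # counts how many times the "model" was actually built

    def __init__(self, lang_code: str):
        FakePipeline.instances += 1
        self.lang_code = lang_code


@lru_cache(maxsize=None)
def get_pipeline(lang_code: str, /) -> FakePipeline:
    # The body runs only on the first call for each lang_code;
    # later calls with the same argument return the cached object.
    return FakePipeline(lang_code)


if __name__ == "__main__":
    first = get_pipeline("z")
    second = get_pipeline("z")
    print("same object:", first is second)
    print("times built:", FakePipeline.instances)
```

The parameter is positional-only on purpose: `lru_cache` treats `f("z")` and `f(lang_code="z")` as different cache keys, which would quietly build the model twice.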

🔹 Step 4: A Simple Frontend for Testing (index.html)

Finally, let's write a bare-bones HTML page to test the result. Create index.html in the same folder:

<!DOCTYPE html>
<html lang="zh-TW">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Kokoro TTS Demo</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            max-width: 600px;
            margin: 50px auto;
            padding: 20px;
        }
        h1 {
            color: #333;
        }
        textarea {
            width: 100%;
            height: 100px;
            padding: 10px;
            font-size: 16px;
            margin: 10px 0;
        }
        button {
            background-color: #4CAF50;
            color: white;
            padding: 15px 32px;
            font-size: 16px;
            border: none;
            cursor: pointer;
            border-radius: 4px;
        }
        button:hover {
            background-color: #45a049;
        }
        button:disabled {
            background-color: #cccccc;
            cursor: not-allowed;
        }
        #status {
            margin-top: 20px;
            padding: 10px;
            border-radius: 4px;
        }
        .success {
            background-color: #d4edda;
            color: #155724;
        }
        .error {
            background-color: #f8d7da;
            color: #721c24;
        }
    </style>
</head>
<body>
    <h1>🔊 Kokoro TTS Demo</h1>
    <textarea id="textInput" placeholder="Enter the text to convert to speech...">你好，這是使用 Kokoro 的中文語音合成範例。</textarea>
    <br>
    <button id="generateBtn" onclick="generateSpeech()">Generate Speech</button>
    <div id="status"></div>

    <script>
        async function generateSpeech() {
            const text = document.getElementById('textInput').value;
            const button = document.getElementById('generateBtn');
            const status = document.getElementById('status');

            // Reject empty or whitespace-only input before calling the backend
            if (!text.trim()) {
                status.className = 'error';
                status.textContent = 'Please enter some text!';
                return;
            }

            button.disabled = true;
            status.className = '';
            status.textContent = 'Generating speech...';

            try {
                // 1. Send the text to the backend
                const response = await fetch('/api/tts', {
                    method: 'POST',
                    headers: {
                        'Content-Type': 'application/json',
                    },
                    body: JSON.stringify({ text: text })
                });

                if (!response.ok) {
                    throw new Error('Generation failed (HTTP ' + response.status + ')');
                }

                // 2. Receive the audio data
                const blob = await response.blob();
                const audioUrl = URL.createObjectURL(blob);

                // 3. Play the audio, releasing the object URL once playback ends
                const audio = new Audio(audioUrl);
                audio.onended = () => URL.revokeObjectURL(audioUrl);
                await audio.play();

                status.className = 'success';
                status.textContent = '✓ Speech generated! Now playing...';

            } catch (error) {
                status.className = 'error';
                status.textContent = '✗ Error: ' + error.message;
            } finally {
                button.disabled = false;
            }
        }
    </script>
</body>
</html>
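
The page's `fetch` call can also be reproduced from Python, which is handy for scripting or for testing the API without a browser. Below is a minimal sketch using only the standard library. The helper names `build_tts_request` and `synthesize_to_file`, the `output.wav` file name, and the 120-second timeout are my own choices; the URL, JSON body, and header mirror what the page sends.

```python
import json
import urllib.request


def build_tts_request(text: str, base_url: str = "http://localhost:9000") -> urllib.request.Request:
    """Build the same POST request the web page sends to /api/tts."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/tts",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, out_path: str = "output.wav") -> int:
    """Send text to the running server and save the returned WAV; return its size in bytes."""
    # A generous timeout, since the first request also has to load the model.
    with urllib.request.urlopen(build_tts_request(text), timeout=120) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

With the server from Step 3 running, calling `synthesize_to_file("你好，世界")` should leave an `output.wav` next to the script.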