Speech-to-text with the OpenAI Whisper API, for subtitle post-production or organizing content

Published 2024/02/13, updated 2025/02/12

Prerequisites

Register for the OpenAI API and obtain a secret key, then assign it to openai.api_key in the program.

Basic Python knowledge and the ability to debug.
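
For the API key prerequisite, one common alternative to pasting the key into the source file is to read it from an environment variable. The sketch below assumes the key is stored in OPENAI_API_KEY; the helper name load_api_key is made up for illustration and is not part of the openai package.

```python
import os

def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    load_api_key is a hypothetical helper; the variable name is an assumption.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key

# Possible use in the script (legacy SDK style):
# openai.api_key = load_api_key()
```

This keeps the secret out of the script, so the file can be shared or committed without exposing the key.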

Code: Python implementation

import openai
from pydub import AudioSegment
import os
import codecs
import tempfile

# Set your OpenAI API key here
openai.api_key = 'your_openai_api_key'

def transcribe_audio_with_whisper(audio_file_path):
    """
    Transcribe an audio file using OpenAI's Whisper API.

    Args:
    - audio_file_path: Path to the audio file to transcribe.

    Returns:
    - The transcribed text as a string.
    """
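    with open(audio_file_path, "rb") as audio_file:
        # openai.Audio.transcribe is the legacy interface (openai package < 1.0).
        response = openai.Audio.transcribe('whisper-1', audio_file)
    # The transcript is stored under the top-level 'text' key of the response.
    return response['text']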

def split_and_transcribe_audio(file_path, segment_length_seconds=30):
    try:
        song = AudioSegment.from_file(file_path)
    except Exception as e:
        # Chain the original error so the underlying cause stays in the traceback.
        raise RuntimeError(f"Error loading audio file: {e}") from e

    segment_length_ms = segment_length_seconds * 1000  # pydub slices audio in milliseconds
    transcripts = []

    with tempfile.TemporaryDirectory() as temp_dir:
        for i, start_ms in enumerate(range(0, len(song), segment_length_ms)):
            # Slice one segment; the last slice may be shorter than segment_length_ms.
            segment = song[start_ms:start_ms + segment_length_ms]
            segment_file_path = os.path.join(temp_dir, f"segment_{i}.mp3")
            segment.export(segment_file_path, format="mp3")
            
            transcript = transcribe_audio_with_whisper(segment_file_path)
            time_in_seconds = i * segment_length_seconds
            timestamp = f"[{time_in_seconds // 60:02d}:{time_in_seconds % 60:02d}]"
            transcripts.append(timestamp + " " + transcript)

    output_file_name = os.path.splitext(os.path.basename(file_path))[0] + '.txt'
    with open(output_file_name, 'w', encoding='utf-8') as f:  # built-in open handles UTF-8 in Python 3
        f.write("\n".join(transcripts))

# Example usage
split_and_transcribe_audio("test.mp3")

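Explanation

Set the OpenAI API secret: the program needs your OpenAI API key in order to call the Whisper API.
transcribe_audio_with_whisper function:
- Purpose: transcribes a given audio file using OpenAI's Whisper API.
- Parameters: takes one parameter, audio_file_path, the path to the audio file to transcribe.
- Return value: the transcribed text.
- Implementation: opens the audio file in binary mode and passes it to openai.Audio.transcribe to obtain the transcription.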
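
openai.Audio.transcribe belongs to the openai package before version 1.0. With the 1.x SDK, the same request goes through a client object instead. The sketch below is one possible adaptation, not the article's code: transcribe_with_client is a hypothetical name, and the client is passed in as an argument so the function can also be exercised with a stand-in object.

```python
def transcribe_with_client(client, audio_file_path, model="whisper-1"):
    """Transcribe one audio file through an openai>=1.0 style client.

    `client` is assumed to expose audio.transcriptions.create, as the
    OpenAI class in the 1.x SDK does.
    """
    with open(audio_file_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    # In the 1.x SDK the default JSON response exposes the transcript as .text.
    return result.text

# Possible use with the real SDK (requires `pip install openai` and an API key):
# from openai import OpenAI
# client = OpenAI()  # reads OPENAI_API_KEY from the environment
# print(transcribe_with_client(client, "test.mp3"))
```

Passing the client in, rather than creating it inside the function, also means a single client can be reused for every segment.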
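split_and_transcribe_audio function:
- Purpose: splits a long audio file into smaller segments (30 seconds each by default), then transcribes each segment with the Whisper API.
- Parameters: file_path, the path to the long audio file; segment_length_seconds, the length of each segment in seconds (default 30).
- Process: loads the audio file with AudioSegment.from_file. Splits the audio into segments of the specified length (in milliseconds). Exports each segment as an MP3 file inside a temporary directory. Transcribes each segment with transcribe_audio_with_whisper. Appends each transcript, prefixed with its timestamp, to the list of transcripts.
- Output: writes all transcripts with their timestamps to a plain-text file in the current working directory, named after the original audio file with the extension replaced by .txt.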
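
The splitting and timestamp arithmetic described above can be checked without any audio file or API call. The sketch below mirrors that arithmetic on plain integers; segment_bounds and format_timestamp are hypothetical helper names introduced only for illustration.

```python
def segment_bounds(total_ms, segment_length_seconds=30):
    """Yield (index, start_ms, end_ms) for each slice of a recording."""
    segment_length_ms = segment_length_seconds * 1000
    for i, start_ms in enumerate(range(0, total_ms, segment_length_ms)):
        # The last slice is clipped to the end of the recording.
        yield i, start_ms, min(start_ms + segment_length_ms, total_ms)

def format_timestamp(total_seconds):
    """Format a second count as [MM:SS], the label used in the output file."""
    return f"[{total_seconds // 60:02d}:{total_seconds % 60:02d}]"

# A 75-second recording splits into 30 s + 30 s + 15 s.
for i, start_ms, end_ms in segment_bounds(75_000):
    print(format_timestamp(i * 30), start_ms, end_ms)
# [00:00] 0 30000
# [00:30] 30000 60000
# [01:00] 60000 75000
```

Because the label has only minutes and seconds, a recording longer than an hour keeps counting minutes past 59 (for example, 3600 seconds is written as [60:00]).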