Whisper_Cloudflare is an open-source project initiated by developer thun888 and hosted on GitHub. It is based on OpenAI’s Whisper model and leverages the serverless architecture of Cloudflare Workers to provide efficient speech-to-text services. Users only need to deploy a single worker.js file to convert audio into text or generate timestamped SRT subtitle files. This project supports multiple languages and common audio formats, is easy to operate, and is suitable for developers to quickly build speech processing tools. The project is completely free, open-source, requires no server maintenance, and is suitable for personal or team audio transcription and subtitle generation needs.

Features List
- Speech-to-text: Supports multi-language audio transcription.
- Subtitle generation: Outputs SRT subtitle files with timestamp information.
- Multi-format compatibility: Supports common audio formats such as MP3 and WAV.
- Serverless deployment: Deploy via Cloudflare Workers with a single worker.js file.
- API endpoints: Provides /raw (raw transcription JSON) and /srt (subtitle file) interfaces.
- Voice Activity Detection (VAD): Supports the vad_filter parameter to filter out non-speech content.
- Context optimization: Improves recognition accuracy with the initial_prompt and prefix parameters.
- Translation feature: Supports translating audio content into specified languages.
Usage Guide
Deployment Process
Deploying Whisper_Cloudflare is very simple; just copy the worker.js code to the Cloudflare Workers platform without needing to clone the entire repository. The steps are as follows:
- Register a Cloudflare account
  Visit the official Cloudflare website, sign up or log in, and ensure Workers functionality is enabled (free plan supported). Go to the “Workers” page and click “Create a Worker”.
- Create Worker and paste code
  - Create a new Worker in the editor (the default name is worker, or customize as needed).
  - Paste the provided worker.js code, replacing the default content.
  - Save the code.
- Install Wrangler (optional, command line deployment)
  To manage Workers via the command line, install Wrangler (the Cloudflare Workers CLI). Node.js must be installed (v16.17.0 or above recommended), then execute:
  npm install -g wrangler
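- Configure Wrangler and AI binding
  - Log in to Cloudflare:
    wrangler login
  - Edit the wrangler.toml file to add the configuration:
    name = "whisper-cloudflare"
    compatibility_flags = ["nodejs_compat"]
    [ai]
    binding = "AI"
  - Wrangler usually also expects a main entry pointing at the script (for example, main = "worker.js") and a compatibility_date; add them if wrangler deploy reports them missing.
  - If not using Wrangler, manually bind the AI model on the Cloudflare dashboard and select @cf/openai/whisper-large-v3-turbo.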
- Deploy Worker
  - Click the “Deploy” button in the editor to publish the code.
  - Or use the command line:
    wrangler deploy
  - After successful deployment, Cloudflare will generate a Worker URL, e.g., https://whispercloudflare.tchepai.com/.
- Prepare audio files
  Ensure the audio format is MP3 or WAV and the file size is under 25MB (Cloudflare limitation). Audio can be uploaded as binary files or provided via accessible URLs (cloud storage recommended).
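
The AI binding configured in the deployment steps is what the Worker code calls at runtime. As a rough illustration only (this is not the project's worker.js), a handler along the following lines could forward uploaded audio to the model. The names handleRaw and toBase64 are hypothetical, and the input field names (audio as base64, plus task, language, initial_prompt, prefix, vad_filter) are assumed from the parameters this article mentions; check the Workers AI model documentation before relying on them.

```javascript
// Hypothetical sketch: forward uploaded audio to Workers AI through a binding named AI.
// The real worker.js in the project may be structured differently.
const MODEL = '@cf/openai/whisper-large-v3-turbo';

// Convert an ArrayBuffer to base64 in slices to stay within argument-count limits.
function toBase64(buffer) {
  const bytes = new Uint8Array(buffer);
  const step = 0x8000;
  let binary = '';
  for (let i = 0; i < bytes.length; i += step) {
    binary += String.fromCharCode(...bytes.subarray(i, i + step));
  }
  return btoa(binary);
}

async function handleRaw(request, env) {
  if (request.method !== 'POST') {
    return new Response('Use POST with binary audio', { status: 405 });
  }
  const params = new URL(request.url).searchParams;
  const inputs = { audio: toBase64(await request.arrayBuffer()) };
  // Pass through only the optional parameters the caller actually supplied.
  for (const key of ['task', 'language', 'initial_prompt', 'prefix']) {
    if (params.has(key)) inputs[key] = params.get(key);
  }
  if (params.has('vad_filter')) {
    inputs.vad_filter = params.get('vad_filter') === 'true'; // assumed boolean input
  }
  const response = await env.AI.run(MODEL, inputs);
  return new Response(JSON.stringify({ response }), {
    headers: { 'Content-Type': 'application/json' },
  });
}

// In a Worker this would be wired up as the module's default export:
// export default { fetch: (request, env) => handleRaw(request, env) };
```

Keeping the handler as a plain function that receives env makes it possible to exercise it locally with a stand-in for env.AI before deploying.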
Main Function Operations
Speech-to-Text
Whisper_Cloudflare uses the Whisper model to convert audio to text. Steps:
- Upload audio: Send a POST request with binary audio data to the /raw endpoint. For example:
curl -X POST "https://whisper.ohen5pbf93.workers.dev/raw" \
-H "Content-Type: application/octet-stream" \
--data-binary "@audio.mp3"
- Obtain transcription result: The interface returns JSON with text and segment time information:
{
  "response": {
    "text": "This is a test audio.",
    "segments": [
      {"text": "This is a", "start": 0.0, "end": 1.2},
      {"text": "test audio", "start": 1.3, "end": 2.5}
    ]
  }
}
- Handling large files: For files over 25MB, manually split into approximately 1MB chunks, upload separately, then merge results.
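
Merging chunked results means more than joining text: each chunk's segment times start again from zero, so they need shifting by the chunk's position in the original recording. The sketch below assumes the caller already knows that position for every chunk (the offsetSeconds field is hypothetical, as is the function name) and that each result has the /raw response shape shown above.

```javascript
// Merge per-chunk /raw results into a single transcript.
// Each entry is { offsetSeconds, response }: offsetSeconds is where the chunk
// begins in the original recording, response is the "response" object from /raw.
function mergeChunkResults(chunks) {
  const ordered = [...chunks].sort((a, b) => a.offsetSeconds - b.offsetSeconds);
  const segments = [];
  for (const { offsetSeconds, response } of ordered) {
    for (const seg of response.segments ?? []) {
      segments.push({
        text: seg.text,
        start: seg.start + offsetSeconds, // shift to the full recording's timeline
        end: seg.end + offsetSeconds,
      });
    }
  }
  const text = ordered
    .map((chunk) => chunk.response.text.trim())
    .filter(Boolean)
    .join(' ');
  return { text, segments };
}
```

Note that byte-based splitting of compressed audio does not by itself reveal time offsets; splitting by duration with an audio tool is one way to obtain them.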
Subtitle Generation
Generate SRT subtitle files suitable for videos. Process:
- Request subtitles: Upload audio to the /srt endpoint:
curl -X POST "https://whispercloudflare.tchepai.com/srt" \
-H "Content-Type: application/octet-stream" \
--data-binary "@audio.mp3"
- Receive subtitle results: Returns subtitles in SRT format, e.g.:
1
00:00:00,000 --> 00:00:01,200
This is a

2
00:00:01,300 --> 00:00:02,500
test audio
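
To make the relationship between the /raw segments and the SRT output concrete, here is a minimal sketch of the conversion. It assumes segments shaped like the JSON example earlier (text, start and end in seconds); the function names are illustrative and the project's own conversion may differ in details.

```javascript
// Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm
function formatSrtTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Build SRT text: numbered cues separated by a blank line.
function segmentsToSrt(segments) {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${formatSrtTime(seg.start)} --> ${formatSrtTime(seg.end)}\n${seg.text.trim()}\n`)
    .join('\n');
}
```

Rounding to whole milliseconds before splitting into hours, minutes and seconds avoids floating-point artifacts in the timestamps.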
Web Interface Usage
worker.js includes a built-in HTML interface allowing users to operate via browser:
- Access the Worker URL root path (e.g., https://whispercloudflare.tchepai.com/).
- Select MP3 or WAV files and set task type (transcription or translation), language, VAD filter, etc.
- Submit and display SRT subtitles with support for .srt file download.
- Includes a progress bar; processing 41-minute audio takes approximately 1.9 minutes.
API Usage
The project provides two key API endpoints:
- /raw: Returns raw transcription data in JSON format for developers to further process.
- /srt: Outputs SRT subtitle files suitable for video editing.
Example JavaScript usage:
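// Run inside an async function or an ES module (top-level await).
// audioFile: binary audio data, e.g. a File from <input type="file"> or an ArrayBuffer.
const response = await fetch('https://whispercloudflare.tchepai.com/srt', {
  method: 'POST',
  headers: { 'Content-Type': 'application/octet-stream' },
  body: audioFile
});
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}
const srt = await response.text();
console.log(srt); // outputs SRT subtitles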
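
The /raw endpoint can be called the same way when the JSON is needed instead of subtitles. The helper below is a sketch under assumptions: the name transcribeRaw is hypothetical, the response is assumed to have the shape shown in the Speech-to-Text section, and fetchImpl is an injectable parameter added here so the function can be tried without a network.

```javascript
// Hypothetical helper: POST audio bytes to a deployed Worker's /raw endpoint
// and return the transcript text plus timed segments.
async function transcribeRaw(workerUrl, audioBytes, fetchImpl = fetch) {
  const endpoint = new URL('/raw', workerUrl);
  const res = await fetchImpl(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/octet-stream' },
    body: audioBytes,
  });
  if (!res.ok) {
    throw new Error(`Transcription failed with HTTP ${res.status}`);
  }
  const { response } = await res.json();
  return { text: response.text, segments: response.segments ?? [] };
}
```

In Node.js, audioBytes could be the Buffer returned by fs.promises.readFile; in a browser, a File or Blob works as the request body.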
Context Optimization
Use initial_prompt or prefix parameters to input context information and improve transcription accuracy. Example:
curl -X POST "https://whispercloudflare.tchepai.com/raw?initial_prompt=technical%20meeting" \
-H "Content-Type: application/octet-stream" \
--data-binary "@audio.mp3"
Voice Activity Detection (VAD)
Enable VAD filtering to remove non-speech content by setting vad_filter=true:
curl -X POST "https://whispercloudflare.tchepai.com/raw?vad_filter=true" \
-H "Content-Type: application/octet-stream" \
--data-binary "@audio.mp3"
Translation Feature
Set task=translate and language parameters to translate audio content into the specified language. Example:
curl -X POST "https://whispercloudflare.tchepai.com/raw?task=translate&language=en" \
-H "Content-Type: application/octet-stream" \
--data-binary "@audio.mp3"
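
The query parameters from the last three sections (initial_prompt, prefix, vad_filter, task, language) can be combined in one request. Building the URL with the standard URL API takes care of encoding values that contain spaces. The function name buildWorkerUrl is hypothetical, and the sketch assumes the Worker reads these values from the query string as the curl examples suggest.

```javascript
// Build an endpoint URL with properly encoded query parameters.
// Options set to undefined or null are skipped.
function buildWorkerUrl(workerUrl, endpoint, options = {}) {
  const url = new URL(endpoint, workerUrl);
  for (const [key, value] of Object.entries(options)) {
    if (value !== undefined && value !== null) {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}

// Example call (the Worker address is a placeholder):
const exampleUrl = buildWorkerUrl('https://example.workers.dev', '/raw', {
  initial_prompt: 'technical meeting',
  vad_filter: true,
});
```

URLSearchParams encodes the space as "+", which standard query-string parsers decode back to a space.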
Performance and Limitations
- Speed: In tests, processing a 41-minute, 39-second audio file took about 1.9 minutes.
- Limitations: Cloudflare Workers resource limits may cause occasional failures; multiple attempts are recommended.
- File size: Single upload size limit is 25MB.
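
A client can check the size limit before uploading and decide whether chunking is needed. The sketch below assumes 25MB means 25 × 1024 × 1024 bytes and uses the roughly 1MB chunk size suggested earlier; both constants and the function name are illustrative, not taken from the project.

```javascript
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024; // assumed interpretation of the 25MB limit
const CHUNK_BYTES = 1024 * 1024;           // roughly 1MB per chunk

// Return the byte ranges to upload: one range if the file fits,
// otherwise consecutive chunk-sized ranges covering the whole file.
function planUpload(sizeBytes, maxBytes = MAX_UPLOAD_BYTES, chunkBytes = CHUNK_BYTES) {
  if (sizeBytes <= maxBytes) {
    return [{ start: 0, end: sizeBytes }];
  }
  const ranges = [];
  for (let start = 0; start < sizeBytes; start += chunkBytes) {
    ranges.push({ start, end: Math.min(start + chunkBytes, sizeBytes) });
  }
  return ranges;
}
```

In a browser, each range can be turned into a request body with file.slice(start, end).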
Notes
- API Security: Correctly configure AI binding in the Cloudflare management platform to avoid token leakage.
- Error handling: If the request fails, wait a moment and retry.
- Browser compatibility: The web interface supports mainstream modern browsers such as Chrome, Firefox, etc.
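
The advice to wait and retry after a failed request can be automated with a small wrapper. This is one possible sketch, not something the project ships: withRetry, its option names, and the linearly growing delay are all assumptions chosen for illustration.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run an async task, retrying on failure with a delay that grows each attempt.
// The sleep function is injectable so the wrapper can be tested without waiting.
async function withRetry(task, { attempts = 3, delayMs = 2000, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        await sleep(delayMs * attempt); // wait a little longer after each failure
      }
    }
  }
  throw lastError;
}
```

A transcription call would then be wrapped as withRetry(() => fetch(url, options)), with the task throwing on non-2xx responses if those should also trigger a retry.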
Application Scenarios
- Meeting transcription: Upload meeting audio to generate text or SRT subtitles, supporting multilingual meeting organization.
- Podcast subtitle generation: Create SRT subtitles for podcast content to improve accessibility and search optimization.
- Educational resource transcription: Teachers or students upload classroom recordings to generate notes or subtitles for easy review.
- Voice application development: Developers use the API to build real-time subtitles or voice assistants, suitable for lightweight projects.
Frequently Asked Questions (FAQ)
- Which audio formats are supported?
  Supports MP3, WAV, and other formats. High-quality audio is recommended.
- How to handle large files?
  Must manually split into approximately 1MB chunks and upload separately before merging.
- Is deployment charged?
  Cloudflare Workers free plan supports deployment. The AI model provides 10,000 free neurons daily; additional usage is charged $0.011 per 1000 neurons.
- How to improve transcription accuracy?
  Optimize recognition using initial_prompt, prefix, and vad_filter parameters.
- Which languages are supported?
  Including English, Chinese, Japanese, and many others. Refer to the official Whisper documentation for details.
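
The pricing figures quoted in the FAQ translate into a simple calculation. The sketch below uses only the two numbers given there (10,000 free neurons per day and $0.011 per 1,000 additional neurons); how many neurons a given audio file consumes is not stated in this article, so the input is left to the caller.

```javascript
const FREE_NEURONS_PER_DAY = 10000;
const USD_PER_1000_NEURONS = 0.011;

// Estimate the charge for one day's usage, given the neurons consumed that day.
function estimateDailyCostUsd(neuronsUsed) {
  const billable = Math.max(0, neuronsUsed - FREE_NEURONS_PER_DAY);
  return (billable / 1000) * USD_PER_1000_NEURONS;
}
```

For instance, 25,000 neurons in a day leaves 15,000 billable, which comes to roughly $0.165.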