AI Voice Message Transcription for WordPress Chat
Voice messages are the right tool for chat moments when typing is slow or the conversation is personal, but they share one universal problem: the recipient cannot always listen. They may be in a meeting, on a noisy train, or simply prefer text. Better Messages bridges the gap with AI voice-message transcription: any participant in a thread can click the transcribe button on a voice message, and a text version appears below it in any of 99+ languages, with results cached so the next viewer sees the text instantly. Two providers are available: Better Messages Cloud AI, included with the WebSocket license with no third-party key required, or OpenAI Whisper, free to use with your own OpenAI API key.
How transcription works inside the messenger
- A user records and sends a voice message (via the Voice Messages add-on).
- Recipients see the voice message in the thread with a transcribe button.
- Any participant clicks the button — the audio is sent to the configured provider.
- The transcribed text appears below the voice message, visible to all participants.
- Results are cached on the server — once transcribed, every future viewer sees the text instantly.
The on-demand model means transcription costs only happen when someone actually needs the text, not on every voice message sent. Members who prefer audio do not trigger transcription; members who need text get it per-message.
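The on-demand, cached flow described above can be sketched as follows. This is a minimal illustration in Python; the class, method names, and provider callable are hypothetical and not part of the plugin's actual (PHP) implementation:

```python
from typing import Callable, Dict


class TranscriptionCache:
    """Sketch of on-demand transcription with server-side caching.

    The provider is called only the first time any participant requests
    a transcript; every later viewer reads the cached text for free.
    """

    def __init__(self, provider: Callable[[bytes], str]):
        self.provider = provider          # e.g. Cloud AI or OpenAI Whisper
        self.cache: Dict[str, str] = {}   # message_id -> transcript

    def transcribe(self, message_id: str, audio: bytes) -> str:
        if message_id not in self.cache:      # first click: pay once
            self.cache[message_id] = self.provider(audio)
        return self.cache[message_id]         # later views: cache hit


calls = 0

def fake_provider(audio: bytes) -> str:
    global calls
    calls += 1
    return "hello from a voice note"

store = TranscriptionCache(fake_provider)
first = store.transcribe("msg-42", b"...webm bytes...")
second = store.transcribe("msg-42", b"...webm bytes...")
print(first == second, calls)  # same text, provider called only once
```

The key property is that cost tracks transcribe clicks per unique message, not total voice messages sent or total views.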
Two providers, pick by use case
Better Messages Cloud AI
- Included with the WebSocket license. No third-party API key, no OpenAI account.
- Runs on private Better Messages servers — no data shared with any third party.
- 99+ languages with automatic source-language detection.
- Real-time response when possible; async callback for longer audio.
OpenAI Whisper
- Free to use with your own OpenAI API key.
- Same language coverage (OpenAI Whisper supports 99+ languages).
- Configured under Integrations → OpenAI.
For most sites, Better Messages Cloud AI is the default — no API key to manage, no separate billing, no third-party data flow.
Language auto-detection vs explicit setting
By default, the AI auto-detects the language from the audio. For sites where every user speaks the same language, set a specific language code (en, es, ru, ja, etc.) in the settings — this is faster and more accurate than auto-detection for known-language sites.
For multilingual communities, leave auto-detection on. The transcription picks up Spanish, Mandarin, Arabic, and 90+ other languages seamlessly.
How to enable
- Install Better Messages from WordPress.org with the WebSocket license.
- Install the Voice Messages add-on.
- Open WP Admin → Better Messages → Settings → Voice Messages.
- For Cloud AI: select Better Messages Transcription AI as the provider.
- For OpenAI: configure your OpenAI key under Integrations → OpenAI, then select OpenAI as the provider.
- Enable Transcription.
- Optionally set a Language code (or leave empty for auto-detection).
- Save.
Free vs WebSocket version
Voice messages themselves work on both versions (the Voice Messages add-on is a separate plugin compatible with both AJAX and WebSocket Better Messages). Transcription requires the WebSocket version — the Cloud AI provider depends on the WebSocket cloud, and the OpenAI provider also requires PHP 8.1+ and the WebSocket relay for callback delivery.
| Feature | Free version | WebSocket version |
|---|---|---|
| Voice message recording / playback | yes | yes |
| Voice message transcription (Better Messages Cloud AI) | — | yes |
| Voice message transcription (OpenAI Whisper) | — | yes |
| Auto-detect language vs explicit language setting | — | yes |
| Cached transcriptions | — | yes |
| 99+ language support | — | yes |
Because transcription runs only when someone clicks the transcribe button, and cached results make every subsequent view instant, costs stay proportional to actual need rather than to the total number of voice messages sent. This matters most for sites with heavy voice-message usage and budget concerns on the OpenAI provider.
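A back-of-the-envelope estimate shows why the on-demand model matters on the OpenAI provider. The $0.006/minute Whisper rate below is an assumption based on OpenAI's published pricing at the time of writing, and the usage numbers are invented for illustration:

```python
WHISPER_RATE = 0.006          # USD per audio minute (assumed OpenAI rate)

voice_messages_per_month = 5000
avg_length_minutes = 1.5
transcribe_click_rate = 0.10  # only 10% of messages ever get transcribed

# Cost if every voice message were transcribed automatically on send:
transcribe_everything = voice_messages_per_month * avg_length_minutes * WHISPER_RATE

# Cost under the on-demand model (one provider call per clicked message):
on_demand = transcribe_everything * transcribe_click_rate

print(f"transcribe all:  ${transcribe_everything:.2f}/month")
print(f"on demand (10%): ${on_demand:.2f}/month")
```

With these assumed numbers the on-demand model is a tenfold saving; the ratio always equals the share of messages that anyone actually asks to read.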
When transcription matters most
| Use case | Why transcription helps |
|---|---|
| Coaching businesses | Coach sends a 2-minute voice note; client wants to revisit specific advice quickly via text search |
| LMS instructor chats | Student gets a voice answer; can read the transcript later when reviewing |
| Multilingual community | Voice in one language, recipient toggles transcription and uses AI Message Translation to read in their language |
| Accessibility | Members with hearing impairment get the text version |
| Marketplaces / vendor chat | Vendor sends a voice product description; buyer transcribes for sharing or copying details |
| Quote / agreement audit | Voice agreement transcribed for written record |
Callback reliability
Cloud AI transcription uses a callback URL to deliver results from Better Messages Cloud to your site. The plugin includes a Test Callback URL button in the Voice Messages settings to verify the cloud can reach your endpoint. If your firewall / WAF blocks external requests, whitelist:
https://yoursite.com/wp-json/better-messages/v1/ai/task-result
Without a reachable callback, transcription falls back to a cron-based retry that still delivers results, just slower.
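If you want to verify reachability from outside the WP admin, the callback endpoint can be probed directly. A Python sketch: the URL path is the one given above, but the probe logic itself is illustrative, not how Better Messages Cloud performs its check:

```python
import urllib.error
import urllib.request


def callback_url(site: str) -> str:
    """REST route Better Messages Cloud must be able to reach."""
    return site.rstrip("/") + "/wp-json/better-messages/v1/ai/task-result"


def is_reachable(site: str, timeout: float = 5.0) -> bool:
    """True if the endpoint answers at all. Any HTTP status counts as
    reachable; a firewall or WAF block typically surfaces as a timeout
    or connection error instead."""
    try:
        urllib.request.urlopen(callback_url(site), timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # got an HTTP response: the route is reachable
    except (urllib.error.URLError, OSError):
        return False  # blocked, DNS failure, or timeout


print(callback_url("https://yoursite.com/"))
```

Note that an HTTP error response (401, 404, and so on) still proves the route is reachable; only connection failures and timeouts indicate a block.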
Data privacy
Voice message audio is sent to the configured provider over HTTPS:
- Better Messages Cloud AI — runs on private Better Messages servers. No data shared with any third party. Audio is processed in real time and immediately discarded after transcription.
- OpenAI — subject to OpenAI's data-usage policy. Audio is sent to OpenAI's API for transcription; consult OpenAI's terms for retention specifics.
For sites with strict data-residency requirements, consider upgrading to the self-hosted plan so transcription processing stays inside your infrastructure.
Frequently asked questions
Are transcriptions accurate?
Modern AI speech recognition handles clean speech in well-supported languages at ~95% accuracy. Background noise, accents, technical jargon, and overlapping speakers reduce accuracy. For mission-critical transcription (legal, medical), treat AI output as a starting draft, not a final transcript.
Can transcription be enabled per-thread or only site-wide?
The provider configuration is site-wide. Whether the transcribe button appears on a voice message is controlled by whether transcription is enabled — there is no per-thread toggle in the default UI. A custom filter can hide the button on specific threads if needed.
Does it work in end-to-end encrypted threads?
No — E2E threads are decrypted only in the participants' browsers. Server-side AI cannot access the encrypted audio. This is the standard trade-off for E2E threads (also applies to AI translation and moderation). See End-to-end encrypted messaging on WordPress.
Can we charge users for transcription?
Yes — combine with GamiPress pay-to-message or MyCred pay-to-message. The user spends N points per transcription request. Useful for sites where transcription cost is meaningful.
What audio formats are supported?
WebM (the Voice Messages add-on's primary codec) and MP3 (the legacy codec) are both supported. Other audio formats uploaded as file attachments are not auto-transcribed — only voice messages from the add-on get the transcribe button.
Will the transcription appear in search results?
Yes — once transcribed and cached, the text is searchable through the standard message search (WordPress chat search). Voice messages without a transcription are not searchable by content.
See also
- Voice messages in WordPress chat — the underlying Voice Messages add-on
- AI message translation — combine with transcription for multilingual communities
- AI content moderation — another Better Messages Cloud AI feature
- AI chat bots for WordPress private messaging — for AI participant bots
- WordPress chat search — transcribed text becomes searchable
- End-to-end encrypted messaging — the trade-off with E2E threads