AI Voice Message Transcription for WordPress Chat

· 7 min read
Creator of Better Messages

Voice messages are the right tool for chat moments when typing is slow or the relationship is personal, but they have one universal problem: the recipient cannot always listen. They are in a meeting, on a noisy train, or simply prefer text. Better Messages adds AI voice-message transcription to bridge the gap: any participant in a thread can click the transcribe button on a voice message, and a text version appears below it in any of 99+ languages, with results cached so the next viewer sees the text instantly. Two providers are available: Better Messages Cloud AI (included with the WebSocket license, no third-party key required) and OpenAI (free with your own OpenAI key).

How transcription works inside the messenger

  1. A user records and sends a voice message (via the Voice Messages add-on).
  2. Recipients see the voice message in the thread with a transcribe button.
  3. Any participant clicks the button — the audio is sent to the configured provider.
  4. The transcribed text appears below the voice message, visible to all participants.
  5. Results are cached on the server — once transcribed, every future viewer sees the text instantly.

The on-demand model means transcription costs only happen when someone actually needs the text, not on every voice message sent. Members who prefer audio do not trigger transcription; members who need text get it per-message.

Two providers, pick by use case

Better Messages Cloud AI

  • Included with the WebSocket license. No third-party API key, no OpenAI account.
  • Runs on private Better Messages servers — no data shared with any third party.
  • 99+ languages with automatic source-language detection.
  • Real-time response when possible; async callback for longer audio.

OpenAI Whisper

  • Free to use with your own OpenAI API key.
  • Same language coverage (OpenAI Whisper supports 99+ languages).
  • Configured under Integrations → OpenAI.

For most sites, Better Messages Cloud AI is the default — no API key to manage, no separate billing, no third-party data flow.

Language auto-detection vs explicit setting

By default, the AI auto-detects the language from the audio. For sites where every user speaks the same language, set a specific language code (en, es, ru, ja, etc.) in the settings — this is faster and more accurate than auto-detection for known-language sites.

For multilingual communities, leave auto-detection on. The transcription picks up Spanish, Mandarin, Arabic, and 90+ other languages seamlessly.
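With the OpenAI provider, the explicit-language setting corresponds to Whisper's optional `language` parameter: omitting it enables auto-detection, while passing an ISO-639-1 code skips detection. A minimal sketch; the `whisper_params` helper below is illustrative, not part of the plugin or the OpenAI SDK:

```python
def whisper_params(language=None) -> dict:
    """Build request parameters for an OpenAI Whisper transcription.

    With language=None the API auto-detects the spoken language;
    an ISO-639-1 code (en, es, ru, ja, ...) skips detection, which is
    faster and more accurate on single-language sites.
    """
    params = {"model": "whisper-1"}
    if language:
        params["language"] = language
    return params

# The real call would resemble (requires the `openai` package and an API key):
#   client.audio.transcriptions.create(file=audio_file, **whisper_params("en"))
```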

How to enable

  1. Install Better Messages from WordPress.org with the WebSocket license.
  2. Install the Voice Messages add-on.
  3. Open WP Admin → Better Messages → Settings → Voice Messages.
  4. For Cloud AI: select Better Messages Transcription AI as the provider.
  5. For OpenAI: configure your OpenAI key under Integrations → OpenAI, then select OpenAI as the provider.
  6. Enable Transcription.
  7. Optionally set a Language code (or leave empty for auto-detection).
  8. Save.

Free vs WebSocket version

Voice messages themselves work on both versions (the Voice Messages add-on is a separate plugin compatible with both AJAX and WebSocket Better Messages). Transcription requires the WebSocket version — the Cloud AI provider depends on the WebSocket cloud, and the OpenAI provider also requires PHP 8.1+ and the WebSocket relay for callback delivery.

Feature | Free version | WebSocket version
Voice message recording / playback | yes | yes
Voice message transcription (Better Messages Cloud AI) | — | yes
Voice message transcription (OpenAI Whisper) | — | yes
Auto-detect language vs explicit language setting | — | yes
Cached transcriptions | — | yes
99+ language support | — | yes
Info: Transcription costs only happen when someone clicks the transcribe button, not on every voice message sent. Cached results make subsequent views instant. For sites with heavy voice-message usage and budget concerns on the OpenAI provider, this on-demand model keeps costs proportional to actual need.
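Because only clicked messages are billed, cost on the OpenAI provider scales with transcribed minutes rather than messages sent. A rough back-of-envelope estimate; the $0.006/minute rate below is an assumption, so check OpenAI's current pricing:

```python
def monthly_cost(transcribe_clicks: int, avg_minutes: float,
                 rate_per_minute: float) -> float:
    """On-demand model: only messages someone actually transcribes are billed."""
    return transcribe_clicks * avg_minutes * rate_per_minute

# Example: 500 voice messages sent in a month, but only 80 transcribed,
# averaging 1.5 minutes each, at an assumed $0.006/minute rate.
estimate = monthly_cost(80, 1.5, 0.006)
```

Under these assumptions the bill is well under a dollar, even though hundreds of voice messages were sent.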

When transcription matters most

Use case | Why transcription helps
Coaching businesses | Coach sends a 2-minute voice note; client wants to revisit specific advice quickly via text search
LMS instructor chats | Student gets a voice answer; can read the transcript later when reviewing
Multilingual community | Voice in one language; recipient toggles transcription and uses AI Message Translation to read in their own language
Accessibility | Members with hearing impairments get the text version
Marketplaces / vendor chat | Vendor sends a voice product description; buyer transcribes for sharing or copying details
Quote / agreement audit | Voice agreement transcribed for a written record

Callback reliability

Cloud AI transcription uses a callback URL to deliver results from Better Messages Cloud to your site. The plugin includes a Test Callback URL button in the Voice Messages settings to verify the cloud can reach your endpoint. If your firewall / WAF blocks external requests, whitelist:

https://yoursite.com/wp-json/better-messages/v1/ai/task-result

Without a reachable callback, transcription falls back to a cron-based retry that still delivers results, just slower.
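The delivery logic (direct callback first, cron-based retry as fallback) can be sketched conceptually. This is not the plugin's implementation; `post_callback` and `queue_cron_retry` are hypothetical stand-ins for the HTTP POST to the task-result endpoint and the scheduled retry queue:

```python
def deliver_result(result: dict, post_callback, queue_cron_retry) -> str:
    """Try real-time callback delivery; fall back to a cron-based retry.

    post_callback(result) -> bool: POSTs the transcription result to the
    site's callback endpoint, True on success. queue_cron_retry(result):
    stores the result for a later scheduled attempt. Both hypothetical.
    """
    if post_callback(result):
        return "delivered"        # real-time path: text appears immediately
    queue_cron_retry(result)      # fallback path: slower, but still delivers
    return "queued"
```

Either branch ends with the transcript reaching the site, which is why a blocked callback degrades speed rather than breaking the feature.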

Data privacy

Voice message audio is sent to the configured provider over HTTPS:

  • Better Messages Cloud AI — runs on private Better Messages servers. No data shared with any third party. Audio is processed in real time and immediately discarded after transcription.
  • OpenAI — subject to OpenAI's data-usage policy. Audio is sent to OpenAI's API for transcription; consult OpenAI's terms for retention specifics.

For sites with strict data-residency requirements, consider upgrading to the self-hosted plan so transcription processing stays inside your infrastructure.

Frequently asked questions

Are transcriptions accurate?

Modern AI speech recognition handles clean speech in well-supported languages at ~95% accuracy. Background noise, accents, technical jargon, and overlapping speakers reduce accuracy. For mission-critical transcription (legal, medical), treat AI output as a starting draft, not a final transcript.

Can transcription be enabled per-thread or only site-wide?

The provider configuration is site-wide. Whether the transcribe button appears on a voice message is controlled by whether transcription is enabled — there is no per-thread toggle in the default UI. A custom filter can hide the button on specific threads if needed.

Does it work in end-to-end encrypted threads?

No — E2E threads are decrypted only in the participants' browsers. Server-side AI cannot access the encrypted audio. This is the standard trade-off for E2E threads (also applies to AI translation and moderation). See End-to-end encrypted messaging on WordPress.

Can we charge users for transcription?

Yes — combine with GamiPress pay-to-message or MyCred pay-to-message. The user spends N points per transcription request. Useful for sites where transcription cost is meaningful.

What audio formats are supported?

WebM (the Voice Messages add-on's primary codec) and MP3 (the legacy codec) are both supported. Other audio formats uploaded as file attachments are not auto-transcribed — only voice messages from the add-on get the transcribe button.

Will the transcription appear in search results?

Yes — once transcribed and cached, the text is searchable through the standard message search (WordPress chat search). Voice messages without a transcription are not searchable by content.

See also

Install Better Messages from WordPress.org →