AI Content Moderation for WordPress Chat
Public chat surfaces — community lobbies, vendor support inboxes, course cohort chats — get spam, harassment, and inappropriate content the way any open input does. Manual moderation does not scale beyond a few hundred members. Better Messages adds AI-powered content moderation that automatically detects harmful messages and either flags them for admin review or holds them until approved. Two providers are supported: Better Messages Moderation AI (no API key required, included with the WebSocket license) and OpenAI Moderation API (free with your own OpenAI key).
Two providers, pick by use case
Better Messages Moderation AI
- No third-party API key required. Included at no extra cost with the WebSocket license.
- Powered by Better Messages Cloud — no data sent to OpenAI or any other third party.
- 23 content categories — including extended categories OpenAI's API does not cover.
- Custom moderation rules — define your own rules in plain text ("Block contact info", "Block off-platform redirects", etc.).
- Conversation context awareness — detects patterns spread across multiple messages (e.g. a phone number split into two parts to evade single-message checks).
- Text and image moderation.
OpenAI Moderation API
- Free to use with your own OpenAI API key.
- Fixed set of categories — OpenAI's standard moderation taxonomy.
- Configurable confidence threshold.
- Text and image moderation.
For most sites with a WebSocket license, Better Messages Moderation AI is the better default: it has more categories, supports custom rules, and does not require an external API account. OpenAI is the right pick if you run the free version, or if you have an existing OpenAI workflow and prefer to consolidate.
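The configurable confidence threshold on the OpenAI provider can be pictured as a cutoff applied to per-category scores. The sketch below assumes each category comes back with a score between 0 and 1, as in the `category_scores` field of the OpenAI Moderation API response. The function name `flagged_categories`, the sample scores, and the cutoff values are illustrative and are not Better Messages internals.

```python
from __future__ import annotations


def flagged_categories(scores: dict[str, float], threshold: float) -> list[str]:
    """Return the categories whose score meets or exceeds the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return sorted(name for name, score in scores.items() if score >= threshold)


# Hypothetical scores for a single message (not real API output).
sample_scores = {"harassment": 0.82, "hate": 0.12, "violence": 0.55}

low_cutoff = flagged_categories(sample_scores, threshold=0.5)   # flags more
high_cutoff = flagged_categories(sample_scores, threshold=0.9)  # flags less
```

A lower threshold flags more messages and produces more false positives. A higher threshold does the opposite, which is why the setting pairs naturally with the Flag Only action while you tune it.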
What actions are available
Two actions per flagged message:
- Flag Only (recommended): the message is delivered normally but marked for admin review. This is the better choice for user experience, because no AI is 100% accurate and a false positive in this mode never blocks a legitimate message.
- Hold for Review: the message is held until an admin approves or rejects it. This is stricter but may delay legitimate messages, so it suits high-stakes environments such as regulated industries or communities that include minors.
Moderators review flagged / held messages in the Better Messages → Administration screen.
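The difference between the two actions can be summarized as a small decision table. The following is a minimal sketch of that logic as described above, not the plugin's actual code. The names `Action`, `Outcome`, and `apply_action` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    FLAG_ONLY = "flag_only"
    HOLD_FOR_REVIEW = "hold_for_review"


@dataclass
class Outcome:
    delivered: bool      # does the recipient see the message right away?
    needs_review: bool   # does the message appear in the admin review queue?


def apply_action(is_flagged: bool, action: Action) -> Outcome:
    """Map a moderation verdict and the configured action to an outcome."""
    if not is_flagged:
        # Clean messages are delivered and never queued.
        return Outcome(delivered=True, needs_review=False)
    if action is Action.FLAG_ONLY:
        # Flagged, but still delivered; an admin reviews it afterwards.
        return Outcome(delivered=True, needs_review=True)
    # Hold for Review: withheld until an admin approves or rejects it.
    return Outcome(delivered=False, needs_review=True)
```

Only the combination of a flagged message and Hold for Review withholds delivery. Every other path delivers the message.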
Content categories
Both providers detect the base OpenAI categories:
- Hate / Hate-Threatening
- Harassment / Harassment-Threatening
- Sexual Content / Sexual Minors
- Violence / Violence-Graphic
- Self-Harm / Self-Harm Intent / Self-Harm Instructions
- Illicit / Illicit-Violent
Better Messages Moderation AI adds extended categories that OpenAI does not cover:
- Spam
- Scam / Phishing
- Minor Safety
- Contact Sharing (phone numbers, emails, social handles, off-platform redirects)
- Profanity
- Impersonation
- Doxxing
- Drugs / Alcohol
- Threats
- Commercial Promotion
Selecting a parent category covers its subcategories.
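One way to picture parent/subcategory selection is as a lookup that expands each selected parent into its children. The sketch below uses the category identifiers from OpenAI's moderation taxonomy as keys. How Better Messages stores its category selection is not documented here, so the mapping and the `expand_selection` helper are assumptions for illustration.

```python
from __future__ import annotations

# Parent category -> subcategories, following the base list above.
SUBCATEGORIES: dict[str, list[str]] = {
    "hate": ["hate/threatening"],
    "harassment": ["harassment/threatening"],
    "sexual": ["sexual/minors"],
    "violence": ["violence/graphic"],
    "self-harm": ["self-harm/intent", "self-harm/instructions"],
    "illicit": ["illicit/violent"],
}


def expand_selection(selected: list[str]) -> set[str]:
    """Return the selected categories plus the subcategories of any parent."""
    expanded = set(selected)
    for name in selected:
        # Categories without children (e.g. "spam") expand to themselves only.
        expanded.update(SUBCATEGORIES.get(name, []))
    return expanded
```

For example, selecting `self-harm` alone would also cover `self-harm/intent` and `self-harm/instructions`.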
Custom rules (BM Moderation AI)
Beyond categories, Better Messages Moderation AI supports plain-text custom rules. One rule per line:
Block contact info (phones, emails, social handles) and off-platform moves
Block promotions, affiliate links, recruitment
Block discussion of competitor platforms
The AI applies each rule per message and per conversation context. Categories and custom rules work independently — use one, the other, or both.
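Because rules are entered one per line, the textarea content maps naturally onto a list of strings. The sketch below shows that parsing step only. Ignoring blank lines and trimming surrounding whitespace are assumptions, and `parse_rules` is a hypothetical helper, not a plugin function.

```python
from __future__ import annotations


def parse_rules(textarea: str) -> list[str]:
    """Split textarea content into one rule per non-empty line."""
    return [line.strip() for line in textarea.splitlines() if line.strip()]


raw = "\n".join([
    "Block contact info (phones, emails, social handles) and off-platform moves",
    "",  # a blank line between rules is skipped
    "Block promotions, affiliate links, recruitment",
    "  Block discussion of competitor platforms  ",
])

rules = parse_rules(raw)
```

Keeping each rule short and focused on a single behavior makes it easier to tell which rule triggered a flag during review.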
Conversation context
The conversation-context setting controls how many previous messages the AI considers when making a moderation decision (0 to 20, with 5 to 10 as the typical default). This is how the AI catches patterns split across messages, such as a phone number broken into two messages, or a setup sentence in one message that makes the next message harmful. A value of 0 means each message is judged on its own.
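The effect of the context size can be illustrated with a toy sliding window. The sketch below is not how the moderation AI works internally: it swaps the AI model for a deliberately crude check (ten digits in a row once non-digits are removed) purely to show why a window of previous messages catches a phone number that a single-message check misses. The class name `ContextWindow` and the pattern are hypothetical.

```python
import re
from collections import deque

# Toy stand-in for the AI: ten consecutive digits count as a phone number.
PHONE_PATTERN = re.compile(r"\d{10}")


class ContextWindow:
    """Keeps the last `context_size` messages of one conversation."""

    def __init__(self, context_size: int) -> None:
        if not 0 <= context_size <= 20:
            raise ValueError("context size must be between 0 and 20")
        # A deque with maxlen drops the oldest message automatically;
        # maxlen=0 keeps nothing, so every message is checked alone.
        self._previous = deque(maxlen=context_size)

    def contains_phone_number(self, message: str) -> bool:
        combined = " ".join([*self._previous, message])
        digits_only = re.sub(r"\D", "", combined)
        self._previous.append(message)
        return bool(PHONE_PATTERN.search(digits_only))
```

With a context size of 5, sending "call me on 555 123" followed by "4567 after six" trips the check on the second message. With a context size of 0, neither message does.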
Bypass roles
Admins, moderators, instructors, and other trusted roles can bypass moderation. Configure in Settings → Moderation → Bypass Roles. Moderation does not run for messages from these roles.
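The bypass check amounts to asking whether the sender holds any exempt role. A minimal sketch, assuming roles are compared as plain slugs: `administrator` and `subscriber` are standard WordPress role slugs, while `instructor` and the `should_moderate` helper are hypothetical.

```python
from __future__ import annotations


def should_moderate(sender_roles: set[str], bypass_roles: set[str]) -> bool:
    """Moderation runs only when the sender holds none of the bypass roles."""
    return sender_roles.isdisjoint(bypass_roles)


bypass = {"administrator", "instructor"}

regular_user = should_moderate({"subscriber"}, bypass)
trusted_user = should_moderate({"subscriber", "instructor"}, bypass)
```

In this sketch a user with several roles bypasses moderation as soon as one of them is on the bypass list.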
How to enable
- WP Admin → Better Messages → Settings → Moderation.
- Select a Moderation Provider (Better Messages or OpenAI).
- Enable AI Moderation.
- Choose the Flagged Message Action (Flag Only or Hold for Review).
- Select the content categories to detect.
- (BM provider) Add custom rules in the Custom Rules textarea.
- Set the conversation-context size.
- Configure bypass roles.
- (OpenAI) Paste your OpenAI API key under Integrations → OpenAI. Requires PHP 8.1+.
Free vs WebSocket version
| Feature | Free version | WebSocket version |
|---|---|---|
| OpenAI Moderation API (with your OpenAI key) | yes | yes |
| Better Messages Moderation AI (no API key, BM Cloud) | — | yes |
| Custom moderation rules | — | yes |
| Conversation context awareness | — | yes |
| Extended categories (spam, contact sharing, doxxing, etc.) | — | yes |
| Text + image moderation | yes (OpenAI) | yes (both providers) |
Better Messages Moderation AI is the main moderation benefit of the WebSocket version. The extended categories (especially Spam, Contact Sharing, Doxxing, Impersonation, and Commercial Promotion) cover the problems community admins commonly have to moderate, and OpenAI's standard moderation API does not flag them.
Data privacy
Message content is sent to the selected provider for analysis. Better Messages Cloud does not store any message data: content is analyzed in real time and immediately discarded. If using OpenAI, review OpenAI's data-usage policy. AI moderation does not run on end-to-end encrypted threads, because the server sees only ciphertext and cannot read the message content.
Frequently asked questions
Will this catch every harmful message?
No. No AI moderation is 100% accurate. The recommended Flag Only mode delivers messages normally and asks an admin to review flagged ones, so every flagged message gets a human review and false positives do not block legitimate conversations. Harmful messages that the AI misses (false negatives) are not flagged in either mode.
Does it work in real time?
Yes — moderation runs on every message at send time. With Flag Only, the message is delivered immediately; the flag is applied in parallel. With Hold for Review, the send blocks until the moderation decision returns (typically under a second).
Does it support non-English content?
Yes — both providers handle the major Western, CJK, and Indic languages. Coverage of less-common languages depends on the underlying AI provider.
Will it flag legitimate technical messages (code, URLs, etc.)?
The AI providers are tuned to avoid false positives on common technical content. If you see false positives, the Flag Only mode lets you build a feel for the model's behavior before switching to Hold for Review.
Can I turn off image moderation independently?
Yes — the Moderate Images toggle is separate from text moderation. Disable it on sites where users only share documents and PDFs.
Does it work with the AI Chat Bots add-on?
The two are independent. AI Chat Bots are conversation participants; AI Content Moderation is a check that runs on every message at send time. A bot's responses can be moderation-checked too, which is useful for guarding against a bot generating harmful content.
See also
- AI Content Moderation feature documentation — full reference
- AI Chat Bots for WordPress private messaging — the related AI add-on
- Pre-moderation documentation — manual + AI hybrid moderation flow
- GDPR-compliant WordPress messaging — privacy footprint of the moderation flow