
AI Content Moderation for WordPress Chat

Creator of Better Messages

Public chat surfaces — community lobbies, vendor support inboxes, course cohort chats — get spam, harassment, and inappropriate content the way any open input does. Manual moderation does not scale beyond a few hundred members. Better Messages adds AI-powered content moderation that automatically detects harmful messages and either flags them for admin review or holds them until approved. Two providers are supported: Better Messages Moderation AI (no API key required, included with the WebSocket license) and OpenAI Moderation API (free with your own OpenAI key).

Two providers, pick by use case

Better Messages Moderation AI

  • No third-party API key required. Included at no extra cost with the WebSocket license.
  • Powered by Better Messages Cloud — no data sent to OpenAI or any other third party.
  • 23 content categories — including extended categories OpenAI's API does not cover.
  • Custom moderation rules — define your own rules in plain text ("Block contact info", "Block off-platform redirects", etc.).
  • Conversation context awareness — detects patterns spread across multiple messages (e.g. a phone number split into two parts to evade single-message checks).
  • Text and image moderation.

OpenAI Moderation API

  • Free to use with your own OpenAI API key.
  • Fixed set of categories — OpenAI's standard moderation taxonomy.
  • Configurable confidence threshold.
  • Text and image moderation.

For most sites on the WebSocket license, Better Messages Moderation AI is the better default: it has more categories, supports custom rules, and does not require an external API account. OpenAI is the right pick if you are on the free version, or if you have an existing OpenAI workflow and prefer to consolidate.
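As an illustration of the configurable confidence threshold, the sketch below filters per-category scores of the shape the OpenAI Moderation API returns in `category_scores` (a mapping from category name to a score between 0 and 1). The function name, the sample scores, and the threshold values are assumptions made for illustration; this is not how Better Messages implements the setting internally.

```python
def categories_over_threshold(category_scores: dict[str, float],
                              threshold: float) -> list[str]:
    """Return the categories whose score meets or exceeds the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return sorted(name for name, score in category_scores.items()
                  if score >= threshold)


# Hypothetical scores for one message (made-up numbers, not real API output).
sample_scores = {"harassment": 0.82, "hate": 0.41, "violence": 0.03}

strict = categories_over_threshold(sample_scores, 0.4)   # low bar: flags more
lenient = categories_over_threshold(sample_scores, 0.9)  # high bar: flags less
```

A lower threshold flags more messages (more false positives); a higher one flags fewer (more misses).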

What actions are available

Two actions per flagged message:

  • Flag Only (recommended): the message is delivered normally but marked for admin review. Best for user experience: no AI is 100% accurate, and in this mode a false positive never blocks a legitimate message.
  • Hold for Review: the message is held until an admin approves or rejects it. Stricter, but may delay legitimate messages; appropriate for high-stakes environments (regulated industries, minors-safe communities).

Moderators review flagged / held messages in the Better Messages → Administration screen.
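The two actions can be pictured as a small decision function. This is a minimal sketch; the `Action` enum, the function name, and the status strings are hypothetical and do not come from the plugin's code.

```python
from enum import Enum


class Action(Enum):
    FLAG_ONLY = "flag_only"
    HOLD_FOR_REVIEW = "hold_for_review"


def delivery_status(flagged: bool, action: Action) -> str:
    """Map a moderation verdict and the configured action to an outcome."""
    if not flagged:
        return "delivered"              # clean messages always go through
    if action is Action.FLAG_ONLY:
        return "delivered_and_flagged"  # recipient sees it, admin reviews it
    return "held_for_review"            # hidden until an admin decides
```

Under either action, a message the AI does not flag is delivered as usual; the setting only changes what happens to flagged ones.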

Content categories

Both providers detect the base OpenAI categories:

  • Hate / Hate-Threatening
  • Harassment / Harassment-Threatening
  • Sexual Content / Sexual Minors
  • Violence / Violence-Graphic
  • Self-Harm / Self-Harm Intent / Self-Harm Instructions
  • Illicit / Illicit-Violent

Better Messages Moderation AI adds extended categories that OpenAI does not cover:

  • Spam
  • Scam / Phishing
  • Minor Safety
  • Contact Sharing (phone numbers, emails, social handles, off-platform redirects)
  • Profanity
  • Impersonation
  • Doxxing
  • Drugs / Alcohol
  • Threats
  • Commercial Promotion

Selecting a parent category covers its subcategories.
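One way to read "a parent category covers its subcategories" is as a set expansion. In the sketch below, the parent-to-subcategory mapping is inferred from the base category list above and is an assumption, as is the function name.

```python
# Assumed mapping, inferred from the base category list; not taken from
# the plugin's source.
SUBCATEGORIES = {
    "Hate": ["Hate-Threatening"],
    "Harassment": ["Harassment-Threatening"],
    "Sexual Content": ["Sexual Minors"],
    "Violence": ["Violence-Graphic"],
    "Self-Harm": ["Self-Harm Intent", "Self-Harm Instructions"],
    "Illicit": ["Illicit-Violent"],
}


def expand_selection(selected: set[str]) -> set[str]:
    """Return the selected categories plus the subcategories of any parent."""
    expanded = set(selected)
    for name in selected:
        expanded.update(SUBCATEGORIES.get(name, ()))
    return expanded


chosen = expand_selection({"Self-Harm", "Spam"})
```

Categories without subcategories, such as Spam, pass through unchanged.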

Custom rules (BM Moderation AI)

Beyond categories, Better Messages Moderation AI supports plain-text custom rules. One rule per line:

Block contact info (phones, emails, social handles) and off-platform moves
Block promotions, affiliate links, recruitment
Block discussion of competitor platforms

The AI applies each rule per message and per conversation context. Categories and custom rules work independently — use one, the other, or both.
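Since rules are entered one per line, turning the textarea into a rule list is a simple split. The helper below is a hypothetical sketch of that step, assuming blank lines and surrounding whitespace are ignored; the plugin's actual parsing may differ.

```python
def parse_rules(textarea: str) -> list[str]:
    """Split textarea content into one rule per non-empty line."""
    return [line.strip() for line in textarea.splitlines() if line.strip()]


rules = parse_rules(
    "Block contact info (phones, emails, social handles) and off-platform moves\n"
    "\n"
    "  Block promotions, affiliate links, recruitment  \n"
    "Block discussion of competitor platforms\n"
)
```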

Conversation context

The conversation-context setting controls how many previous messages the AI considers for the moderation decision (0–20, default 5–10). This is how the AI catches patterns split across messages — a phone number broken into two messages, or a leading sentence in one message that makes the next message harmful.
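The context window can be sketched as "the last N messages plus the new one". The function and constant names below are hypothetical, and clamping out-of-range values to 0–20 is an assumption; only the 0–20 range itself comes from the setting described above.

```python
MAX_CONTEXT = 20  # upper bound of the conversation-context setting


def build_moderation_input(history: list[str], new_message: str,
                           context_size: int) -> list[str]:
    """Return the messages to analyze: recent context, then the new message."""
    size = max(0, min(context_size, MAX_CONTEXT))
    # Guard the zero case: history[-0:] would return the whole list.
    context = history[-size:] if size else []
    return context + [new_message]


history = ["hi there", "call me on 555", "0142 after six"]
window = build_moderation_input(history, "talk soon", 2)
```

With a context size of 2, both halves of the split phone number reach the provider together, which a single-message check would miss.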

Bypass roles

Admins, moderators, instructors, and other trusted roles can bypass moderation. Configure in Settings → Moderation → Bypass Roles. Moderation does not run for messages from these roles.
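Conceptually, the bypass check asks whether the sender holds at least one bypass role. The sketch below uses standard WordPress role slugs as sample data; the function name and the chosen bypass set are assumptions for illustration.

```python
def bypasses_moderation(user_roles: list[str], bypass_roles: set[str]) -> bool:
    """True if the sender holds at least one role configured to bypass."""
    return not set(user_roles).isdisjoint(bypass_roles)


# Hypothetical configuration: administrators and editors skip moderation.
bypass = {"administrator", "editor"}

admin_skips = bypasses_moderation(["administrator"], bypass)
member_skips = bypasses_moderation(["subscriber"], bypass)
```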

How to enable

  1. WP Admin → Better Messages → Settings → Moderation.
  2. Select a Moderation Provider (Better Messages or OpenAI).
  3. Enable AI Moderation.
  4. Choose the Flagged Message Action (Flag Only or Hold for Review).
  5. Select the content categories to detect.
  6. (BM provider) Add custom rules in the Custom Rules textarea.
  7. Set the conversation-context size.
  8. Configure bypass roles.
  9. (OpenAI) Paste your OpenAI API key under Integrations → OpenAI. Requires PHP 8.1+.

Free vs WebSocket version

Feature | Free version | WebSocket version
OpenAI Moderation API (with your OpenAI key) | yes | yes
Better Messages Moderation AI (no API key, BM Cloud) | no | yes
Custom moderation rules | no | yes
Conversation context awareness | no | yes
Extended categories (spam, contact sharing, doxxing, etc.) | no | yes
Text + image moderation | yes (OpenAI) | yes (both providers)

Better Messages Moderation AI is the main WebSocket-version perk on the moderation side. The extended categories (especially Spam, Contact Sharing, Doxxing, Impersonation, Commercial Promotion) are what community admins moderate most often, and OpenAI's standard moderation API does not cover them.

Data privacy

Message content is sent to the selected provider for analysis. Better Messages Cloud does not store any message data: content is analyzed in real time and immediately discarded. If using OpenAI, review OpenAI's data-usage policy. AI moderation does not run on end-to-end encrypted threads, because the server sees only ciphertext and cannot read the message content.

Frequently asked questions

Will this catch every harmful message?

No. No AI moderation is 100% accurate. The recommended Flag Only mode delivers messages normally and asks an admin to review flagged ones, so false positives do not block legitimate conversations. Harmful messages the AI misses (false negatives) are not flagged at all, so AI moderation supports human moderators rather than replacing them.

Does it work in real time?

Yes — moderation runs on every message at send time. With Flag Only, the message is delivered immediately; the flag is applied in parallel. With Hold for Review, the send blocks until the moderation decision returns (typically under a second).
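The timing difference between the two modes can be sketched with `asyncio`. Everything here is a stand-in: `moderate` simulates a provider call with a short sleep and a keyword check, and the function names and log strings are hypothetical.

```python
import asyncio


async def moderate(text: str) -> bool:
    """Stand-in for a provider call; flags any text containing 'spam'."""
    await asyncio.sleep(0.01)  # simulated network latency
    return "spam" in text.lower()


async def send(text: str, hold_for_review: bool, log: list[str]) -> None:
    if hold_for_review:
        flagged = await moderate(text)  # wait for the verdict first
        log.append("held" if flagged else "delivered")
        return
    check = asyncio.create_task(moderate(text))  # runs alongside delivery
    log.append("delivered")  # recorded before the verdict arrives
    if await check:
        log.append("flagged")


def run(text: str, hold_for_review: bool) -> list[str]:
    """Send one message and return the ordered list of events."""
    log: list[str] = []
    asyncio.run(send(text, hold_for_review, log))
    return log
```

In this sketch, `run("Buy SPAM now", False)` records the delivery before the flag, while the same message with `hold_for_review=True` is held and never delivered.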

Does it support non-English content?

Yes — both providers handle the major Western, CJK, and Indic languages. Coverage of less-common languages depends on the underlying AI provider.

Will it flag legitimate technical messages (code, URLs, etc.)?

The AI providers are tuned to avoid false positives on common technical content. If you see false positives, the Flag Only mode lets you build a feel for the model's behavior before switching to Hold for Review.

Can I turn off image moderation independently?

Yes — the Moderate Images toggle is separate from text moderation. Disable it on sites where users only share documents and PDFs.

Does it work with the AI Chat Bots add-on?

The two are independent. AI Chat Bots are conversation participants; AI Content Moderation is a check that runs on each message at send time. A bot's responses can be moderation-checked too, which is useful for guarding against harmful content in AI-generated replies.

See also

Install Better Messages from WordPress.org →