Content Moderation Glossary
Essential terminology for content moderation APIs, safety systems, compliance, and online platform management. From API concepts to legal requirements.
Allowlist / Whitelist
Moderation: A list of approved words, phrases, domains, or users that are exempt from moderation rules. Allowlist entries override blocklist entries.
API (Application Programming Interface)
Technical: A set of protocols and tools that allows different software applications to communicate. SafeComms provides a REST API for content moderation.
API Key
Technical: A unique authentication token used to identify and authorize requests to the SafeComms API. Keep your API keys secret to prevent unauthorized access.
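In practice, the key is attached to every request, typically in an Authorization header. A minimal sketch (the `Bearer` scheme and the key format shown are illustrative assumptions, not confirmed SafeComms specifics):

```python
def auth_headers(api_key: str) -> dict:
    """Build request headers carrying the API key.

    The Bearer scheme is an assumption for illustration; check the
    SafeComms docs for the actual header name and format.
    """
    if not api_key:
        raise ValueError("API key is required")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

headers = auth_headers("sk_example_123")  # hypothetical key
```

Loading the key from an environment variable rather than hard-coding it is the usual way to keep it secret.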
Appeal
Enforcement: Process allowing users to contest moderation decisions. Required under the EU Digital Services Act (DSA) and considered best practice under GDPR.
Automated Moderation
Moderation: Use of algorithms and machine learning to detect and filter harmful content without human intervention. Faster and more scalable than manual review.
Bad Actor
Safety: A user who intentionally posts harmful, abusive, or rule-violating content. May use evasion techniques to bypass filters.
Ban
Enforcement: Permanent or temporary suspension of a user account, preventing them from accessing the platform. Typically applied for severe or repeated violations.
Blocklist / Blacklist
Moderation: A list of prohibited words, phrases, domains, or users. Content matching blocklist entries is automatically rejected or flagged.
Brigading
Safety: Coordinated mass harassment, review bombing, or reporting campaigns. Often organized off-platform.
Confidence Score
Technical: A numerical value (0-1 or 0-100) indicating how certain the moderation system is that content violates a rule. Higher scores indicate higher confidence.
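A moderation response typically carries one confidence score per category, and a common way to summarize it is to pick the highest-scoring category. A sketch using a made-up response shape (the category names and 0-1 range here are illustrative, not the SafeComms schema):

```python
def top_category(scores: dict) -> tuple:
    """Return the (category, score) pair the system is most confident about.

    `scores` maps category name -> confidence in the 0-1 range; the
    shape is a made-up example, not the SafeComms response schema.
    """
    category = max(scores, key=scores.get)
    return category, scores[category]

# Hypothetical scores for one piece of content
scores = {"toxicity": 0.92, "spam": 0.10, "hate_speech": 0.35}
label, confidence = top_category(scores)
```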
Content Warning
Safety: A notice displayed before potentially disturbing content (violence, adult themes), letting users choose whether to view it.
COPPA (Children's Online Privacy Protection Act)
Legal: US law requiring verifiable parental consent before collecting personal data from children under 13. Affects platforms that allow child users.
CSAM (Child Sexual Abuse Material)
Safety: Illegal content depicting minors in sexual situations. Must be reported to authorities (NCMEC in the US). Zero tolerance for storage or distribution.
Doxing / Doxxing
Safety: Publishing private personal information about someone without consent, often with malicious intent (addresses, phone numbers, family details).
DPA (Data Processing Agreement)
Legal: A legal contract between a data controller and a data processor defining data handling responsibilities. Required under GDPR when using third-party services.
Endpoint
Technical: A specific URL in an API that performs a function. Example: `/v1/moderate/text` is the SafeComms text moderation endpoint.
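Calling an endpoint like `/v1/moderate/text` boils down to an HTTP POST with a JSON body. A standard-library sketch that only constructs the request (the base URL, header scheme, and `content` field are assumptions; the endpoint path is the one named in this glossary):

```python
import json
import urllib.request

def build_moderation_request(text: str, api_key: str) -> urllib.request.Request:
    """Construct (but do not send) a POST to the text moderation endpoint.

    The base URL, Bearer scheme, and "content" field are illustrative
    assumptions; `/v1/moderate/text` is the path from the glossary.
    """
    body = json.dumps({"content": text}).encode("utf-8")
    return urllib.request.Request(
        "https://api.safecomms.example/v1/moderate/text",  # assumed base URL
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_moderation_request("hello world", "sk_example_123")
# urllib.request.urlopen(req) would actually send it; omitted here.
```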
Evasion
Safety: Techniques used to bypass content filters, such as character substitution ("@" for "a"), zero-width spaces, or deliberate misspellings.
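A first line of defense is normalizing text before matching it against any blocklist: mapping common character substitutions and stripping zero-width characters. A minimal sketch (the substitution table is a tiny illustrative sample; real filters use far larger ones):

```python
# Small sample of common substitutions; real filters use much larger tables.
SUBSTITUTIONS = str.maketrans({"@": "a", "3": "e", "0": "o", "1": "l", "$": "s"})
ZERO_WIDTH = ("\u200b", "\u200c", "\u200d", "\ufeff")

def normalize(text: str) -> str:
    """Undo simple filter-evasion tricks before blocklist matching."""
    for ch in ZERO_WIDTH:
        text = text.replace(ch, "")  # remove invisible characters
    return text.lower().translate(SUBSTITUTIONS)
```

Note the tradeoff: aggressive mappings (e.g. "1" to "l") can also mangle legitimate text, so normalization is usually paired with contextual scoring rather than used alone.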
False Negative
Technical: When harmful content is incorrectly classified as safe. Generally more harmful than a false positive, and harder to detect.
False Positive
Technical: When legitimate content is incorrectly flagged as violating rules. Reducing false positives improves user experience.
GDPR (General Data Protection Regulation)
Legal: EU privacy law governing data collection, processing, and user rights. Applies to any platform serving EU users, regardless of where the platform is based.
Hate Speech
Safety: Content that attacks or demeans individuals or groups based on protected characteristics (race, religion, gender, sexual orientation, etc.).
Hybrid Moderation
Moderation: A combination of automated and manual review: AI handles high-confidence cases, while humans review edge cases and appeals.
IP Ban
Enforcement: Blocking access from a specific IP address. Less effective than account bans because users can switch addresses via VPNs, but useful against bot traffic.
Latency
Technical: The time delay between sending a request and receiving a response. SafeComms averages 120 ms latency for text moderation.
Leetspeak / 1337speak
Safety: Alternative spelling that swaps in numbers and symbols (e.g., "h3ll0" for "hello"). Often used for filter evasion.
Machine Learning (ML)
Technical: AI systems that learn patterns from data. Used in content moderation to detect toxicity, spam, and harmful content with high accuracy.
Manual Moderation
Moderation: Human review of content. Slower and more expensive than automated moderation, but better at handling context and nuance.
Moderation Profile
Moderation: A configuration defining which rules to apply, sensitivity thresholds, and actions to take. Allows different moderation settings for different content types.
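Conceptually, a profile bundles rules, a threshold, and an action into one reusable configuration. A sketch of what such a configuration might look like and how it could be applied (the profile names, field names, and values are illustrative assumptions, not the SafeComms schema):

```python
# Illustrative profiles: stricter settings for a kids' chat than for a forum.
PROFILES = {
    "kids_chat": {"rules": ["profanity", "toxicity", "pii"],
                  "threshold": 0.5, "action": "block"},
    "adult_forum": {"rules": ["hate_speech", "spam"],
                    "threshold": 0.85, "action": "flag"},
}

def apply_profile(scores: dict, profile_name: str) -> str:
    """Return the profile's action if any enabled rule crosses its threshold."""
    profile = PROFILES[profile_name]
    for rule in profile["rules"]:
        if scores.get(rule, 0.0) >= profile["threshold"]:
            return profile["action"]
    return "allow"
```

The same content can then be treated differently depending on where it appears: a mildly rude comment might be blocked in the kids' chat but allowed on the forum.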
Natural Language Processing (NLP)
Technical: AI technology that understands and analyzes human language. Powers contextual moderation beyond simple keyword matching.
NCMEC (National Center for Missing & Exploited Children)
Legal: US organization that receives CSAM reports via its CyberTipline. US providers must report suspected CSAM as soon as reasonably possible.
NSFW (Not Safe For Work)
Safety: Content inappropriate for workplace viewing, typically sexual or violent imagery. May be allowed behind age gates or content warnings.
Payload
Technical: The data sent in an API request or webhook. For SafeComms, the payload includes the content to moderate and optional metadata.
Phishing
Safety: Fraudulent content attempting to steal credentials or financial information, often disguised as legitimate links or services.
PII (Personally Identifiable Information)
Safety: Data that can identify an individual: email addresses, phone numbers, social security numbers, physical addresses, credit card details.
Profanity Filter
Moderation: A system that detects and blocks offensive language. Can be word-based (blocklists) or ML-based (contextual understanding).
Rate Limiting
Technical: Restricting the number of API requests or content submissions from a user or IP address within a time window. Prevents abuse and spam.
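A common server-side implementation is a sliding-window counter per user or IP. A minimal in-memory sketch (production systems typically back this with Redis or similar shared storage so limits hold across servers):

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # key -> timestamps of recent requests

    def allow(self, key: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[key]
        while q and now - q[0] >= self.window:  # drop expired timestamps
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit: reject this request
        q.append(now)
        return True
```

Rejected requests would normally get an HTTP 429 response so clients know to back off.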
REST API
Technical: A web service architecture using HTTP methods (GET, POST, etc.). SafeComms uses REST for easy integration with any language.
Rule
Moderation: A specific moderation policy, such as "block profanity" or "flag hate speech." Multiple rules can be combined in a profile.
SDK (Software Development Kit)
Technical: A library that simplifies API integration for a specific programming language. SafeComms offers SDKs for Node.js, Python, PHP, and more.
Section 230
Legal: US law shielding platforms from liability for user-generated content. Its "Good Samaritan" provision protects good-faith moderation decisions.
Sentiment Analysis
Technical: Determining the emotional tone of text (positive, negative, or neutral). Used to detect hostile or aggressive language.
Shadowban
Enforcement: Hiding a user's content from others without notifying them: the user believes their posts are visible, but nobody else can see them.
Spam
Safety: Unwanted repetitive content, often commercial or phishing links. Includes comment spam, fake reviews, and bot-generated posts.
Threshold
Technical: The minimum confidence score required to flag content. Lower thresholds are more sensitive (more false positives); higher thresholds are less sensitive (more false negatives).
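The tradeoff is easy to see by applying two thresholds to the same batch of scores: a lower threshold flags more content, catching more violations at the cost of more false positives. A sketch with hypothetical scores:

```python
def flagged(scores: list, threshold: float) -> int:
    """Count how many confidence scores meet or exceed the threshold."""
    return sum(1 for s in scores if s >= threshold)

# Hypothetical confidence scores for a batch of ten messages.
batch = [0.05, 0.12, 0.33, 0.48, 0.55, 0.61, 0.72, 0.80, 0.91, 0.97]

strict = flagged(batch, 0.5)   # sensitive: flags 6 of 10 messages
lenient = flagged(batch, 0.9)  # conservative: flags 2 of 10 messages
```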
Timeout
Technical: The maximum time to wait for an API response before giving up. SafeComms recommends 5-10 second timeouts for moderation requests.
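When a request does time out, a common pattern is to retry with exponential backoff rather than fail immediately. A sketch of the backoff schedule (the retry policy itself is an illustrative assumption, not a SafeComms requirement):

```python
def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> list:
    """Exponential backoff schedule: base, 2*base, 4*base, ..., capped."""
    return [min(cap, base * (2 ** i)) for i in range(retries)]

REQUEST_TIMEOUT = 5.0    # seconds, within the recommended 5-10s range
delays = backoff_delays(3)  # wait 1s, 2s, then 4s between retries
```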
Toxicity
Safety: Rude, disrespectful, or hostile language likely to drive users out of a conversation. Includes insults, profanity, and personal attacks.
Tuning
Moderation: Adjusting moderation thresholds and rules to balance false positives and false negatives for your specific use case.
User-Generated Content (UGC)
General: Any content created by users rather than the platform: posts, comments, reviews, messages, images, videos.
Violence / Gore
Safety: Content depicting physical harm, injury, death, or graphic violence. Often prohibited or shown only behind content warnings.
Webhook
Technical: An HTTP callback that sends real-time notifications to your server when events occur, such as when content is flagged.
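Webhook receivers usually verify that a delivery really came from the provider, most commonly by checking an HMAC signature computed over the raw request body. A sketch of that generic pattern (the signing scheme and any header name are assumptions; consult the SafeComms docs for the actual mechanism):

```python
import hashlib
import hmac

def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check an HMAC-SHA256 signature over the raw webhook body.

    Uses a constant-time comparison to avoid timing attacks. The exact
    algorithm and header are assumptions; check the provider's docs.
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)
```

Verifying before processing prevents attackers from forging "content flagged" events against your server.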
Zero-Day Content
Safety: Newly emerging harmful content patterns that existing filters don't catch. Requires continuous model updates.
Ready to Build Safer Platforms?
Now that you understand the terminology, start implementing world-class content moderation with SafeComms.
Missing a Term?
If you'd like to see additional terms added to this glossary, please contact us at [email protected] or submit feedback through our support portal.