Detection Methodology
How AI Detection Tools Work (And Why They Often Get It Wrong)
AI detection tools are widely deployed in academic institutions, media organizations, and professional contexts — but their accuracy is frequently misrepresented. Understanding how they actually work reveals both why they catch real AI content and why they fail in predictable, documented ways. This knowledge is essential for anyone affected by AI detection decisions.
- How they work: perplexity, burstiness, classifiers, Unicode scanning
- Failure modes: false positives, model drift, short-text issues
- What to trust: calibrated expectations for detection results
The Core Technical Architecture
AI detection tools are probabilistic classification systems. They do not have any privileged access to AI model logs, user accounts, or generation records. They work entirely from the text itself, using statistical analysis to estimate the probability that the text was generated by an AI system rather than written by a human.
The fundamental insight behind all AI detection is that language models produce text with characteristic statistical properties — properties that emerge from how they work, not from any deliberate design choice. Specifically, models tend to select high-probability tokens (predictable word choices) and produce structurally uniform text (consistent sentence length and pattern). Human writing has more statistical entropy.
Signal 1: Perplexity Analysis
Perplexity is the primary detection signal. To compute it, the detector feeds the text through a language model and measures how surprised the model is at each token choice. Technically, it computes the exponential of the average negative log-likelihood of the token sequence.
Lower perplexity = the model found the text very predictable = AI-like. Higher perplexity = the model was frequently surprised by word choices = human-like. The score is averaged across the text and mapped to a detection probability.
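As a minimal sketch, assuming the reference model's per-token log-probabilities are already in hand (in practice the detector obtains them by running the text through a language model), the computation is just the exponential of the average negative log-likelihood. The log-probability values below are invented for illustration:

```python
import math

def perplexity(token_logprobs):
    """Exponential of the average negative log-likelihood of the token sequence."""
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# Hypothetical per-token log-probabilities (natural log):
predictable = [-0.5, -0.4, -0.6, -0.5]  # every token expected: AI-like
surprising = [-2.0, -3.5, -1.8, -2.7]   # frequent surprises: human-like

print(perplexity(predictable))  # ~1.65  -> low perplexity, AI-like
print(perplexity(surprising))   # ~12.18 -> high perplexity, human-like
```

The mapping from this raw score to a detection probability is detector-specific; the point is only that predictable text yields a lower number.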
Why perplexity is necessary but not sufficient
- It captures the fundamental statistical difference between AI and human text
- But: formal human writing also has low perplexity — creating false positives
- But: high-temperature AI generation produces higher perplexity — evading detection
- But: the reference model used affects scores — different models produce different perplexity for the same text
Signal 2: Burstiness Analysis
Burstiness measures the variance in sentence length and structural complexity. AI text clusters in a narrow range (15–25 words per sentence, consistent complexity). Human text shows wider variation — mixing short emphatic sentences with long explanatory ones.
Detectors compute burstiness as a dispersion statistic over sentence lengths, typically the variance or standard deviation of words per sentence. Low variance = low burstiness = AI-like. High variance = high burstiness = human-like. Combined with perplexity, burstiness significantly improves classification accuracy.
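A simplified sketch of that computation, splitting on sentence-ending punctuation (production detectors use proper sentence segmentation and also weigh structural complexity, not just length):

```python
import re
import statistics

def burstiness(text):
    """Standard deviation of sentence lengths in words.

    Low value -> uniform sentence lengths -> AI-like.
    High value -> varied sentence lengths -> human-like.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.stdev(lengths) if len(lengths) > 1 else 0.0

uniform = ("This sentence has exactly eight words in it. "
           "Here is another sentence of similar length. "
           "Every sentence follows the same basic pattern.")
varied = ("Short. But sometimes a writer stretches a thought across a much "
          "longer, winding sentence full of clauses. Then stops.")

print(burstiness(uniform) < burstiness(varied))  # True
```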
Signal 3: Trained Classifier Models
More sophisticated detectors add a trained classifier layer on top of perplexity and burstiness. These classifiers are neural networks trained on large datasets of labeled text — text known to be human-written and text known to be AI-generated. They learn patterns beyond simple statistical measures:
What classifiers learn
- Characteristic vocabulary patterns per model family
- Argument structure and reasoning flow patterns
- Transition phrase distributions
- Topic introduction and conclusion styles
- Hedging and qualifier usage patterns
Classifier limitations
- Training data reflects specific AI models and time periods
- New model versions may have different patterns not in training data
- Classifiers are biased toward the distribution of their training set
- They can be fooled by systematic substitution of vocabulary patterns
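The classifier layer can be pictured as a weighted combination of signals squashed through a sigmoid. The weights and bias below are invented purely for illustration; real products train neural networks on large labeled corpora and use many more features than these two:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ai_probability(perplexity, burstiness, weights=(-0.8, -0.5), bias=6.0):
    """Toy linear classifier over the two statistical signals.

    Negative weights mean lower perplexity and lower burstiness
    both push the score toward 'AI'. All numbers are hypothetical.
    """
    w_p, w_b = weights
    return sigmoid(w_p * perplexity + w_b * burstiness + bias)

# Low perplexity + low burstiness -> high AI probability (~0.98)
print(ai_probability(2.0, 1.0))
# High perplexity + high burstiness -> low AI probability (~0.01)
print(ai_probability(9.0, 6.0))
```

The training-data bias noted above lives in exactly these learned weights: a classifier fitted to one model family's output may weight patterns that a newer model no longer produces.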
Signal 4: Unicode Character Scanning
Some detection systems include Unicode character analysis as a secondary signal. AI text tends to contain invisible characters (zero-width spaces, BOM, soft hyphens) at higher rates than human-typed text. The ChatGPT Watermark Detector and Invisible Character Detector focus specifically on this signal.
This is the most deterministic of the four signals — a zero-width space is either there or it is not. But it is also the most easily removed: cleaning tools can eliminate these characters without affecting visible content.
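A minimal scanner for this signal might look like the following; the character table is a representative subset chosen for illustration, not any specific tool's internal list:

```python
INVISIBLES = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\ufeff": "BYTE ORDER MARK",
    "\u00ad": "SOFT HYPHEN",
}

def scan_invisibles(text):
    """Return (index, name) for every invisible character found."""
    return [(i, INVISIBLES[ch]) for i, ch in enumerate(text) if ch in INVISIBLES]

def strip_invisibles(text):
    """Removing these characters leaves the visible content untouched."""
    return "".join(ch for ch in text if ch not in INVISIBLES)

sample = "clean\u200btext\ufeff"
print(scan_invisibles(sample))   # [(5, 'ZERO WIDTH SPACE'), (10, 'BYTE ORDER MARK')]
print(strip_invisibles(sample))  # cleantext
```

The second function is exactly why this signal is so fragile: one pass over the text erases it completely.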
The Major Failure Modes
AI detection tools have well-documented failure modes. Understanding them is essential for placing any detection result in proper context.
False positive: non-native speakers
Non-native English speakers writing carefully produce low-perplexity, low-burstiness text — the same statistical profile as AI. Studies have shown false positive rates above 60% for some non-native speaker populations. This is the most serious equity issue in AI detection.
False positive: formal academic writing
Academic prose follows highly predictable conventions. STEM writing, legal analysis, and technical documentation all score as AI-like by perplexity measures because they follow consistent, high-probability patterns by design.
False negative: edited AI text
Any significant editing of AI text increases perplexity and burstiness, pushing the score toward human. AI text that has been substantially rewritten may score below detection thresholds even though AI was involved in its creation.
Model drift
Detectors trained on older AI models may not accurately detect newer ones. GPT-4 produces statistically different text from GPT-3.5. Models fine-tuned for specific domains produce different patterns still. Detectors require ongoing retraining to maintain accuracy.
Short text unreliability
Perplexity and burstiness calculations require statistical samples. Short texts (under 250 words) do not provide enough tokens for reliable scoring. Most detectors are essentially unreliable below this threshold, though they may still return confident-looking scores.
Domain specificity
Detectors calibrated on general text may perform differently on specialized domain content. Technical documentation, poetry, dialogue, and other special formats all have distinctive statistical properties that can interfere with standard detection.
Published Accuracy Claims and What They Mean
AI detection companies publish accuracy statistics for their products, but these statistics require careful interpretation. Here is how to read them critically.
- "98% accuracy on our test set" means accuracy on the specific dataset the company used for testing — typically balanced samples of AI and human text in ideal conditions. Real-world performance is lower.
- A single accuracy figure conflates two very different errors, and published numbers rarely break out false positives separately. A tool that flagged everything as AI would achieve a 100% true positive rate while wrongly accusing every human author. The false positive rate is the critical number for anyone concerned about being wrongly accused.
- Test sets are often not representative. If the test set does not include formal academic writing or non-native speaker text, the reported accuracy overstates real-world performance for those groups.
- Accuracy degrades with model updates. A tool reporting 95% accuracy tested on GPT-3.5 output may perform significantly worse on GPT-4 or newer models.
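Bayes' rule makes the false-positive point concrete. With illustrative numbers (not any vendor's figures): a detector with a 95% true positive rate and only a 5% false positive rate, applied where 10% of submissions are actually AI-written, produces flags that are wrong almost a third of the time:

```python
def flag_precision(tpr, fpr, ai_rate):
    """Probability that a flagged document is actually AI (Bayes' rule)."""
    true_flags = tpr * ai_rate          # AI docs correctly flagged
    false_flags = fpr * (1 - ai_rate)   # human docs wrongly flagged
    return true_flags / (true_flags + false_flags)

# Hypothetical rates: 95% TPR, 5% FPR, 10% of documents AI-written
print(round(flag_precision(0.95, 0.05, 0.10), 2))  # 0.68
```

The lower the base rate of AI text in the population being screened, the larger the share of flags that are false accusations.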
What You Should Actually Trust
Given these limitations, here is a practical framework for how much weight to give AI detection results:
High confidence situations
Unedited AI output on a general topic, scored at 90%+ by multiple independent detectors, with invisible Unicode characters present. This combination is highly indicative of AI origin.
Low confidence situations
Any individual detector score for a single document. A 75% AI score from one tool on formal academic text could easily be a false positive. No single score should be treated as conclusive.
Cannot be determined
Whether the author "cheated" by using AI. Detection tools can score patterns; they cannot evaluate intent, the extent of AI use, or whether the intellectual work is genuine.
The AI Detector tool on this site is calibrated to be transparent about its confidence levels. Use it alongside the ChatGPT Watermark Detector and Invisible Character Detector for a multi-signal assessment that is more reliable than any single tool.
Use multiple signals, not a single score.
The AI Detector gives you the statistical pattern analysis. The ChatGPT Watermark Detector covers Unicode artifacts. The Invisible Character Detector provides the technical breakdown. Together they give you a picture that no single tool can provide alone.