What Turnitin's AI detector actually does
Turnitin added AI detection to its plagiarism checker in April 2023 and expanded it through 2024 to cover more model families: GPT-4, Claude, Gemini, DeepSeek. The classifier operates at the sentence level: each sentence gets scored for AI-ness based on statistical features like perplexity, sentence length, vocabulary diversity, and phrase co-occurrence patterns. The document-level score is a weighted roll-up of the sentence scores.
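The roll-up step can be sketched as a length-weighted average of per-sentence scores. This is a hypothetical illustration only: Turnitin hasn't published its weighting scheme, and the per-sentence scores below are made up.

```python
# Hypothetical sketch of a sentence-level roll-up, NOT Turnitin's actual
# formula: each sentence gets an AI-ness score in [0, 1], and the
# document score is a word-count-weighted average of those scores.

def document_score(sentences, sentence_scores):
    """Weight each sentence's score by its word count."""
    weights = [len(s.split()) for s in sentences]
    total = sum(weights)
    return sum(w * sc for w, sc in zip(weights, sentence_scores)) / total

sentences = [
    "It is important to note that results may vary.",
    "But honestly? I just winged it.",
]
scores = [0.9, 0.2]  # invented per-sentence scores, for illustration
print(document_score(sentences, scores))  # → 0.62
```

Weighting by length means one long AI-sounding paragraph moves the needle more than a short one, which matches the observation that short answers behave erratically under the detector.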
Turnitin claims 98% accuracy internally. Independent tests put real-world accuracy between 60% and 75%, depending on input length and genre. A widely-cited 2024 study from the University of Colorado found roughly that range too. Short answers (under 300 words) and technical writing with heavy citation density are especially prone to false positives. The detector is probabilistic, not deterministic: human essays get flagged sometimes, and ChatGPT drafts slip through sometimes.
Your teacher sees one thing: the AI score, typically as a percentage. Most schools set a threshold somewhere between 20% and 30%, above which an essay gets flagged for manual review. Below the threshold, it passes silently.
Why ChatGPT drafts keep getting flagged
Large language models have statistical fingerprints. They average sentence length toward a comfortable middle (usually 19-22 words). They reach for the same transitional phrases: "furthermore," "moreover," "in conclusion," "it is important to note." They hedge where humans wouldn't: "it could be argued that," "one might consider." They avoid contractions, first-person asides, and rule-breaking constructions like starting a sentence with "And" or "But."
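The fingerprints above are easy to measure yourself. A toy probe, assuming only the features already described (mean sentence length, sentence-length spread, stock-transition hits, contraction count), not Turnitin's actual feature set:

```python
# Toy stylometric probe for the LLM fingerprints described above.
# Uniform sentence lengths (low stdev), stock transitions, and zero
# contractions all push text toward "AI-like" on these crude measures.
import re
import statistics

TRANSITIONS = ("furthermore", "moreover", "in conclusion",
               "it is important to note")

def fingerprint(text):
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    lowered = text.lower()
    return {
        "mean_len": statistics.mean(lengths),
        "len_stdev": statistics.pstdev(lengths),  # low = uniform, LLM-like
        "transition_hits": sum(lowered.count(t) for t in TRANSITIONS),
        "contractions": len(re.findall(r"\b\w+'\w+\b", text)),
    }

sample = ("Furthermore, the results were clear. "
          "Moreover, it is important to note the trend.")
print(fingerprint(sample))
```

On that sample the probe reports three transition hits and zero contractions, which is the skeleton a sentence-level classifier keys on regardless of which synonyms fill it in.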
The classifier picks up on these patterns. Rewriting a few words here and there leaves the skeleton intact. That's why thesaurus-style "humanizers" don't work. The underlying structure gives the game away. Turnitin also now filters out Unicode lookalike substitutions (swapping a Latin "a" for a Cyrillic "а"), so the tricks that worked in 2023 are dead.
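The lookalike filtering is simple to reproduce with Python's standard unicodedata module. The Unicode character names below are real, but the function is an illustrative sketch, not Turnitin's implementation:

```python
# Sketch of a Unicode-lookalike check: flag alphabetic characters whose
# official Unicode name says they are not Latin-script. Cyrillic U+0430
# ("CYRILLIC SMALL LETTER A") renders identically to Latin "a" but has
# a different name, so it stands out immediately.
import unicodedata

def mixed_script_chars(text):
    """Return the letters in text whose Unicode script is not Latin."""
    suspicious = []
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if not name.startswith("LATIN"):
                suspicious.append(ch)
    return suspicious

print(mixed_script_chars("pаper"))  # the "а" here is Cyrillic U+0430 → ['а']
```

A normalizer running this check can either reject the document or map each flagged character back to its Latin twin before scoring, which is why the homoglyph trick no longer survives ingestion.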