Tools to detect AI-generated text systematically fail when analyzing great literary works. The biblical Genesis, the US Constitution, ‘Harry Potter’ and ‘One Hundred Years of Solitude’ are all flagged by these detectors as machine creations. The reason has a perverse logic: what the algorithms interpret as AI writing is actually just good writing.
Robot Bible. Tools that detect AI-generated text have been accumulating absurd verdicts for months. Submit ‘One Hundred Years of Solitude’ by Gabriel García Márquez to one of these systems and you will be told that 100% of the novel is of artificial origin. The biblical Genesis and the US Constitution fare no better: the ZeroGPT tool gives the first text an 88.2% chance of being AI writing and rates the second as 96.21% AI-written. Experiments with ‘Harry Potter’ or the lyrics of ‘Bohemian Rhapsody’ show similar results. The pattern is so consistent that it goes beyond anecdote: these tools have an underlying problem.
Good is bad. The irony is that AI-text detectors were designed to identify writing done by machines. Yet they end up flagging exactly the opposite: texts that exhibit greater stylistic care, stronger internal coherence, and better command of narrative rhythm are judged unlikely to have been written by humans. In technical terms, writing well looks a lot like writing as a language model does.
How it works. To understand why this happens, you have to understand how these tools work. Most rely on two main indicators. The first is perplexity: how predictable the choice of words in a text is. If each word follows the previous one in an expected way, perplexity is low. If the text jumps unpredictably between registers, vocabulary, and syntactic structures, perplexity is high. The second indicator is burstiness: the variation in sentence length. Humans alternate long paragraphs with very short sentences, while language models tend to produce sentences of more uniform length.
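The two indicators above can be illustrated with a toy sketch. This is not the internals of any real detector (commercial tools score token probabilities with a large language model); the `unigram_perplexity` and `burstiness` functions below are hypothetical simplifications, written only to show the shape of the two metrics.

```python
import math
import re
from statistics import mean, pstdev


def burstiness(text):
    """Toy burstiness: std deviation of sentence lengths (in words)
    divided by the mean length. Uniform sentences score near 0;
    alternating long and short sentences score higher."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


def unigram_perplexity(text, corpus):
    """Toy perplexity under a unigram model estimated from `corpus`:
    PPL = exp(-(1/N) * sum(log p(w_i))). Predictable (in-corpus)
    wording yields low values; surprising wording yields high ones.
    Real detectors use a neural language model, not unigram counts."""
    corpus_words = corpus.lower().split()
    counts = {}
    for w in corpus_words:
        counts[w] = counts.get(w, 0) + 1
    total, vocab = len(corpus_words), len(counts)
    words = text.lower().split()
    # Laplace smoothing so unseen words don't zero out the product
    log_prob = sum(
        math.log((counts.get(w, 0) + 1) / (total + vocab)) for w in words
    )
    return math.exp(-log_prob / len(words))
```

Run on sample text, a passage with varied sentence lengths scores higher burstiness than one with uniform sentences, and wording drawn from the reference corpus scores lower perplexity than out-of-vocabulary wording, which is exactly the asymmetry that trips up polished, predictable prose.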
A well-constructed text (precise vocabulary, clear structure, uniform rhythm) has low perplexity by design. García Márquez chooses his exact words with almost surgical precision. Genesis has an almost hypnotic, deliberate narrative cadence, without noise, like a song with balanced meter. “Writing well” is a complex concept, but it can mean, among other things, being predictable in the most virtuous sense: the reader understands the text effortlessly. And that, for a detector trained to distinguish “what a language model would do”, sets off alarm bells.
It’s the same thing. What complicates the problem is that generative AI models have been trained precisely on quality human writing. ChatGPT, Claude, and Gemini produce fluent, coherent, low-perplexity text because they learned from millions of human texts with those same characteristics. Distinguishing AI-generated writing from good human writing is an almost impossible task for these algorithms.
Another way to fail. These criteria break down in multiple ways. One study examined the performance of seven popular detectors on essays written for the TOEFL (the standard English exam for non-native speakers) versus essays by American high school students. The results: 61.22% of essays written by non-native students were flagged as AI-generated, and in 20% of cases all seven detectors agreed on the erroneous diagnosis. The native students’ texts passed without problems.
The explanation is the same perplexity mechanic: someone writing in a second language uses a more limited vocabulary, simpler structures, and fewer grammatical variations. They don’t write badly, but their tools are more limited, and AI detectors systematically penalize writers with less command of the language. The team behind the study recommended avoiding these tools in evaluation contexts, especially when international students are involved. One episode of this type took place in 2024, when the Australian Catholic University opened files on nearly 6,000 students using Turnitin, the screening platform most widespread in universities. Many of them had not used AI at all.
Force the machine. Edward Tian, CEO of GPTZero (one of the reference detectors, with more than eight million users), has openly acknowledged that many tools in the sector adjust their thresholds to intentionally generate more false positives, preferring to wrongly flag human text rather than let AI-generated text slip through. Tian says GPTZero fights against this proliferation of false positives, but the distortion of results remains a clear problem.
The latest case. The publisher Hachette has just canceled the publication in the United Kingdom and the United States of ‘Shy Girl’, a novel that the Pangram tool flagged as 78% AI-generated. The author denies having used AI. Whatever the truth in that specific case, the episode illustrates the de facto power these tools are acquiring: they can destroy publishing contracts and put humans under suspicion before there is any definitive proof.

