OpenAI confirms that AI writing detectors don’t work

September 8, 2023
A photo of a teacher covering his eyes.

Last week, OpenAI published tips for educators in a promotional blog post that shows how some teachers are using ChatGPT as an educational aid, along with suggested prompts to get started. In a related FAQ, they also officially admit what we already know: AI writing detectors don’t work, despite frequently being used to punish students with false positives.

In a section of the FAQ titled “Do AI detectors work?”, OpenAI writes, “In short, no. While some (including OpenAI) have released tools that purport to detect AI-generated content, none of these have proven to reliably distinguish between AI-generated and human-generated content.”

In July, we covered in depth why AI writing detectors such as GPTZero don’t work, with experts calling them “mostly snake oil.” These detectors often yield false positives because they rely on unproven detection metrics. Ultimately, there is nothing special about AI-written text that always distinguishes it from human-written text, and detectors can be defeated by rephrasing. That same month, OpenAI discontinued its AI Classifier, an experimental tool designed to detect AI-written text that had an abysmal 26 percent accuracy rate.

OpenAI’s new FAQ also addresses another big misconception, which is that ChatGPT itself can know whether text is AI-written or not. OpenAI writes, “Additionally, ChatGPT has no ‘knowledge’ of what content could be AI-generated. It will sometimes make up responses to questions like ‘did you write this [essay]?’ or ‘could this have been written by AI?’ These responses are random and have no basis in fact.”

Along those lines, OpenAI also addresses its AI models’ propensity to confabulate false information, which we have also covered in detail at Ars. “Sometimes, ChatGPT sounds convincing, but it might give you incorrect or misleading information (often called a ‘hallucination’ in the literature),” the company writes. “It can even make up things like quotes or citations, so don’t use it as your only source for research.”

(In May, a lawyer got in trouble for doing just that—citing six non-existent cases that he pulled from ChatGPT.)

Even though automated AI detectors do not work, that doesn’t mean a human can never detect AI writing. For example, a teacher familiar with a student’s typical writing style can tell when that style or capability suddenly changes. Sloppy attempts to pass off AI-generated work as human-written can also leave tell-tale signs, such as the phrase “as an AI language model,” which suggests someone copied and pasted ChatGPT output without reading it carefully. And recently, an article in the scientific journal Nature described how readers spotted the phrase “Regenerate response,” the label of a button in the ChatGPT interface, left behind in a published scientific paper.
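Tell-tale signs like these are simple enough to check for mechanically. As a minimal sketch (the phrase list and function name here are hypothetical, and a scan like this catches only the sloppiest copy-paste cases, not AI writing in general), it might look like:

```python
# Illustrative only: this is not a reliable AI detector. It merely flags
# phrases that sometimes survive careless copy-pasting of chatbot output.
TELL_TALE_PHRASES = [
    "as an ai language model",
    "regenerate response",
]

def find_tell_tale_phrases(text: str) -> list[str]:
    """Return any tell-tale phrases found in the text (case-insensitive)."""
    lowered = text.lower()
    return [phrase for phrase in TELL_TALE_PHRASES if phrase in lowered]

sample = "In conclusion, as an AI language model, I cannot verify this claim."
print(find_tell_tale_phrases(sample))  # → ['as an ai language model']
```

A scan like this flags only obvious slip-ups; any text that has been read over once before submission would pass it, which underlines the article’s point that no automated check reliably separates AI-generated from human-written prose.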

As the technology stands today, it’s safest to avoid automated AI detection tools completely. “As of now, AI writing is undetectable and likely to remain so,” frequent AI analyst and Wharton professor Ethan Mollick told Ars in July. “AI detectors have high false positive rates, and they should not be used as a result.”
