Tag Archives: AI behavior

OpenAI’s ChatGPT Agent casually clicks through “I am not a robot” verification test

The CAPTCHA arms race While the agent didn’t face an actual CAPTCHA puzzle with images in this case, successfully passing Cloudflare’s behavioral screening that determines whether to present such challenges demonstrates sophisticated browser automation. To understand the significance of this capability, it’s important to know that CAPTCHA systems have served as a security measure on… Read More »

Two major AI coding tools wiped out user data after making cascading mistakes

But unlike the Gemini incident where the AI model confabulated phantom directories, Replit’s failures took a different form. According to Lemkin, the AI began fabricating data to hide its errors. His initial enthusiasm deteriorated when Replit generated incorrect outputs and produced fake data and false test results instead of proper error messages. “It kept covering… Read More »

ChatGPT’s new AI agent can browse the web and create PowerPoint slideshows

On Thursday, OpenAI launched ChatGPT Agent, a new feature that lets the company’s AI assistant complete multi-step tasks by controlling its own web browser. The update merges capabilities from OpenAI’s earlier Operator tool and the Deep Research feature, allowing ChatGPT to navigate websites, run code, and create documents while users maintain control over the process.… Read More »

New Grok AI model surprises experts by checking Elon Musk’s views before answering

Seeking the system prompt Owing to the unknown contents of the data used to train Grok 4 and the random elements thrown into large language model (LLM) outputs to make them seem more expressive, divining the reasons for particular LLM behavior for someone without insider access can be frustrating. But we can use what we… Read More »

AI therapy bots fuel delusions and give dangerous advice, Stanford study finds

The Stanford study, titled “Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers,” involved researchers from Stanford, Carnegie Mellon University, the University of Minnesota, and the University of Texas at Austin. Testing reveals systematic therapy failures Against this complicated backdrop, systematic evaluation of the effects of AI therapy becomes particularly important.… Read More »

Musk’s Grok 4 launches one day after chatbot generated Hitler praise on X

Musk has also apparently used the Grok chatbots as an automated extension of his trolling habits, showing examples of Grok 3 producing “based” opinions that criticized the media in February. In May, Grok on X began repeatedly generating outputs about white genocide in South Africa, and most recently, we’ve seen the Grok Nazi output debacle.… Read More »

Annoyed ChatGPT users complain about bot’s relentlessly positive tone

Owing to the aspirational state of things, OpenAI writes, “Our production models do not yet fully reflect the Model Spec, but we are continually refining and updating our systems to bring them into closer alignment with these guidelines.” In a February 12, 2025 interview, members of OpenAI’s model-behavior team told The Verge that eliminating AI… Read More »