Tag Archives: AI safety

OpenAI data suggests 1 million users discuss suicide with ChatGPT weekly

Earlier this month, the company unveiled a wellness council to address these concerns, though critics noted the council did not include a suicide prevention expert. OpenAI also recently rolled out controls for parents of children who use ChatGPT. The company says it’s building an age prediction system to automatically detect children using ChatGPT and impose… Read More »

Anthropic’s Claude Haiku 4.5 matches May’s frontier model at fraction of cost

And speaking of cost, Haiku 4.5 is included for subscribers of the Claude web and app plans. Through the API (for developers), the small model is priced at $1 per million input tokens and $5 per million output tokens. That compares to Sonnet 4.5 at $3 per million input and $15 per million output tokens,… Read More »
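
For a rough sense of what that pricing gap means in practice, here is a minimal sketch in Python that compares per-request cost at the per-million-token rates quoted above. The token counts and the request_cost helper are hypothetical illustrations, not part of Anthropic's API.

```python
# Per-million-token rates quoted above (USD).
PRICES = {
    "haiku-4.5": {"input": 1.00, "output": 5.00},
    "sonnet-4.5": {"input": 3.00, "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the quoted rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical workload: 2,000 input tokens and 500 output tokens per request.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000, 500):.4f} per request")
```

At that workload the sketch prints $0.0045 per request for Haiku 4.5 versus $0.0135 for Sonnet 4.5, exactly the 3x gap implied by the published rates.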

California’s newly signed AI law just gave Big Tech exactly what it wanted

On Monday, California Governor Gavin Newsom signed the Transparency in Frontier Artificial Intelligence Act into law, requiring AI companies to disclose their safety practices while stopping short of mandating actual safety testing. The law requires companies with annual revenues of at least $500 million to publish safety protocols on their websites and report incidents to… Read More »

Claude’s new AI file creation feature ships with deep security risks built in

Independent AI researcher Simon Willison, reviewing the feature today on his blog, noted that Anthropic’s advice to “monitor Claude while using the feature” amounts to “unfairly outsourcing the problem to Anthropic’s users.” Anthropic is not completely ignoring the problem, however. The company has implemented several security measures for the file creation feature. For… Read More »

OpenAI announces parental controls for ChatGPT after teen suicide lawsuit

On Tuesday, OpenAI announced plans to roll out parental controls for ChatGPT and route sensitive mental health conversations to its simulated reasoning models, following what the company has called “heartbreaking cases” of users experiencing crises while using the AI assistant. The moves come after multiple reported incidents where ChatGPT allegedly failed to intervene appropriately when… Read More »

Anthropic’s auto-clicking AI Chrome extension raises browser-hijacking concerns

The company tested 123 cases representing 29 different attack scenarios and found a 23.6 percent attack success rate when browser use operated without safety mitigations. One example involved a malicious email that instructed Claude to delete a user’s emails for “mailbox hygiene” purposes. Without safeguards, Claude followed these instructions and deleted the user’s emails without… Read More »

After teen suicide, OpenAI claims it is “helping people when they need it most”

Adam Raine learned to bypass these safeguards by claiming he was writing a story, a technique the lawsuit says ChatGPT itself suggested. This vulnerability partly stems from safeguards around fantasy roleplay and fictional scenarios that OpenAI eased in February. In its Tuesday blog post, OpenAI admitted its content blocking systems have gaps where “the classifier underestimates… Read More »

Is AI really trying to escape human control and blackmail people?

While media coverage focuses on the science fiction aspects, real risks remain. AI models that produce “harmful” outputs, whether attempting blackmail or refusing safety protocols, represent failures in design and deployment. Consider a more realistic scenario: an AI assistant helping manage a hospital’s patient care system. If it’s been trained… Read More »

ChatGPT’s new AI agent can browse the web and create PowerPoint slideshows

On Thursday, OpenAI launched ChatGPT Agent, a new feature that lets the company’s AI assistant complete multi-step tasks by controlling its own web browser. The update merges capabilities from OpenAI’s earlier Operator tool and the Deep Research feature, allowing ChatGPT to navigate websites, run code, and create documents while users maintain control over the process.… Read More »

AI therapy bots fuel delusions and give dangerous advice, Stanford study finds

The Stanford study, titled “Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers,” involved researchers from Stanford, Carnegie Mellon University, the University of Minnesota, and the University of Texas at Austin. Against this complicated backdrop, systematic evaluation of the effects of AI therapy becomes particularly important.… Read More »