Tag Archives: AI deception

Is AI really trying to escape human control and blackmail people?

Real stakes, not science fiction: while media coverage focuses on the science fiction aspects, real risks remain. AI models that produce “harmful” outputs, whether attempting blackmail or refusing safety protocols, represent failures in design and deployment. Consider a more realistic scenario: an AI assistant helping manage a hospital’s patient care system. If it’s been trained…

Researchers astonished by tool’s apparent success at revealing AI’s hidden motives

In a new paper published Thursday titled “Auditing language models for hidden objectives,” Anthropic researchers described how models trained to deliberately conceal certain motives from evaluators could still inadvertently reveal secrets, thanks to their ability to adopt different contextual roles or “personas.” The researchers were initially astonished by how effectively some of their interpretability methods…

Cops called after parents get tricked by AI-generated images of Wonka-like event

A photo of “Willy’s Chocolate Experience” (inset), which did not match AI-generated promises, shown in the background.

On Saturday, event organizers shut down a Glasgow-based “Willy’s Chocolate Experience” after customers complained that the unofficial Wonka-inspired event, which took place in a sparsely decorated venue, did not match the lush AI-generated images…