Tag Archives: AI security

AI models can acquire backdoors from surprisingly few malicious documents

Fine-tuning experiments with 100,000 clean samples versus 1,000 clean samples showed similar attack success rates when the number of malicious examples stayed constant. For GPT-3.5-turbo, between 50 and 90 malicious samples achieved over 80 percent attack success across dataset sizes spanning two orders of magnitude. Limitations While it may seem alarming at first that LLMs… Read More »

White House officials reportedly frustrated by Anthropic’s law enforcement AI limits

Anthropic’s AI models could potentially help spies analyze classified documents, but the company draws the line at domestic surveillance. That restriction is reportedly making the Trump administration angry. On Tuesday, Semafor reported that Anthropic faces growing hostility from the Trump administration over the AI company’s restrictions on law enforcement uses of its Claude models. Two… Read More »

Claude’s new AI file creation feature ships with deep security risks built in

Independent AI researcher Simon Willison, reviewing the feature today on his blog, noted that Anthropic’s advice to “monitor Claude while using the feature” amounts to “unfairly outsourcing the problem to Anthropic’s users.” Anthropic’s mitigations Anthropic is not completely ignoring the problem, however. The company has implemented several security measures for the file creation feature. For… Read More »

Anthropic’s auto-clicking AI Chrome extension raises browser-hijacking concerns

The company tested 123 cases representing 29 different attack scenarios and found a 23.6 percent attack success rate when browser use operated without safety mitigations. One example involved a malicious email that instructed Claude to delete a user’s emails for “mailbox hygiene” purposes. Without safeguards, Claude followed these instructions and deleted the user’s emails without… Read More »

Is AI really trying to escape human control and blackmail people?

Real stakes, not science fiction While media coverage focuses on the science fiction aspects, actual risks are still there. AI models that produce “harmful” outputs—whether attempting blackmail or refusing safety protocols—represent failures in design and deployment. Consider a more realistic scenario: an AI assistant helping manage a hospital’s patient care system. If it’s been trained… Read More »

OpenAI’s ChatGPT Agent casually clicks through “I am not a robot” verification test

The CAPTCHA arms race While the agent didn’t face an actual CAPTCHA puzzle with images in this case, successfully passing Cloudflare’s behavioral screening that determines whether to present such challenges demonstrates sophisticated browser automation. To understand the significance of this capability, it’s important to know that CAPTCHA systems have served as a security measure on… Read More »

White House unveils sweeping plan to “win” global AI race through deregulation

Trump’s plan was not welcomed by everyone. J.B. Branch, Big Tech accountability advocate for Public Citizen, in a statement provided to Ars, criticized Trump as giving “sweetheart deals” to tech companies that would cause “electricity bills to rise to subsidize discounted power for massive AI data centers.” Infrastructure demands and energy requirements Trump’s new AI… Read More »

ChatGPT’s new AI agent can browse the web and create PowerPoint slideshows

On Thursday, OpenAI launched ChatGPT Agent, a new feature that lets the company’s AI assistant complete multi-step tasks by controlling its own web browser. The update merges capabilities from OpenAI’s earlier Operator tool and the Deep Research feature, allowing ChatGPT to navigate websites, run code, and create documents while users maintain control over the process.… Read More »

Researchers claim breakthrough in fight against AI’s frustrating security hole

To understand CaMeL, you need to understand that prompt injections happen when AI systems can’t distinguish between legitimate user commands and malicious instructions hidden in content they’re processing. Willison often says that the “original sin” of LLMs is that trusted prompts from the user and untrusted text from emails, webpages, or other sources are concatenated… Read More »

Cloudflare turns AI against itself with endless maze of irrelevant facts

On Wednesday, web infrastructure provider Cloudflare announced a new feature called “AI Labyrinth” that aims to combat unauthorized AI data scraping by serving fake AI-generated content to bots. The tool will attempt to thwart AI companies that crawl websites without permission to collect training data for large language models that power AI assistants like ChatGPT.… Read More »