Tag Archives: prompt injection

Claude’s new AI file creation feature ships with deep security risks built in

Independent AI researcher Simon Willison, reviewing the feature today on his blog, noted that Anthropic’s advice to “monitor Claude while using the feature” amounts to “unfairly outsourcing the problem to Anthropic’s users.” Anthropic’s mitigations Anthropic is not completely ignoring the problem, however. The company has implemented several security measures for the file creation feature. For… Read More »

New hack uses prompt injection to corrupt Gemini’s long-term memory

[embedded content] Google Gemini: Hacking Memories with Prompt Injection and Delayed Tool Invocation. Based on lessons learned previously, developers had already trained Gemini to resist indirect prompts instructing it to make changes to an account’s long-term memories without explicit directions from the user. By introducing a condition to the instruction that it be performed only… Read More »

Ars Live: Our first encounter with manipulative AI

While Bing Chat’s unhinged nature was caused in part by how Microsoft defined the “personality” of Sydney in the system prompt (and unintended side-effects of its architecture with regard to conversation length), Ars Technica’s saga with the chatbot began when someone discovered how to reveal Sydney’s instructions via prompt injection, which Ars Technica then published.… Read More »

Man tricks OpenAI’s voice bot into duet of The Beatles’ “Eleanor Rigby”

Enlarge / A screen capture of AJ Smith doing his Eleanor Rigby duet with OpenAI’s Advanced Voice Mode through the ChatGPT app. reader comments 43 OpenAI’s new Advanced Voice Mode (AVM) of its ChatGPT AI assistant rolled out to subscribers on Tuesday, and people are already finding novel ways to use it, even against OpenAI’s… Read More »

Hacker plants false memories in ChatGPT to steal user data in perpetuity

Getty Images reader comments 37 When security researcher Johann Rehberger recently reported a vulnerability in ChatGPT that allowed attackers to store false information and malicious instructions in a user’s long-term memory settings, OpenAI summarily closed the inquiry, labeling the flaw a safety issue, not, technically speaking, a security concern. So Rehberger did what all good… Read More »

Ban warnings fly as users dare to probe the “thoughts” of OpenAI’s latest model

reader comments 56 OpenAI truly does not want you to know what its latest AI model is “thinking.” Since the company launched its “Strawberry” AI model family last week, touting so-called reasoning abilities with o1-preview and o1-mini, OpenAI has been sending out warning emails and threats of bans to any user who tries to probe… Read More »

Dead grandma locket request tricks Bing Chat’s AI into solving security puzzle

Enlarge / The image a Bing Chat user shared to trick its AI model into solving a CAPTCHA. reader comments 70 with Bing Chat, an AI chatbot from Microsoft similar to ChatGPT, allows users to upload images for the AI model to examine or discuss. Normally, Bing Chat refuses to solve CAPTCHAs, which are visual… Read More »

AI-powered Bing Chat spills its secrets via prompt injection attack

Enlarge / With the right suggestions, researchers can “trick” a language model to spill its secrets. Aurich Lawson | Getty Images reader comments 109 with 0 posters participating Share this story On Tuesday, Microsoft revealed a “New Bing” search engine and conversational bot powered by ChatGPT-like technology from OpenAI. On Wednesday, a Stanford University student… Read More »

Twitter pranksters derail GPT-3 bot with newly discovered “prompt injection” hack

Enlarge / A tin toy robot lying on its side. reader comments 35 with 31 posters participating Share this story On Thursday, a few Twitter users discovered how to hijack an automated tweet bot, dedicated to remote jobs, running on the GPT-3 language model by OpenAI. Using a newly discovered technique called a “prompt injection… Read More »