sleeper agents – Weekly Geek

AI poisoning could turn open models into destructive “sleeper agents,” says Anthropic

Benj Edwards | Getty Images reader comments 30 Imagine downloading an open source AI language model, and all seems well at first, but it later turns malicious. On Friday, Anthropic—the maker of ChatGPT competitor Claude—released a research paper about AI “sleeper agent” large language models (LLMs) that initially seem normal but can deceptively output vulnerable… Read More »