White House challenges hackers to break top AI models at DEF CON 31

Enlarge / An AI-generated image of the White House in front of a cybernetic background.

Midjourney

On Thursday, the White House announced a surprising collaboration between top AI developers, including OpenAI, Google, Antrhopic, Hugging Face, Microsoft, Nvidia, and Stability AI, to participate in a public evaluation of their generative AI systems at DEF CON 31, a hacker convention taking place in Las Vegas in August. The event will be hosted by AI Village, a community of AI hackers.

Since last year, large language models (LLMs) such as ChatGPT have become a popular way to accelerate writing and communications tasks, but officials recognize that they also come with inherent risks. Issues such as confabulations, jailbreaks, and biases pose challenges for security professionals and the public. That’s why the White House Office of Science, Technology, and Policy endorses pushing these new generative AI models to their limits.

“This independent exercise will provide critical information to researchers and the public about the impacts of these models and will enable AI companies and developers to take steps to fix issues found in those models,” says a statement from the White House, which says the event aligns with the Biden administration’s AI Bill of Rights and the National Institute of Standards and Technology’s AI Risk Management Framework.

In a parallel announcement written by AI Village, organizers Sven Cattell, Rumman Chowdhury, and Austin Carson call the upcoming event “the largest red teaming exercise ever for any group of AI models.” Thousands of people will take part in the public AI model assessment, which will utilize an evaluation platform developed by Scale AI.

“Red-teaming” is a process by which security experts attempt to find vulnerabilities or flaws in an organization’s systems to improve overall security and resilience.

According to Cattell, the founder of AI Village, “The diverse issues with these models will not be resolved until more people know how to red team and assess them.” By conducting the largest red-teaming exercise for any group of AI models, AI Village and DEF CON aim to grow the community of researchers equipped to handle vulnerabilities in AI systems.

LLMs have proven surprisingly difficult to lock down in part due to a technique called “prompt injection,” which we broke a story about in September. AI researcher Simon Willison has written in detail about the dangers of prompt injection, a technique that can derail a language model into performing actions not intended by its creator.

During the DEF CON event, participants will have timed access to multiple LLMs through laptops provided by the organizers. A capture-the-flag-style point system will encourage testing a wide range of potential harms. At the end, the person with the most points will win a high-end Nvidia GPU.

“We’ll publish what we learn from this event to help others who want to try the same thing,” writes AI Village. “The more people who know how to best work with these models, and their limitations, the better.”

DEF CON 31 will take place on August 10–13, 2023, at Caesar’s Forum in Las Vegas.

Source