Tag Archives: AI benchmarks

New secret math benchmark stumps AI models and PhDs alike

Epoch AI allowed Fields Medal winners Terence Tao and Timothy Gowers to review portions of the benchmark. “These are extremely challenging,” Tao said in feedback provided to Epoch. “I think that in the near term basically the only way to solve them, short of having a real domain expert in the area, is by a… Read More »

Mysterious “gpt2-chatbot” AI model appears suddenly, confuses experts

reader comments 25 On Sunday, word began to spread on social media about a new mystery chatbot named “gpt2-chatbot” that appeared in the LMSYS Chatbot Arena. Some people speculate that it may be a secret test version of OpenAI’s upcoming GPT-4.5 or GPT-5 large language model (LLM). The paid version of ChatGPT is currently powered… Read More »

“The king is dead”—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time

reader comments 45 On Tuesday, Anthropic’s Claude 3 Opus large language model (LLM) surpassed OpenAI’s GPT-4 (which powers ChatGPT) for the first time on Chatbot Arena, a popular crowdsourced leaderboard used by AI researchers to gauge the relative capabilities of AI language models. “The king is dead,” tweeted software developer Nick Dobos in a post… Read More »

Google launches Gemini—a powerful AI model it says can surpass GPT-4

Enlarge / The Google Gemini logo. reader comments 51 On Wednesday, Google announced Gemini, a multimodal AI model family it hopes will rival OpenAI’s GPT-4, which powers the paid version of ChatGPT. Google claims that the largest version of Gemini exceeds “current state-of-the-art results on 30 of the 32 widely used academic benchmarks used in… Read More »