Tag Archives: AI benchmarks

Mysterious “gpt2-chatbot” AI model appears suddenly, confuses experts

On Sunday, word began to spread on social media about a new mystery chatbot named “gpt2-chatbot” that appeared in the LMSYS Chatbot Arena. Some people speculate that it may be a secret test version of OpenAI’s upcoming GPT-4.5 or GPT-5 large language model (LLM). The paid version of ChatGPT is currently powered…

“The king is dead”—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time

On Tuesday, Anthropic’s Claude 3 Opus large language model (LLM) surpassed OpenAI’s GPT-4 (which powers ChatGPT) for the first time on Chatbot Arena, a popular crowdsourced leaderboard used by AI researchers to gauge the relative capabilities of AI language models. “The king is dead,” tweeted software developer Nick Dobos in a post…

Google launches Gemini—a powerful AI model it says can surpass GPT-4

On Wednesday, Google announced Gemini, a multimodal AI model family it hopes will rival OpenAI’s GPT-4, which powers the paid version of ChatGPT. Google claims that the largest version of Gemini exceeds “current state-of-the-art results on 30 of the 32 widely used academic benchmarks used in…